Many elliptic curves have infinitely many rational points, although the Mordell–Weil theorem assures us that the group of rational points is finitely generated. Another natural Diophantine question is that of determining how many of the rational points on a given (affine) Weierstrass equation have integral coordinates. In this chapter we prove a theorem of Siegel that says that there are only finitely many such integral points. Siegel gave two proofs of his theorem, which we present in (IX §3) and (IX §4). Both proofs make use of techniques from the theory of Diophantine approximation, and thus do not provide an effective procedure for actually finding all of the integral points. However, Siegel’s second proof reduces the problem to that of solving the so-called unit equation, which in turn can be effectively resolved using methods from transcendence theory. We discuss effective solutions, without giving proofs, in (IX §5).

Unless otherwise specified, the notation and conventions for this chapter are the same as those for Chapter VIII. In addition, we set the following notation:

\(H\), \(H_{K}\):

height functions, see (VIII §5).

\(n_{v}\):

\(= [K_{v} : \mathbb{Q}_{v}]\), the local degree for \(v \in M_{K}\), see (VIII §5).

\(S\):

\(\subset M_{K}\), generally a finite set of absolute values containing \(M_{K}^{\infty }\).

\(R_{S}\):

the ring of S-integers of K,

$$\displaystyle{R_{S} =\{ x \in K : \mbox{ $v(x) \geq 0$ for all $v \in M_{K}$ with $v\notin S$}\}.}$$

\(R_{S}^{{\ast}}\):

the unit group of \(R_{S}\).
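For a concrete feel for this notation, take \(K = \mathbb{Q}\) and let S consist of the archimedean absolute value together with a finite list of primes. The following Python sketch tests membership in \(R_{S}\) and \(R_{S}^{{\ast}}\) under that assumption. The function names are illustrative and are not part of the text.

```python
from fractions import Fraction

def strip_primes(n, primes):
    """Divide out of the positive integer n every factor coming from `primes`."""
    for p in primes:
        while n % p == 0:
            n //= p
    return n

def is_s_integer(x, primes):
    """x in Q lies in R_S iff its reduced denominator involves only primes in S."""
    return strip_primes(Fraction(x).denominator, primes) == 1

def is_s_unit(x, primes):
    """x lies in R_S^* iff x is nonzero and both x and 1/x are S-integers."""
    x = Fraction(x)
    return x != 0 and is_s_integer(x, primes) and is_s_integer(1 / x, primes)
```

For example, with S = {∞, 2, 3}, the number −9∕8 is an S-unit, while 5∕8 is an S-integer but not an S-unit.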

IX.1 Diophantine Approximation

The fundamental problem in the subject of Diophantine approximation is the question of how closely an irrational number can be approximated by a rational number.

Example 1.1.

For any rational number p∕q, we know that the quantity \({\bigl |p/q -\sqrt{2}\,\bigr |}\) is strictly positive, and since \(\mathbb{Q}\) is dense in \(\mathbb{R}\), an appropriate choice of p∕q makes it as small as desired. The problem is to make it small without taking p and q to be too large. The next two elementary results illustrate this idea.

Proposition 1.2.

(Dirichlet) Let  \(\alpha \in \mathbb{R}\) with  \(\alpha \notin \mathbb{Q}\) . Then there are infinitely many rational numbers  \(p/q \in \mathbb{Q}\) such that

$$\displaystyle{\left \vert \frac{p} {q}-\alpha \right \vert \leq \frac{1} {q^{2}}.}$$

Proof.

Let Q be a (large) integer and look at the set of real numbers

$$\displaystyle{{\bigl \{q\alpha - [q\alpha ] : q = 0,1,\ldots ,Q\bigr \}},}$$

where \([\,\cdot \,]\) denotes the greatest integer function. Since α is irrational, this set contains Q + 1 distinct numbers in the interval between 0 and 1. Dividing the interval [0, 1] into Q equal-sized pieces and applying the pigeonhole principle, we find that there are integers \(0 \leq q_{1} < q_{2} \leq Q\) satisfying

$$\displaystyle{{\Bigl |{\bigl (q_{1}\alpha - [q_{1}\alpha ]\bigr )} -{\bigl ( q_{2}\alpha - [q_{2}\alpha ]\bigr )}\Bigr |} \leq \frac{1} {Q}.}$$

Hence

$$\displaystyle{\left \vert \frac{[q_{2}\alpha ] - [q_{1}\alpha ]} {q_{2} - q_{1}} -\alpha \right \vert \leq \frac{1} {(q_{2} - q_{1})Q} \leq \frac{1} {(q_{2} - q_{1})^{2}}.}$$

This provides one rational approximation to α having the desired property.

Finally, having obtained a list of such approximations, let p∕q be the one for which \(\vert p/q -\alpha \vert \) is smallest. Then taking \(Q > \vert p/q -\alpha \vert ^{-1}\) ensures that we get a new approximation that is not already in our list. Hence there exist infinitely many rational numbers satisfying the conditions of the proposition. □ 
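The pigeonhole step of this proof is constructive, and it can be sketched in a few lines of Python. The sketch below is only an illustration: it works in floating point (so it is reliable only for small Q), it looks at adjacent fractional parts after sorting rather than at Q equal boxes, and the function name is hypothetical.

```python
import math

def dirichlet_approximation(alpha, Q):
    """Return (p, q) with 1 <= q <= Q and |p/q - alpha| <= 1/(q*Q) <= 1/q^2."""
    # Fractional parts of q*alpha for q = 0, 1, ..., Q, each tagged with its q.
    parts = sorted((q * alpha - math.floor(q * alpha), q) for q in range(Q + 1))
    # Q + 1 points in [0, 1): some pair of neighbours is less than 1/Q apart.
    _, qa, qb = min(
        (parts[i + 1][0] - parts[i][0], parts[i][1], parts[i + 1][1])
        for i in range(Q)
    )
    q1, q2 = min(qa, qb), max(qa, qb)
    p = math.floor(q2 * alpha) - math.floor(q1 * alpha)
    return p, q2 - q1
```

With \(\alpha = \sqrt{2}\) and Q = 10, the construction returns the approximation 7∕5.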

Remark 1.2.1.

A result of Hurwitz says that the \(1/q^{2}\) on the right-hand side of (IX.1.2) may be replaced by \(1/(\sqrt{5}\,q^{2})\), and that this result is best possible. See, e.g., [108, Theorem 194].

Proposition 1.3.

(Liouville [151]) Let \(\alpha \in \bar{ \mathbb{Q}}\) have degree d ≥ 2 over \(\mathbb{Q}\), i.e., \({\bigl [\mathbb{Q}(\alpha ) : \mathbb{Q}\bigr ]} = d\). There is a constant C > 0, depending on α, such that for all rational numbers p∕q we have

$$\displaystyle{\left \vert \frac{p} {q}-\alpha \right \vert \geq \frac{C} {q^{d}}.}$$

Proof.

We may assume that \(\alpha \in \mathbb{R}\), since otherwise \(C ={\bigl |\mathop{ \mathrm{Im}}\nolimits (\alpha )\bigr |}\) works. Let

$$\displaystyle{f(T) = a_{0}T^{d} + a_{ 1}T^{d-1} + \cdots + a_{ d} \in \mathbb{Z}[T]}$$

be a minimal polynomial for α, and let

$$\displaystyle{C_{1} =\sup {\bigl \{ {\bigl |f'(t)\bigr |} :\alpha -1 \leq t \leq \alpha +1\bigr \}}.}$$

Then the mean value theorem tells us that

$$\displaystyle{\left \vert f\left (\frac{p} {q}\right )\right \vert = \left \vert f\left (\frac{p} {q}\right ) - f(\alpha )\right \vert \leq C_{1}\left \vert \frac{p} {q}-\alpha \right \vert .}$$

On the other hand, we know that \(q^{d}f(p/q) \in \mathbb{Z}\), and further that \(f(p/q)\neq 0\), since f has no rational roots. Hence

$$\displaystyle{\left \vert q^{d}f\left (\frac{p} {q}\right )\right \vert \geq 1.}$$

Setting \(C =\min \{ C_{1}^{-1},1\}\) and combining the last two inequalities yields

$$\displaystyle{\left \vert \frac{p} {q}-\alpha \right \vert \geq \frac{C} {q^{d}}\qquad \text{for all}\ p/q \in \mathbb{Q}.}$$

 □

Remark 1.3.1.

Liouville used his theorem to prove the existence of transcendental numbers; see Exercise 9.2. Note that in Liouville’s theorem it is quite easy to find a value for the constant C explicitly in terms of \(\alpha\). This is in marked contrast to the results that we consider in the rest of this section.
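To illustrate how explicit the constant is, take \(\alpha = \sqrt{2}\), so \(f(T) = T^{2} - 2\) and d = 2. The following Python sketch computes the constant C from the proof of (IX.1.3) and checks the inequality numerically for small denominators. It is a floating-point sanity check under these assumptions, not a proof, and the names are illustrative.

```python
import math

ALPHA = math.sqrt(2)     # a root of f(T) = T^2 - 2, of degree d = 2
C1 = 2 * (ALPHA + 1)     # sup of |f'(t)| = |2t| on [ALPHA - 1, ALPHA + 1]
C = min(1 / C1, 1.0)     # the constant produced by the proof

def liouville_holds(max_q):
    """Check |p/q - sqrt(2)| >= C / q^2 for 1 <= q <= max_q."""
    for q in range(1, max_q + 1):
        p = round(q * ALPHA)  # the nearest p gives the smallest distance for this q
        if abs(p / q - ALPHA) < C / q ** 2:
            return False
    return True
```

Here C is approximately 0.207, while the best rational approximations to \(\sqrt{2}\) satisfy \(\vert p/q -\sqrt{2}\vert \approx 0.354/q^{2}\), so the inequality holds with room to spare.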

Dirichlet’s theorem (IX.1.2) says that every real number can be approximated by rational numbers to within \(1/q^{2}\), while Liouville’s result (IX.1.3) says that algebraic numbers of degree d can be approximated no closer than \(C/q^{d}\). For quadratic irrationalities there is little more to say, but if d ≥ 3, then it is natural to ask for the best exponent on q. There is no particular reason to restrict the approximating values to \(\mathbb{Q}\), so we allow them to vary over any fixed number field K. Finally, in measuring the closeness of the approximation, we may use any absolute value on K.

Definition.

Let τ(d) be a positive real-valued function on the natural numbers. A number field K is said to have approximation exponent τ if it has the following property:

Let \(\alpha \in \bar{K}\), let \(d ={\bigl [ K(\alpha ) : K\bigr ]}\), and let \(v \in M_{K}\) be an absolute value on K that has been extended to \(K(\alpha )\) in some fashion. Then for any constant C there exist only finitely many x ∈ K satisfying the inequality

$$\displaystyle{\vert x -\alpha \vert _{v} < CH_{K}(x)^{-\tau (d)}.}$$

Liouville’s elementary estimate (IX.1.3) says that \(\mathbb{Q}\) has approximation exponent \(\tau (d) = d+\epsilon\) for any ε > 0. This result has been successively improved by a number of mathematicians:

Table 1

In view of (IX.1.2), Roth’s result is essentially best possible, although it has been conjectured that the \(\epsilon\) can be replaced by some function \(\epsilon (d)\) such that ε(d) → 0 as \(d \rightarrow \infty \). We should also mention that Mahler showed how to handle several absolute values at once, and W. Schmidt [221, Chapter VI] dealt with the more difficult problem of simultaneously approximating several irrationals.

The main ideas that go into the proof of Roth’s theorem are quite beautiful, and at least in theory, relatively elementary. Unfortunately, to develop these ideas fully would take us rather far afield. Hence rather than including a complete proof, we are content to state here the result that we will need. In (IX §8) we briefly sketch the proof of Roth’s theorem without giving any of the myriad details.

Theorem 1.4.

(Roth’s Theorem) For every ε > 0, every number field K of degree d has approximation exponent

$$\displaystyle{\tau (d) = 2 +\epsilon .}$$

Proof.

See (IX §8) for a brief sketch of the proof. A nice exposition for \(K = \mathbb{Q}\) and the usual archimedean absolute value is given in [221, Chapter V]. For the general case, see [114, Part D] or [139, Chapter 7]. □ 

Example 1.5.

How do theorems on Diophantine approximation lead to results about Diophantine equations? Consider the simple example of trying to solve the equation

$$\displaystyle{x^{3} - 2y^{3} = a}$$

in integers \(x,y \in \mathbb{Z}\), where \(a \in \mathbb{Z}\) is fixed. Suppose that (x, y) is a solution with y ≠ 0. Let \(\zeta\) be a primitive cube root of unity, and factor the equation as

$$\displaystyle{\left (\frac{x} {y} -\root{3}\of{2}\right )\left (\frac{x} {y} -\zeta \root{3}\of{2}\right )\left (\frac{x} {y} -\zeta ^{2}\root{3}\of{2}\right ) = \frac{a} {y^{3}}.}$$

The second and third factors in the product are bounded away from 0, so we obtain an estimate of the form

$$\displaystyle{\left \vert \frac{x} {y} -\root{3}\of{2}\right \vert \leq \frac{C} {y^{3}},}$$

where the constant C is independent of x and y. Now (IX.1.4), or even Thue’s original theorem with \(\tau (d) = \frac{1} {2}d + 1+\epsilon\), shows that there are only finitely many possibilities for x and y. Hence the equation

$$\displaystyle{x^{3} - 2y^{3} = a}$$

has only finitely many solutions in integers. This type of argument will reappear in the proof of (IX.4.1); see also Exercise 9.6.
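A naive search makes the example concrete. The Python sketch below lists the solutions of \(x^{3} - 2y^{3} = a\) inside a box; the function name and the box size are arbitrary choices for illustration. Such a search cannot certify that no solutions lie outside the box, which is exactly the kind of information that an ineffective finiteness theorem fails to provide.

```python
def thue_solutions(a, bound):
    """All integer pairs (x, y) with |x|, |y| <= bound and x^3 - 2*y^3 == a."""
    return [
        (x, y)
        for x in range(-bound, bound + 1)
        for y in range(-bound, bound + 1)
        if x ** 3 - 2 * y ** 3 == a
    ]
```

For a = 1 and a box of size 30, the search finds only (−1, −1) and (1, 0).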

Remark 1.6.

The statement of (IX.1.4) says that there exist only finitely many elements of K having a certain property. This phrasing is felicitous because the proof of (IX.1.4) is not effective. In other words, the proof does not give an effective procedure that is guaranteed to produce all of the elements in the finite set. (See (IX.8.1) for a discussion of why this is so.) We note that as a consequence, all of the finiteness results that we prove in (IX §§2, 3) are ineffective, since they rely on (IX.1.4). Similarly, the proof in (IX.1.5) yields no explicit bound for \(\vert x\vert \) and \(\vert y\vert \) in terms of a. However, there are other methods, based on estimates for linear forms in logarithms, that are effective. We discuss such methods, without proof, in (IX §5).

IX.2 Distance Functions

A Diophantine inequality such as

$$\displaystyle{\vert x -\alpha \vert _{v} < CH_{K}(x)^{-\tau (d)}}$$

consists of two pieces. First, there is the height function \(H_{K}(x)\), which measures the arithmetic size of x. We have already studied height functions and their transformation properties in some detail (VIII §§5, 6). Second, there is the quantity \(\vert x -\alpha \vert _{v}\), which is a topological or metric measure of the distance from x to α, i.e., it measures distance in the v-adic topology. In this section we define a notion of v-adic distance on curves, deduce some of its basic properties, and reinterpret the main Diophantine approximation result from (IX §1) in terms of this distance function.

Definition.

Let C∕K be a curve, let \(v \in M_{K}\), and fix a point \(Q \in C(K_{v})\). Choose a function \(t_{Q} \in K_{v}(C)\) that has a zero of order e ≥ 1 at Q and no other zeros (see Footnote 1). Then for \(P \in C(K_{v})\), we define the (v-adic) distance from P to Q by

$$\displaystyle{d_{v}(P,Q) =\min \left \{{\bigl |t_{Q}(P)\bigr |}_{v}^{1/e},1\right \}.}$$

(If \(t_{Q}\) has a pole at P, we formally set \({\bigl |t_{Q}(P)\bigr |}_{v} = \infty \), so \(d_{v}(P,Q) = 1\).)
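As a small illustration of the definition, consider a Weierstrass curve over \(\mathbb{Q}\) with Q = O and \(t_{O} = 1/x\), which has a zero of order e = 2 at O and no other zeros. The Python sketch below evaluates the resulting distance from the x-coordinate of a rational point. It assumes the usual normalization \(\vert p\vert _{p} = 1/p\), and the function names are illustrative.

```python
from fractions import Fraction

def abs_v(x, v):
    """|x|_v for x in Q: v is None for the usual absolute value, or a prime p."""
    x = Fraction(x)
    if v is None:
        return float(abs(x))
    if x == 0:
        return 0.0
    num, den, k = abs(x.numerator), x.denominator, 0
    while num % v == 0:      # count powers of p in the numerator
        num //= v
        k += 1
    while den % v == 0:      # and subtract powers of p in the denominator
        den //= v
        k -= 1
    return float(v) ** (-k)

def dist_to_O(x_of_P, v):
    """d_v(P, O) = min(|x(P)|_v^(-1/2), 1), using t_O = 1/x with e = 2."""
    a = abs_v(x_of_P, v)
    if a == 0:
        return 1.0           # t_O has a pole where x vanishes
    return min(a ** -0.5, 1.0)
```

For instance, a point with x-coordinate 129∕100 is 5-adically at distance 1∕5 from O, since \(\vert 129/100\vert _{5} = 25\).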

Remark 2.1.

In practice, we fix the point Q and use the distance function \(d_{v}(P,Q)\) to measure the distance from P to Q as P varies. It is clear that the distance function \(d_{v}\) has the right qualitative property, i.e., \(d_{v}(P,Q)\) is small if P is v-adically close to Q. On the other hand, the value of \(d_{v}(P,Q)\) certainly depends on the choice of the function \(t_{Q}\), so possibly a better notation would be \(d_{v}(P,t_{Q})\). However, since we will use \(d_{v}\) only to measure the rate at which a varying point approaches a fixed point, the next result shows that the choice of \(t_{Q}\) is irrelevant for the statements of our theorems.

Proposition 2.2.

Let  \(Q \in C(K_{v})\) and let  \(F \in K_{v}(C)\) be a function that vanishes at Q. Then the limit

$$\displaystyle{\lim _{ \begin{array}{c}P\in C(K_{v}) \\ P\mathop{\rightarrow }\limits_{v\; }Q \\ \end{array}} \frac{\log {\bigl |F(P)\bigr |}_{v}} {\log d_{v}(P,Q)} =\mathop{ \mathrm{ord}}\nolimits _{Q}(F)}$$

exists and is independent of the choice of the function \(t_{Q}\) used to define \(d_{v}(P,Q)\).

Here \(\mathop{\mathrm{ord}}\nolimits _{Q}(F)\) is the order of vanishing of F at Q as in (II §2), while the notation \(P\mathop{\rightarrow }\limits_{v\; }Q\) means that \(P \in C(K_{v})\) approaches Q in the v-adic topology, i.e., \(d_{v}(P,Q) \rightarrow 0\).

Proof.

Let \(t_{Q}\) be the function vanishing only at Q that we are using to define \(d_{v}(\,\cdot \,,Q)\). Let \(e =\mathop{ \mathrm{ord}}\nolimits _{Q}(t_{Q})\) and \(f =\mathop{ \mathrm{ord}}\nolimits _{Q}(F)\). Then the function \(\phi = F^{e}/t_{Q}^{f}\) has neither a zero nor a pole at Q, so \({\bigl |\phi (P)\bigr |}_{v}\) is bounded away from 0 and \(\infty \) as \(P\mathop{\rightarrow }\limits_{v\; }Q\). Hence

$$\displaystyle\begin{array}{rcl} \lim _{ \begin{array}{c}P\in C(K_{v}) \\ P\mathop{\rightarrow }\limits_{v\; }Q \\ \end{array}} \frac{\log {\bigl |F(P)\bigr |}_{v}} {\log d_{v}(P,Q)}& =& \lim _{ \begin{array}{c}P\in C(K_{v}) \\ P\mathop{\rightarrow }\limits_{v\; }Q \\ \end{array}} \frac{\log {\bigl |F(P)\bigr |}_{v}} {\log {\bigl |t_{Q}(P)\bigr |}_{v}^{1/e}} {}\\ & =& f +\lim _{ \begin{array}{c}P\in C(K_{v}) \\ P\mathop{\rightarrow }\limits_{v\; }Q \\ \end{array}} \frac{\log {\bigl |\phi (P)\bigr |}_{v}} {\log {\bigl |t_{Q}(P)\bigr |}_{v}} {}\\ & =& f. {}\\ \end{array}$$

 □

Remark 2.2.1.

The use of the function \(t_{Q}\) in the definition of distance is somewhat artificial and does not generalize well to higher-dimensional varieties. An alternative definition that does generalize uses a finite list of functions \(t_{1},\ldots ,t_{r} \in K(E)\) with the property that each \(t_{i}\) vanishes at Q and such that \(t_{1},\ldots ,t_{r}\) have no other common zeros. Then, if we let \(e_{i}\) denote the order of vanishing of \(t_{i}\) at Q, a distance function \(d_{v}\) may be defined by

$$\displaystyle{d_{v}(P,Q) =\min {\Bigl \{\max {\bigl \{ \vert t_{1}(P)\vert _{v}^{1/e_{1} },\ldots ,\vert t_{r}(P)\vert _{v}^{1/e_{r} }\bigr \}},1\Bigr \}}.}$$

This function is an example of a local height function; see [139, Chapter 10], [114, §B.8], or [261] for further details.

Next we examine the effect of finite maps on the distance between points. The crucial observation is that this effect depends on the ramification of the map, not on its degree. To see the difference, compare (IX.2.3) with (VIII.5.6).

Proposition 2.3.

Let \(C_{1}/K\) and \(C_{2}/K\) be curves, and let \(\phi : C_{1} \rightarrow C_{2}\) be a finite map defined over K. Let \(Q \in C_{1}(K_{v})\), and let \(e_{\phi }(Q)\) be the ramification index of ϕ at Q (II §2). Then

$$\displaystyle{\lim _{ \begin{array}{c}P\in C_{1}(K_{v}) \\ P\mathop{\rightarrow }\limits_{v\; }Q \\ \end{array}}\frac{\log d_{v}{\bigl (\phi (P),\phi (Q)\bigr )}} {\log d_{v}(P,Q)} = e_{\phi }(Q).}$$

Proof.

Let \(t_{Q} \in K_{v}(C_{1})\) be a function that vanishes to order \(e_{1} \geq 1\) at Q and has no other zeros, and similarly let \(t_{\phi (Q)} \in K_{v}(C_{2})\) be a function that vanishes to order \(e_{2} \geq 1\) at \(\phi (Q)\) and has no other zeros. It follows from the definition of ramification index that

$$\displaystyle{\mathop{\mathrm{ord}}\nolimits _{Q}t_{\phi (Q)}\circ \phi = e_{\phi }(Q)\mathop{\mathrm{ord}}\nolimits _{\phi (Q)}t_{\phi (Q)} = e_{\phi }(Q)e_{2},}$$

so the functions \((t_{\phi (Q)}\circ \phi )^{e_{1}}\) and \(t_{Q}^{e_{\phi }(Q)e_{2}}\) vanish to the same order at Q. Hence the function

$$\displaystyle{f = \frac{(t_{\phi (Q)}\circ \phi )^{e_{1}}} {t_{Q}^{e_{\phi }(Q)e_{2}}} \in K_{v}(C_{1})}$$

has neither a zero nor a pole at Q. It follows that \({\bigl |f(P)\bigr |}_{v}\) is bounded away from 0 and \(\infty \) as \(P\mathop{\rightarrow }\limits_{v\; }Q\). Therefore

$$\displaystyle\begin{array}{rcl} \frac{\log d_{v}{\bigl (\phi (P),\phi (Q)\bigr )}} {\log d_{v}(P,Q)} & =& \frac{\log {\bigl |t_{\phi (Q)}{\bigl (\phi (P)\bigr )}\bigr |}_{v}^{1/e_{2}}} {\log {\bigl |t_{Q}(P)\bigr |}_{v}^{1/e_{1}}} {}\\ & =& \frac{e_{\phi }(Q)\log {\bigl |t_{Q}(P)\bigr |}_{v}^{1/e_{1}} +\log {\bigl | f(P)\bigr |}_{v}^{1/(e_{1}e_{2})}} {\log {\bigl |t_{Q}(P)\bigr |}_{v}^{1/e_{1}}} {}\\ & \longrightarrow & e_{\phi }(Q)\qquad \text{as}\ P\mathop{\rightarrow }\limits_{v\; }Q. {}\\ \end{array}$$

 □

Finally, we reinterpret Roth’s theorem (IX.1.4) in terms of distance functions.

Corollary 2.4.

(of (IX.1.4)) Fix an absolute value  \(v \in M_{K}\) . Let C∕K be a curve, let f ∈ K(C) be a nonconstant function, and let  \(Q \in C(\bar{K})\) . Then

$$\displaystyle{\liminf _{ \begin{array}{c}P\in C(K) \\ P\mathop{\rightarrow }\limits_{v\; }Q \\ \end{array}} \frac{\log d_{v}(P,Q)} {\log H_{K}{\bigl (f(P)\bigr )}} \geq -2.}$$

( If Q is not a v-adic accumulation point of C(K), then we define the \(\liminf\) to be 0. )

Proof.

Replacing f by 1∕f if necessary, we may assume that \(f(Q)\neq \infty \). (Note that \(H_{K}{\bigl ((1/f)(P)\bigr )} = H_{K}{\bigl (f(P)\bigr )}\).) The function \(f - f(Q)\) vanishes at Q, say to order e, so (IX.2.2) tells us that

$$\displaystyle{\liminf _{ \begin{array}{c}P\in C(K) \\ P\mathop{\rightarrow }\limits_{v\; }Q \\ \end{array}}\frac{\log {\bigl |f(P) - f(Q)\bigr |}_{v}} {\log d_{v}(P,Q)} = e.}$$

Hence

$$\displaystyle\begin{array}{rcl} \liminf _{ \begin{array}{c}P\in C(K) \\ P\mathop{\rightarrow }\limits_{v\; }Q \\ \end{array}} \frac{\log d_{v}(P,Q)} {\log H_{K}{\bigl (f(P)\bigr )}}& =& \liminf _{ \begin{array}{c}P\in C(K) \\ P\mathop{\rightarrow }\limits_{v\; }Q \\ \end{array}}\frac{\log {\bigl |f(P) - f(Q)\bigr |}_{v}} {e\log H_{K}{\bigl (f(P)\bigr )}} {}\\ & =& \frac{1} {e}\liminf _{ \begin{array}{c}P\in C(K) \\ P\mathop{\rightarrow }\limits_{v\; }Q \\ \end{array}}\left (\frac{\log {\bigl (H_{K}{\bigl (f(P)\bigr )}^{\tau }{\bigl |f(P) - f(Q)\bigr |}_{v}\bigr )}} {\log H_{K}{\bigl (f(P)\bigr )}} -\tau \right ). {}\\ \end{array}$$

We now set \(\tau = 2+\epsilon\). Then (IX.1.4) implies that

$$\displaystyle{H_{K}{\bigl (f(P)\bigr )}^{\tau }{\bigl |f(P) - f(Q)\bigr |}_{v} \geq 1}$$

for all but finitely many \(P \in C(K)\). Therefore

$$\displaystyle{\liminf _{ \begin{array}{c}P\in C(K) \\ P\mathop{\rightarrow }\limits_{v\; }Q \\ \end{array}} \frac{\log d_{v}(P,Q)} {\log H_{K}{\bigl (f(P)\bigr )}} \geq -\frac{\tau } {e} \geq -\frac{2+\epsilon } {e} .}$$

Since ε > 0 is arbitrary and e ≥ 1, this is the desired result. □ 

IX.3 Siegel’s Theorem

In this section we prove a result of Siegel that represents a significant improvement on the Diophantine approximation result (IX.2.4).

Theorem 3.1.

(Siegel) Let E∕K be an elliptic curve with  \(\#E(K) = \infty \) . Fix a point  \(Q \in E(\bar{K})\) , a nonconstant even function  \(f \in K(E)\) , and an absolute value \(v \in M_{K(Q)}\) . Then

$$\displaystyle{\lim _{ \begin{array}{c}P\in E(K) \\ h_{f}(P)\rightarrow \infty \\ \end{array}}\frac{\log d_{v}(P,Q)} {h_{f}(P)} = 0.}$$

Remark 3.1.1.

Although we prove (IX.3.1) only for even functions, it is in fact true in general; see Exercise 9.14d.

Before proving (IX.3.1), we give some indication of its power.

Corollary 3.2.1.

Let E∕K be an elliptic curve with Weierstrass coordinate functions x and y, let \(S \subset M_{K}\) be a finite set of places containing \(M_{K}^{\infty }\), and let \(R_{S}\) be the ring of S-integers of K. Then

$$\displaystyle{{\bigl \{P \in E(K) : x(P) \in R_{S}\bigr \}}}$$

is a finite set.

Proof.

We apply (IX.3.1) with the function f = x. Suppose that there is a sequence of distinct points \(P_{1},P_{2},\ldots \in E(K)\) with every \(x(P_{i}) \in R_{S}\). The definition of height then tells us that

$$\displaystyle{h_{x}(P_{i}) = \frac{1} {[K : \mathbb{Q}]}\sum _{v\in S}\log \max {\bigl \{1,{\bigl |x(P_{i})\bigr |}_{v}^{n_{v} }\bigr \}},}$$

since the terms with \(v\notin S\) have \({\bigl |x(P_{i})\bigr |}_{v} \leq 1\). Hence we can find a particular v ∈ S and a subsequence of the \(P_{i}\) (which we relabel as \(P_{1},P_{2},\ldots\)) such that

$$\displaystyle{h_{x}(P_{i}) \leq \#S \cdot \log {\bigl | x(P_{i})\bigr |}_{v}\qquad \mbox{ for all $i = 1,2,\ldots \,$.}}$$

(Note that \(n_{v} \leq [K : \mathbb{Q}]\).) In particular, we see that \({\bigl |x(P_{i})\bigr |}_{v} \rightarrow \infty \), and since O is the only pole of x, it follows that \(d_{v}(P_{i},O) \rightarrow 0\).

The function x has a pole of order 2 at O and no other poles, so we may take as our distance function

$$\displaystyle{d_{v}(P_{i},O) =\min {\bigl \{{\bigl | x(P_{i})\bigr |}_{v}^{-1/2},1\bigr \}}.}$$

Then, for all sufficiently large i, we have

$$\displaystyle{\frac{-\log d_{v}(P_{i},O)} {h_{x}(P_{i})} \geq \frac{1} {2\#S}.}$$

This contradicts (IX.3.1), which says that the left-hand side approaches 0 as \(i \rightarrow \infty \). □ 
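For \(K = \mathbb{Q}\) and \(S = \{\infty \}\), the corollary says that a Weierstrass equation has only finitely many solutions in integers. The Python sketch below searches a box for such solutions; it is only an illustration with hypothetical names, and, since the proof is ineffective, a finite search of this kind cannot by itself show that the list is complete.

```python
import math

def integral_points(A, B, x_bound):
    """Integer points (x, y) on y^2 = x^3 + A*x + B with |x| <= x_bound."""
    points = []
    for x in range(-x_bound, x_bound + 1):
        rhs = x ** 3 + A * x + B
        if rhs < 0:
            continue                    # no real y for this x
        y = math.isqrt(rhs)
        if y * y == rhs:                # rhs is a perfect square
            points.append((x, y))
            if y != 0:
                points.append((x, -y))
    return sorted(points)
```

On the curve \(y^{2} = x^{3} - 2\), a search over \(\vert x\vert \leq 1000\) finds only the points (3, ±5).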

It is clear that the proof of (IX.3.2.1) works for any even function, not just x, since (IX.3.1) is given for all even functions. However, it is possible to reduce the case of arbitrary (not necessarily even) functions to the special case given in (IX.3.2.1). This reduction step, which we now give, is important in its own right, since it is used both in Siegel’s second proof of finiteness (IX.4.3.1) and with the effective methods provided by linear forms in logarithms (IX.5.7).

Corollary 3.2.2.

Let C∕K be a curve of genus one, let f ∈ K(C) be a nonconstant function, and let S and \(R_{S}\) be as in (IX.3.2.1). Then

$$\displaystyle{{\bigl \{P \in C(K) : f(P) \in R_{S}\bigr \}}}$$

is a finite set. Further, (IX.3.2.2) follows formally from (IX.3.2.1).

Proof.

We are clearly proving something stronger if we extend the field K and enlarge the set S. We may thus assume that C(K) contains a pole Q of f, and taking Q to be the identity element, we view (C, Q) as an elliptic curve defined over K. Let x and y be coordinates on a Weierstrass equation for (C, Q), which we may take in the form

$$\displaystyle{y^{2} = x^{3} + Ax + B.}$$

We have \(f \in K(C) = K(x,y)\) and \({\bigl [K(x,y) : K(x)\bigr ]} = 2\), so we can write

$$\displaystyle{f(x,y) = \frac{\phi (x) +\psi (x)y} {\eta (x)} }$$

with polynomials \(\phi (x),\psi (x),\eta (x) \in K[x]\). Further, since

$$\displaystyle{\mathop{\mathrm{ord}}\nolimits _{Q}(x) = -2,\qquad \mathop{\mathrm{ord}}\nolimits _{Q}(y) = -3,\qquad \text{and}\qquad \mathop{\mathrm{ord}}\nolimits _{Q}(f) < 0,}$$

it follows that

$$\displaystyle{2\deg \eta <\max \{ 2\deg \phi ,2\deg \psi + 3\}.}$$

(This is the condition for f to have a pole at Q.) Next we compute

$$\displaystyle{{\bigl (f\eta (x) -\phi (x)\bigr )}^{2} ={\bigl (\psi (x)y\bigr )}^{2} =\psi (x)^{2}(x^{3} + Ax + B).}$$

Writing this out as a polynomial in x with coefficients in K[f], we see that the highest power of x comes from one of the three terms \(f^{2}\eta (x)^{2}\), \(\phi (x)^{2}\), and \(\psi (x)^{2}x^{3}\). From above, the first of these has lower degree in x than the latter two, while the leading terms of \(\phi (x)^{2}\) and \(\psi (x)^{2}x^{3}\) cannot cancel, since they have different degrees. (One has even degree, the other odd degree.) It follows that x satisfies a monic polynomial with coefficients in K[f], i.e., x is integral over K[f]. Multiplying this monic polynomial by an appropriate element of K to “clear denominators,” we have shown that x satisfies a relation

$$\displaystyle{a_{0}x^{N} + a_{ 1}(f)x^{N-1} + \cdots + a_{ N-1}(f)x + a_{N}(f) = 0,}$$

where \(a_{0} \in R_{S}\) is nonzero and \(a_{i}(f) \in R_{S}[f]\) for \(1 \leq i \leq N\). Enlarging the set S, we may assume that \(a_{0} \in R_{S}^{{\ast}}\), and then dividing the polynomial by \(a_{0}\), we may assume that \(a_{0} = 1\).

Now suppose that P ∈ C(K) satisfies \(f(P) \in R_{S}\). Then P is not a pole of x, and the relation

$$\displaystyle{x(P)^{N} + a_{ 1}{\bigl (f(P)\bigr )}x(P)^{N-1} + \cdots + a_{ N-1}{\bigl (f(P)\bigr )}x(P) + a_{N}{\bigl (f(P)\bigr )} = 0}$$

shows that x(P) is integral over \(R_{S}\). Since also \(x(P) \in K\) and \(R_{S}\) is integrally closed, it follows that \(x(P) \in R_{S}\). This proves that

$$\displaystyle{{\bigl \{P \in C(K) : f(P) \in R_{S}\bigr \}} \subset {\bigl \{ P \in C(K) : x(P) \in R_{S}\bigr \}},}$$

and thus the finiteness assertion in (IX.3.2.1) implies the desired finiteness result described in (IX.3.2.2). □ 

Example 3.3.

Consider the Diophantine equation

$$\displaystyle{y^{2} = x^{3} + Ax + B,}$$

where \(A,B \in \mathbb{Z}\) and \(4A^{3} + 27B^{2}\neq 0\). The corollary (IX.3.2.1) says that this equation has only finitely many solutions \(x,y \in \mathbb{Z}\). What does (IX.3.1) say in this situation, say if we take Q = O, f = x, and v the archimedean absolute value on \(\mathbb{Q}\)?

Label the nonzero rational points \(P_{1},P_{2},\ldots \in E(\mathbb{Q})\) in order of nondecreasing height, and write

$$\displaystyle{x(P_{i}) = \frac{a_{i}} {b_{i}} \in \mathbb{Q}}$$

as a fraction in lowest terms. Then

$$\displaystyle\begin{array}{rcl} \log d_{v}(P_{i},O)& =& \frac{1} {2}\log \min \left \{\left \vert \frac{b_{i}} {a_{i}}\right \vert ,1\right \}, {}\\ h_{x}(P_{i})& =& \log \max {\bigl \{\vert a_{i}\vert ,\vert b_{i}\vert \bigr \}}. {}\\ \end{array}$$

(Note that the \(\frac{1} {2}\) appears because \(x^{-1}\) has a zero of order 2 at O.) We see from (IX.3.1) that

$$\displaystyle{\lim _{i\rightarrow \infty }\frac{\min {\bigl \{\log \vert b_{i}/a_{i}\vert ,0\bigr \}}} {\max {\bigl \{\log \vert a_{i}\vert ,\log \vert b_{i}\vert \bigr \}}} = 0.}$$

Next let \(Q_{1}\) and \(Q_{2}\) be the zeros of the function x, where we allow \(Q_{1} = Q_{2}\). Then it is not hard to check that

$$\displaystyle{\log \min {\bigl \{{\bigl |x(P)\bigr |}_{v},1\bigr \}} =\log d_{v}(P,Q_{1}) +\log d_{v}(P,Q_{2}) + O(1)\quad \mbox{ for all $P \in E(K_{v})$},}$$

where the O(1) depends on the choice of the distance functions \(d_{v}(\,\cdot \,,Q_{i})\), but is independent of P; see Exercise 9.16. Writing \(v \in M_{\mathbb{Q}}^{\infty }\) for the usual archimedean absolute value on \(\mathbb{Q}\), we use (IX.3.1) twice to obtain

$$\displaystyle\begin{array}{rcl} \lim _{i\rightarrow \infty }\frac{\min {\bigl \{\log \vert a_{i}/b_{i}\vert ,0\bigr \}}} {\max {\bigl \{\log \vert a_{i}\vert ,\log \vert b_{i}\vert \bigr \}}} & =& \lim _{i\rightarrow \infty }\frac{\log \min {\bigl \{{\bigl |x(P_{i})\bigr |},1\bigr \}}} {h_{x}(P_{i})} {}\\ & =& \lim _{i\rightarrow \infty }\frac{\log d_{v}(P_{i},Q_{1}) +\log d_{v}(P_{i},Q_{2}) + O(1)} {h_{x}(P_{i})} {}\\ & =& 0. {}\\ \end{array}$$

Finally, combining the limit involving \(b_{i}/a_{i}\) with the limit involving \(a_{i}/b_{i}\), it is easy to deduce that

$$\displaystyle{\lim _{i\rightarrow \infty }\frac{\log \vert a_{i}\vert } {\log \vert b_{i}\vert } = 1.}$$

In other words, when looking at the x-coordinates of the rational points on an elliptic curve, we will see that the numerators and the denominators tend to have about the same number of digits. This is a much stronger assertion than (IX.3.2.1), which merely says that there are only finitely many points whose denominator is 1.
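This behavior is easy to observe experimentally. The Python sketch below uses exact rational arithmetic to compute the multiples of the point P = (3, 5) on the curve \(y^{2} = x^{3} - 2\) and the ratio \(\log \vert a\vert /\log \vert b\vert \) for each x-coordinate a∕b. The curve, the point, and the function names are choices made for illustration, and the multiples of a single point form only a subsequence of the rational points.

```python
from fractions import Fraction
import math

A, B = 0, -2                      # the curve y^2 = x^3 - 2 (illustrative choice)
P = (Fraction(3), Fraction(5))    # 5^2 = 3^3 - 2

def add(P1, P2):
    """Chord-and-tangent addition; None stands for the point O."""
    if P1 is None:
        return P2
    if P2 is None:
        return P1
    (x1, y1), (x2, y2) = P1, P2
    if x1 == x2 and y1 == -y2:
        return None               # P2 = -P1, so the sum is O
    if P1 == P2:
        lam = (3 * x1 * x1 + A) / (2 * y1)    # slope of the tangent line
    else:
        lam = (y2 - y1) / (x2 - x1)           # slope of the chord
    x3 = lam * lam - x1 - x2
    return (x3, lam * (x1 - x3) - y1)

def multiples(P, n_max):
    """Return the list [P, 2P, ..., n_max * P]."""
    out, R = [], None
    for _ in range(n_max):
        R = add(R, P)
        out.append(R)
    return out

def digit_ratio(x):
    """log|a| / log|b| for x = a/b in lowest terms, or None if undefined."""
    a, b = abs(x.numerator), x.denominator
    if a == 0 or b == 1:
        return None
    return math.log(a) / math.log(b)
```

For example, [2]P has x-coordinate 129∕100, for which the ratio is about 1.06.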

Remark 3.4.

Siegel’s theorem (IX.3.2.1) is not effective, which means that the proof does not give an explicitly computable upper bound for the height of all integral points. However, Siegel’s proof can be made quantitative in the following sense; see for example [81]:

Given a nonsingular Weierstrass equation with coefficients in a number field K and given a finite set of absolute values S, there is a constant N, which can be explicitly calculated in terms of the field K, the set S, and the coefficients of the equation, such that the equation has no more than N integral solutions.

A subtler Diophantine problem, motivated by work of Dem’janenko and posed as a general conjecture by Serge Lang, is to give an intrinsic relationship between the number of integral points and the rank of the Mordell–Weil group.

Conjecture 3.5.

(Lang [135, page 140]) Let E∕K be an elliptic curve, and choose a quasiminimal Weierstrass equation for E∕K,

$$\displaystyle{E : y^{2} = x^{3} + Ax + B.}$$

(See Exercise 8.14c.) Let \(S \subset M_{K}\) be a finite set of places containing \(M_{K}^{\infty }\), and let \(R_{S}\) be the ring of S-integers of K. There exists a constant C, depending only on K, such that

$$\displaystyle{\#{\bigl \{P \in E(K) : x(P) \in R_{S}\bigr \}} \leq C^{\#S+\mathop{\mathrm{rank}}\nolimits E(K)}.}$$

This conjecture is known to be true if one restricts attention to elliptic curves having integral j-invariant. More generally, the following is known.

Theorem 3.6.

Let E∕K, S, and \(R_{S}\) be as in (IX.3.5).

(a) (Silverman [104, 262]) There is a constant C, depending only on \([K : \mathbb{Q}]\) and on the number of places \(v \in M_{K}^{0}\) with \(\mathop{\mathrm{ord}}\nolimits _{v}(j_{E}) < 0\), such that

$$\displaystyle{\#{\bigl \{P \in E(K) : x(P) \in R_{S}\bigr \}} \leq C^{\#S+\mathop{\mathrm{rank}}\nolimits E(K)}.}$$

(b) (Hindry–Silverman [113]) Assume that the ABC conjecture (with any exponent) (VIII.11.4), (VIII.11.6) is true for the field K. Then there is a constant C, depending only on \([K : \mathbb{Q}]\) and on the constants appearing in the ABC conjecture, such that

$$\displaystyle{\#{\bigl \{P \in E(K) : x(P) \in R_{S}\bigr \}} \leq C^{\#S+\mathop{\mathrm{rank}}\nolimits E(K)}.}$$

We turn now to the proof of (IX.3.1). In broad outline, the argument goes as follows. Our theorem on Diophantine approximation (IX.2.4) gives us a bound, in terms of the height of P, on how fast P can approach Q. Suppose now that we write \(P = [m]P' + R\) and \(Q = [m]Q' + R\). Then (IX.2.3) tells us that the distance from P′ to Q′ is about the same as the distance from P to Q, since the map P ↦ [m]P + R is unramified. On the other hand, the height of P′ is much smaller than the height of P. Now applying (IX.2.4) to P′ and Q′ gives an improved estimate, and taking m sufficiently large gives the desired result.

Proof of (IX.3.1).

Choose a sequence of distinct points \(P_{i} \in E(K)\) satisfying

$$\displaystyle{\lim _{i\rightarrow \infty }\frac{\log d_{v}(P_{i},Q)} {h_{f}(P_{i})} = L =\liminf _{ \begin{array}{c}P\in E(K) \\ h_{f}(P)\rightarrow \infty \\ \end{array}}\frac{\log d_{v}(P,Q)} {h_{f}(P)} .}$$

Since \(d_{v}(P,Q) \leq 1\) and \(h_{f}(P) \geq 0\) for all points \(P \in E(K)\), we have L ≤ 0. It thus suffices to prove that L ≥ 0.

Let m be a large integer. From the weak Mordell–Weil theorem (VIII.1.1), the quotient group E(K)∕mE(K) is finite. Hence some coset contains infinitely many of the \(P_{i}\). Replacing \(\{P_{i}\}\) by a subsequence, we may assume that

$$\displaystyle{P_{i} = [m]P_{i}' + R,}$$

where \(P_{i}',R \in E(K)\) and where R does not depend on i. We use standard properties of height functions to compute

$$\displaystyle\begin{array}{rcl} m^{2}h_{ f}(P_{i}')& =& h_{f}{\bigl ([m]P_{i}'\bigr )} + O(1)\qquad \qquad \text{using (VIII.6.4b),} {}\\ & =& h_{f}(P_{i} - R) + O(1) {}\\ & \leq & 2h_{f}(P_{i}) + O(1)\qquad \qquad \text{using (VIII.6.4a).} {}\\ \end{array}$$

Note that the O(1) is independent of i.

We next do an analogous computation with distance functions. If \(P_{i}\) is bounded away from Q in the v-adic topology, then \(\log d_{v}(P_{i},Q)\) is bounded, so clearly L = 0. Otherwise we can replace \(P_{i}\) with a subsequence such that \(P_{i}\mathop{\rightarrow }\limits_{v\; }Q\). It follows that \([m]P_{i}'\mathop{\rightarrow }\limits_{v\; }Q - R\), so the sequence \(P_{i}'\) accumulates to at least one of the \(m^{2}\) possible mth roots of \(Q - R\). Again taking a subsequence, we can find a point \(Q' \in E(\bar{K})\) satisfying

$$\displaystyle{P_{i}'\mathop{\longrightarrow }\limits_{v }Q'\qquad \text{and}\qquad Q = [m]Q' + R.}$$

We next observe that the map E → E defined by \(P\mapsto [m]P + R\) is everywhere unramified (III.4.10c), so (IX.2.3) tells us that

$$\displaystyle{\lim _{i\rightarrow \infty }\frac{\log d_{v}(P_{i},Q)} {\log d_{v}(P_{i}',Q')} = 1.}$$

Combining this with the height inequality yields

$$\displaystyle{L =\lim _{i\rightarrow \infty }\frac{\log d_{v}(P_{i},Q)} {h_{f}(P_{i})} \geq \lim _{i\rightarrow \infty } \frac{\log d_{v}(P_{i}',Q')} {\frac{1} {2}m^{2}h_{f}(P_{i}') + O(1)}.}$$

(Note that the \(\log d_{v}\) expressions are negative, which reverses the inequality.)

We now apply the theorem on Diophantine approximation (IX.2.4) to the sequence \(\{P_{i}'\} \subset E(K)\) as it converges v-adically to \(Q' \in E(\bar{K})\). This yields

$$\displaystyle{\liminf _{i\rightarrow \infty } \frac{\log d_{v}(P_{i}',Q')} {[K : \mathbb{Q}]h_{f}(P_{i}')} \geq -2.}$$

(The factor of \([K : \mathbb{Q}]\), which in any case is not important, arises because \(h_{f}\) is the absolute height, while (IX.2.4) is stated using the relative height \(H_{K}\).) Combining the last two inequalities yields

$$\displaystyle{L \geq -\frac{4[K : \mathbb{Q}]} {m^{2}} .}$$

The field K is fixed, while the value of m is arbitrary, which completes the proof that L ≥ 0. □ 
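For the reader who wants the final combination spelled out, here is a short sketch using only the quantities introduced in the proof. We assume, as is implicit above, that \(h_{f}(P_{i}') \rightarrow \infty \), so that the O(1) term in the denominator does not affect the limit.

$$\displaystyle{L \geq \liminf _{i\rightarrow \infty } \frac{\log d_{v}(P_{i}',Q')} {\frac{1} {2}m^{2}h_{f}(P_{i}')} = \frac{2[K : \mathbb{Q}]} {m^{2}} \liminf _{i\rightarrow \infty } \frac{\log d_{v}(P_{i}',Q')} {[K : \mathbb{Q}]h_{f}(P_{i}')} \geq \frac{2[K : \mathbb{Q}]} {m^{2}} \cdot (-2) = -\frac{4[K : \mathbb{Q}]} {m^{2}}.}$$

Since the left-hand side does not depend on m, letting \(m \rightarrow \infty \) gives \(L \geq 0\).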

IX.4 The \(\boldsymbol{S}\)-Unit Equation

The finiteness of S-integral points on elliptic curves (IX.3.2.1) is a special case of Siegel’s general result that an (affine) curve \(C/K\) of genus at least one has only finitely many S-integral points; see [114, Theorem D.9.1] or [139, Chapter 8, Theorem 2.4]. Of course, for curves C of genus two or greater, Siegel’s result is superseded by Faltings’ theorem [82, 84], which asserts that the full set of rational points C(K) is finite.

Siegel gave a second proof of his theorem that applies to a restricted set of curves, but that does include all elliptic curves. This second method is important because, when combined with results from linear forms in logarithms (IX §5), it leads to an effective procedure for finding all S-integral points. In this section we describe Siegel’s alternative proof.

The idea is to reduce the problem of solving for S-integral points on a curve to the problem of solving several equations of the form

$$\displaystyle{ax + by = 1}$$

in S-units. We start with a quick sketch of how solving this S-unit equation can be reduced to a Diophantine approximation theorem such as (IX.1.4). This ineffective theorem can then be replaced by an effective estimate as described in (IX §5).

Theorem 4.1.

Let  \(S \subset M_{K}\) be a finite set of places, and let  \(a,b \in K^{{\ast}}\) . Then the equation

$$\displaystyle{ax + by = 1}$$

has only finitely many solutions in S-units  \(x,y \in R_{S}^{{\ast}}\) .

Ineffective Proof (Sketch).

Let m be a large integer. Dirichlet’s S-unit theorem [142, V §1] implies that the quotient group \(R_{S}^{{\ast}}/(R_{S}^{{\ast}})^{m}\) is finite, so we can choose a finite set of coset representatives \(c_{1},\ldots ,c_{r} \in R_{S}^{{\ast}}\). Then any solution (x, y) to the original equation can be written as

$$\displaystyle{x = c_{i}X^{m},\qquad y = c_{ j}Y ^{m},}$$

for some \(X,Y \in R_{S}^{{\ast}}\) and some choice of \(c_{i}\) and \(c_{j}\), and thus (X, Y) is a solution to the equation

$$\displaystyle{ac_{i}X^{m} + bc_{ j}Y ^{m} = 1.}$$

Since there are only finitely many choices for \(c_{i}\) and \(c_{j}\), it suffices to prove that for any \(\alpha ,\beta \in K^{{\ast}}\), the equation

$$\displaystyle{\alpha X^{m} +\beta Y ^{m} = 1}$$

has only finitely many solutions \(X,Y \in R_{S}\).

Suppose that there are infinitely many such solutions. Then, since

$$\displaystyle{H_{K}(Y ) =\prod _{v\in S}\max {\bigl \{1,\vert Y \vert _{v}^{n_{v} }\bigr \}},}$$

we can choose some \(v \in S\) so that there are infinitely many solutions satisfying

$$\displaystyle{\vert Y \vert _{v} \geq H_{K}(Y )^{1/([K:\mathbb{Q}]\#S)}.}$$

(Note that \(n_{v} \leq [K : \mathbb{Q}]\).) Let \(\gamma \in \bar{K}\) be a solution to

$$\displaystyle{\gamma ^{m} = -\beta /\alpha .}$$

We will specify later which \(m^{\text{th}}\) root to take. The idea is that if m is large enough, then \(X/Y\) provides too close an approximation to \(\gamma\).

We factor the left-hand side of the equation \(\alpha X^{m} +\beta Y ^{m} = 1\) to obtain

$$\displaystyle{\prod _{\zeta \in \boldsymbol{\mu }_{m}}\left (\frac{X} {Y }-\zeta \gamma \right ) = \frac{1} {\alpha Y ^{m}}.}$$

Since there are supposed to be infinitely many solutions, we may assume that \(H_{K}(Y)\) is large, so also \(\vert Y \vert _{v}\) is large. Then from the equality

$$\displaystyle{\prod _{\zeta \in \boldsymbol{\mu }_{m}}\left \vert \frac{X} {Y }-\zeta \gamma \right \vert _{v} = \frac{1} {\vert \alpha Y ^{m}\vert _{v}},}$$

we see that \(X/Y\) must be close to one of the \(\zeta \gamma\) values. Replacing \(\gamma\) by the appropriate \(\zeta \gamma\), we may assume that \(\vert X/Y -\gamma \vert _{v}\) is quite small. But then \(\vert X/Y -\zeta \gamma \vert _{v}\) cannot be too small for \(\zeta \neq 1\), since

$$\displaystyle{\left \vert \frac{X} {Y }-\zeta \gamma \right \vert _{v} \geq {\bigl |\gamma (1-\zeta )\bigr |}_{v} -\left \vert \frac{X} {Y }-\gamma \right \vert _{v}.}$$

Hence we can find a constant \(C_{1}\), independent of \(X/Y\), such that

$$\displaystyle{\left \vert \frac{X} {Y }-\gamma \right \vert _{v} \leq \frac{C_{1}} {\vert Y \vert _{v}^{m}}.}$$

(See Exercise 9.5.) Finally, from the expression

$$\displaystyle{\alpha \left (\frac{X} {Y }\right )^{m} = \left ( \frac{1} {Y }\right )^{m}-\beta ,}$$

one easily deduces that

$$\displaystyle{H_{K}\!\left (\frac{X} {Y }\right ) \leq C_{2}H_{K}(Y ),}$$

where \(C_{2}\) depends only on \(\alpha\), \(\beta\), and m. Combining all of the above estimates yields

$$\displaystyle{\left \vert \frac{X} {Y }-\gamma \right \vert _{v} \leq CH_{K}\!\left (\frac{X} {Y }\right )^{-m/([K:\mathbb{Q}]\#S)}.}$$

But if we take \(m > 2[K : \mathbb{Q}]\#S\), then Roth’s theorem (IX.1.4) says that there are only finitely many possibilities for \(X/Y\). Further, since

$$\displaystyle{Y ^{m} = \left (\alpha \left (\frac{X} {Y }\right )^{m}+\beta \right )^{-1}\qquad \text{and}\qquad X = \left (\frac{X} {Y }\right )Y,}$$

each ratio \(X/Y\) corresponds to at most m possible pairs (X, Y). This contradicts our original assumption that there are infinitely many solutions, which completes the proof of (IX.4.1). □ 

Remark 4.2.1.

There is a great similarity in the methods of proof for Siegel’s theorem (IX.3.1) and the S-unit equation (IX.4.1). In both cases, we start with a point in a finitely generated group, namely P ∈ E(K) for the former and \((x,y) \in R_{S}^{{\ast}}\times R_{S}^{{\ast}}\) for the latter. Next we pull back using the multiplication-by-m map in the group to produce a new solution whose height is much smaller than the original solution but that closely approximates another point defined over a finite extension of K. Finally, we invoke a theorem on Diophantine approximation, such as (IX.1.4), to complete the proof.

Remark 4.2.2.

The proof that we have given for (IX.4.1) is ineffective because it makes use of Roth’s theorem (IX.1.4). However, just as for Siegel’s theorem, it is possible to make (IX.4.1) quantitative, i.e., to give an upper bound on the number of solutions. One might expect, a priori, that such a bound would depend on the field K and on the set of primes S, but Evertse proved the following uniform result for the S-unit equation that is an analogue of Lang’s conjecture (IX.3.5) for elliptic curves. The proof, which we omit, is quite intricate.

Theorem 4.2.3.

(Evertse [80]) Let  \(S \subset M_{K}\) be a finite set of places containing  \(M_{K}^{\infty }\) , and let  \(a,b \in K^{{\ast}}\) . Then the equation

$$\displaystyle{ax + by = 1}$$

has at most  \(3 \times 7^{[K:\mathbb{Q}]+2\#S}\) solutions in S-units  \(x,y \in R_{S}^{{\ast}}\) .

To see the analogy with (IX.3.5), note that \(R_{S}^{{\ast}}\) is a finitely generated group of rank #S − 1. Thus the bound in (IX.3.5) has the form \(C^{\mathop{\mathrm{rank}}\nolimits R_{S}^{{\ast}}+\mathop{\mathrm{rank}}\nolimits E(K)+1 }\), while the bound in (IX.4.2.3) may be written as \(C^{\mathop{\mathrm{rank}}\nolimits R_{S}^{{\ast}}+1 }\).
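To make the S-unit equation concrete, the following Python sketch searches for solutions of x + y = 1 in the simplest nontrivial case. The choices \(K = \mathbb{Q}\), \(S =\{ \infty ,2,3\}\), a = b = 1, and the exponent window `bound` are assumptions made for illustration and do not come from the text. A finite search of this kind only lists the solutions inside the window; it proves nothing about completeness.

```python
from fractions import Fraction
from itertools import product

PRIMES = (2, 3)  # finite places of S; an illustrative assumption (K = Q)


def is_s_unit(q, primes=PRIMES):
    """Return True if the nonzero rational q is +-(product of powers of the given primes)."""
    if q == 0:
        return False
    num, den = abs(q.numerator), q.denominator
    for p in primes:
        while num % p == 0:
            num //= p
        while den % p == 0:
            den //= p
    return num == 1 and den == 1


def s_unit_solutions(bound, primes=PRIMES):
    """Find all (x, y) with x + y = 1, both S-units, where the
    exponents of x lie in [-bound, bound]. The exponents of y are not restricted."""
    solutions = set()
    for exps in product(range(-bound, bound + 1), repeat=len(primes)):
        x = Fraction(1)
        for p, e in zip(primes, exps):
            x *= Fraction(p) ** e
        for sign in (1, -1):
            candidate = sign * x
            if is_s_unit(1 - candidate, primes):
                solutions.add((candidate, 1 - candidate))
    return solutions


if __name__ == "__main__":
    for x, y in sorted(s_unit_solutions(4)):
        print(x, "+", y, "= 1")
```

In this case the solutions all come from the four relations 1 + 1 = 2, 1 + 2 = 3, 1 + 3 = 4, and 1 + 8 = 9, which give 21 pairs (x, y) in total. For comparison, the bound in (IX.4.2.3) with \([K : \mathbb{Q}] = 1\) and #S = 3 is \(3 \times 7^{7} = 2470629\).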

We next describe Siegel’s reduction of S-integral points on hyperelliptic curves to solutions of the S-unit equation. Although we do not do so, the reader should note that every step in this reduction process can be made effective.

Theorem 4.3.

(Siegel) Let  \(f(x) \in K[x]\) be a polynomial of degree d ≥ 3 with distinct roots in  \(\bar{K}\) . Then the equation

$$\displaystyle{y^{2} = f(x)}$$

has only finitely many solutions in S-integers  \(x,y \in R_{S}\) .

Proof.

We are clearly proving something stronger if we take a finite extension of K and enlarge the set S. Thus we may assume that f splits over K, say

$$\displaystyle{f(x) = a(x -\alpha _{1})(x -\alpha _{2})\cdots (x -\alpha _{d})\qquad \text{with}\ \alpha _{1},\ldots ,\alpha _{d} \in K.}$$

Enlarging S, we may assume that the following statements are true:

  1. (i)

     \(a \in R_{S}^{{\ast}}\).

  2. (ii)

     \(\alpha _{i} -\alpha _{j} \in R_{S}^{{\ast}}\) for all \(i\neq j\).

  3. (iii)

     \(R_{S}\) is a principal ideal domain.

Now suppose that \(x,y \in R_{S}\) satisfy \(y^{2} = f(x)\). Let \(\mathfrak{p}\) be a prime ideal of \(R_{S}\). Then \(\mathfrak{p}\) divides at most one \(x -\alpha _{i}\), since if it divides both \(x -\alpha _{i}\) and \(x -\alpha _{j}\), then it divides \(\alpha _{i} -\alpha _{j}\), contradicting (ii). Further, we see from (i) that \(\mathfrak{p}\) does not divide a. It follows from the equation

$$\displaystyle{y^{2} = a(x -\alpha _{ 1})(x -\alpha _{2})\cdots (x -\alpha _{d})}$$

that \(\mathop{\mathrm{ord}}\nolimits _{\mathfrak{p}}(x -\alpha _{i})\) is even, and since this is true for all primes, the ideal \((x -\alpha _{i})R_{S}\) is the square of an ideal in \(R_{S}\). From (iii) we know that \(R_{S}\) is a principal ideal domain, so there are elements \(z_{i} \in R_{S}\) and units \(b_{i} \in R_{S}^{{\ast}}\) such that

$$\displaystyle{x -\alpha _{i} = b_{i}z_{i}^{2}\qquad \text{for}\quad i = 1,2,\ldots ,d.}$$

Now let \(L/K\) be the extension of K obtained by adjoining to K the square root of every element of \(R_{S}^{{\ast}}\). Note that \(L/K\) is a finite extension, since Dirichlet’s S-unit theorem tells us that \(R_{S}^{{\ast}}/(R_{S}^{{\ast}})^{2}\) is finite. Let \(T \subset M_{L}\) be the set of places of L lying over elements of S, and let \(R_{T}\) be the ring of T-integers in L. By construction, each \(b_{i}\) is a square in \(R_{T}\), say \(b_{i} = \beta _{i}^{2}\), so

$$\displaystyle{x -\alpha _{i} = (\beta _{i}z_{i})^{2}.}$$

Taking the difference of any two of these equations yields

$$\displaystyle{\alpha _{j} -\alpha _{i} = (\beta _{i}z_{i} -\beta _{j}z_{j})(\beta _{i}z_{i} + \beta _{j}z_{j}).}$$

Note that \(\alpha _{j} -\alpha _{i} \in R_{T}^{{\ast}}\), while each of the two factors on the right is in \(R_{T}\). It follows that each of these factors is a unit,

$$\displaystyle{\beta _{i}z_{i} \pm \beta _{j}z_{j} \in R_{T}^{{\ast}}\qquad \text{for}\ i\neq j.}$$

To complete the proof we use Siegel’s identity:

$$\displaystyle{\frac{\beta _{1}z_{1} \pm \beta _{2}z_{2}} {\beta _{1}z_{1} -\beta _{3}z_{3}} \mp \frac{\beta _{2}z_{2} \pm \beta _{3}z_{3}} {\beta _{1}z_{1} -\beta _{3}z_{3}} = 1.}$$

This gives two elements of \(R_{T}^{{\ast}}\) that sum to 1, so (IX.4.1) says that there are only finitely many choices for

$$\displaystyle{\frac{\beta _{1}z_{1} + \beta _{2}z_{2}} {\beta _{1}z_{1} -\beta _{3}z_{3}}\qquad \text{and}\qquad \frac{\beta _{1}z_{1} -\beta _{2}z_{2}} {\beta _{1}z_{1} -\beta _{3}z_{3}}.}$$

Multiplying these two numbers, we find that there are only finitely many possibilities for

$$\displaystyle{ \frac{\alpha _{2} -\alpha _{1}} {(\beta _{1}z_{1} -\beta _{3}z_{3})^{2}},}$$

hence only finitely many for

$$\displaystyle{\beta _{1}z_{1} -\beta _{3}z_{3},}$$

and thus only finitely many for

$$\displaystyle{\beta _{1}z_{1} = \frac{1} {2}\left ((\beta _{1}z_{1} -\beta _{3}z_{3}) + \frac{\alpha _{3} -\alpha _{1}} {\beta _{1}z_{1} -\beta _{3}z_{3}}\right ).}$$

Finally, since

$$\displaystyle{x =\alpha _{1} + (\beta _{1}z_{1})^{2},}$$

there are only finitely many possible values for x, and each x value gives at most two y values. □ 
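The proof above gives no bound on the size of the solutions, so in practice one can only search a finite window unless the effective methods of (IX §5) are used. As a small illustration, the following Python sketch lists the integer solutions of \(y^{2} = f(x)\) with x in a given range. The sample polynomial \(f(x) = x^{3} + 17\), the function names, and the search range are illustrative assumptions and do not come from the text.

```python
from math import isqrt


def integral_points(f, x_min, x_max):
    """Return all integer pairs (x, y) with y^2 = f(x) and x_min <= x <= x_max.

    f must map integers to integers. This is a plain window search, so it
    says nothing about solutions with x outside [x_min, x_max].
    """
    points = []
    for x in range(x_min, x_max + 1):
        value = f(x)
        if value < 0:
            continue  # y^2 cannot be negative
        y = isqrt(value)
        if y * y == value:
            points.append((x, y))
            if y != 0:
                points.append((x, -y))
    return points


def sample_cubic(x):
    """Hypothetical example: f(x) = x^3 + 17, a cubic with distinct roots."""
    return x**3 + 17


if __name__ == "__main__":
    for point in integral_points(sample_cubic, -60, 60):
        print(point)
```

For this sample cubic, the window \(-60 \leq x \leq 60\) yields the x-values −2, −1, 2, 4, 8, 43, and 52, each with two values of y.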

Corollary 4.3.1.

Let C∕K be a curve of genus one and let  \(f \in K(C)\) be a nonconstant function. Then there are only finitely many points  \(P \in C(K)\) such that  \(f(P) \in R_{S}\) .

Proof.

The reduction procedure described in (IX.3.2.2) says that it suffices to consider the case that f is the x-coordinate of a Weierstrass equation. The case f = x is covered by (IX.4.3). □ 

IX.5 Effective Methods

In 1934, Gelfond and Schneider independently solved Hilbert’s problem concerning the transcendence of \(2^{\sqrt{2}}\). They actually proved the following strong transcendence criterion.

Theorem 5.1.

(Gelfond, Schneider) Let  \(\alpha ,\beta \in \bar{ \mathbb{Q}}\) with  \(\alpha \neq 0,1\) and  \(\beta \notin \mathbb{Q}\) . Then  \(\alpha ^{\beta }\) is transcendental.

Gelfond rephrased his result in terms of logarithms: If \(\alpha _{1},\alpha _{2} \in \bar{ \mathbb{Q}}^{{\ast}}\) and if \(\log \alpha _{1}\) and \(\log \alpha _{2}\) are linearly independent over \(\mathbb{Q}\), then they are linearly independent over \(\bar{\mathbb{Q}}\). He further showed that it is possible to give an explicit lower bound for

$$\displaystyle{\vert \beta _{1}\log \alpha _{1} + \beta _{2}\log \alpha _{2}\vert }$$

whenever this quantity is nonzero, and he noted that many Diophantine problems could be solved effectively if one knew an analogous result for sums of arbitrarily many logarithms. Alan Baker proved such a theorem in 1966. The proof is quite involved, so we are content to quote the following version.

Theorem 5.2.

(Baker) Let  \(\alpha _{1},\ldots ,\alpha _{n} \in K^{{\ast}}\) and let  \(\beta _{1},\ldots ,\beta _{n} \in K\) . For any constant κ, define

$$\displaystyle{\tau (\kappa ) =\tau (\kappa ;\alpha _{1},\ldots ,\alpha _{n},\beta _{1},\ldots ,\beta _{n}) = h{\bigl ([1,\beta _{1},\ldots ,\beta _{n}]\bigr )}h{\bigl ([1,\alpha _{1},\ldots ,\alpha _{n}]\bigr )}^{\kappa }.}$$

N.B. These are logarithmic height functions. Fix an embedding  \(K \subset \mathbb{C}\) and let | ⋅ | be the corresponding absolute value. Assume that

$$\displaystyle{\beta _{1}\log \alpha _{1} + \cdots + \beta _{n}\log \alpha _{n}\neq 0.}$$

Then there are effectively computable constants C > 0 and κ > 0, depending only on n and  \([K : \mathbb{Q}]\) , such that

$$\displaystyle{\vert \beta _{1}\log \alpha _{1} + \cdots + \beta _{n}\log \alpha _{n}\vert > C^{-\tau (\kappa )}.}$$

Proof.

See [11] or [135, VIII, Theorem 1.1]. □ 

Remark 5.2.1.

We have restricted ourselves in (IX.5.2) to the case of the archimedean absolute value. There are analogous results in the nonarchimedean case, although minor technical difficulties arise due to the fact that the \(\mathfrak{p}\)-adic logarithm is defined only in a neighborhood of 1. See (IX.5.6) for a further discussion.

It is not immediately clear how Baker’s theorem (IX.5.2) can be applied to give a bound for the solutions of the S-unit equation. We start with an elementary lemma; see also Exercise 9.8.

Lemma 5.3.

Let V be a finite-dimensional vector space over  \(\mathbb{R}\) . Given any basis  \(\mathbf{e} =\{ e_{1},\ldots ,e_{n}\}\) for V , let  \(\|\,\cdot \,\|_{\mathbf{e}}\) be the sup norm with respect to  \(\mathbf{e}\) , i.e.,

$$\displaystyle{\|x\|_{\mathbf{e}} = \left \|\sum x_{i}e_{i}\right \|_{\mathbf{e}} =\max {\bigl \{ \vert x_{i}\vert \bigr \}}.}$$

Let  \(\mathbf{f} =\{ f_{1},\ldots ,f_{n}\}\) be another basis for V . There are positive constants c 1 and c 2 , depending on  \(\mathbf{e}\) and  \(\mathbf{f}\) , such that for all x ∈ V ,

$$\displaystyle{c_{1}\|x\|_{\mathbf{e}} \leq \| x\|_{\mathbf{f}} \leq c_{2}\|x\|_{\mathbf{e}}.}$$

Proof.

Let \(A = (a_{ij})\) be the change of basis matrix from \(\mathbf{e}\) to \(\mathbf{f}\), so \(e_{i} =\sum _{j}a_{ij}f_{j}\), and let \(\|A\| =\max {\bigl \{ \vert a_{ij}\vert \bigr \}}\). Then for any \(x =\sum _{i}x_{i}e_{i} \in V\) we have \(x =\sum _{i,j}x_{i}a_{ij}f_{j}\), so

$$\displaystyle{\|x\|_{\mathbf{f}} =\max _{j}{\biggl \{{\biggl |\sum _{i}x_{i}a_{ij}\biggr |}\biggr \}} \leq n\max _{i,j}{\bigl \{\vert a_{ij}\vert \bigr \}}\max _{i}{\bigl \{\vert x_{i}\vert \bigr \}} = n\|A\|\,\|x\|_{\mathbf{e}}.}$$

This gives one inequality, and the other follows by symmetry. □ 
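The constants in (IX.5.3) can be read off from the proof: one may take \(c_{2} = n\|A\|\) and, by symmetry, \(c_{1} = 1/(n\|A^{-1}\|)\). The following Python sketch checks these two inequalities numerically in dimension n = 2. The restriction to 2 × 2 matrices, the sample matrix, and the random sampling are assumptions made to keep the example short.

```python
from fractions import Fraction
import random


def sup_norm(v):
    """Sup norm of a coordinate vector."""
    return max(abs(c) for c in v)


def coords_in_f(x, A):
    """Coordinates relative to f of the vector with coordinates x relative to e,
    where A[i][j] = a_ij and e_i = sum_j a_ij f_j."""
    n = len(x)
    return [sum(x[i] * A[i][j] for i in range(n)) for j in range(n)]


def inverse_2x2(A):
    """Exact inverse of an invertible 2 x 2 matrix."""
    (a, b), (c, d) = A
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]


def check_norm_comparison(A, trials=1000, seed=0):
    """Check c1*||x||_e <= ||x||_f <= c2*||x||_e on random integer vectors."""
    n = len(A)
    B = inverse_2x2(A)  # change of basis matrix from f back to e
    c2 = Fraction(n * max(abs(a) for row in A for a in row))
    c1 = 1 / (n * max(abs(b) for row in B for b in row))
    rng = random.Random(seed)
    for _ in range(trials):
        x = [Fraction(rng.randint(-50, 50)) for _ in range(n)]
        y = coords_in_f(x, A)
        if not (c1 * sup_norm(x) <= sup_norm(y) <= c2 * sup_norm(x)):
            return False
    return True


if __name__ == "__main__":
    print(check_norm_comparison([[2, 1], [1, 3]]))
```

Exact rational arithmetic is used so that the comparison is not affected by rounding.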

We apply (IX.5.3) to the following situation. Let \(S \subset M_{K}\) be a finite set of places containing \(M_{K}^{\infty }\), let \(s = \#S\), and choose a basis \(\alpha _{1},\ldots ,\alpha _{s-1}\) for the free part of \(R_{S}^{{\ast}}\). Then every \(\alpha \in R_{S}^{{\ast}}\) can be written uniquely as

$$\displaystyle{\alpha =\zeta \alpha _{ 1}^{m_{1} }\cdots \alpha _{s-1}^{m_{s-1} }}$$

with integers \(m_{1},\ldots ,m_{s-1}\) and a root of unity ζ. Define the size of α (relative to \(\{\alpha _{1},\ldots ,\alpha _{s-1}\}\)) by

$$\displaystyle{m(\alpha ) =\max {\bigl \{ \vert m_{i}\vert \bigr \}}.}$$

Lemma 5.4.

With notation as above, there are positive constants c 1 and  \(c_{2}\) , depending only on K and S, such that every  \(\alpha \in R_{S}^{{\ast}}\) satisfies

$$\displaystyle{c_{1}h(\alpha ) \leq m(\alpha ) \leq c_{2}h(\alpha ).}$$

Proof.

Let \(S =\{ v_{1},\ldots ,v_{s}\}\) and, to ease notation, let \(n_{i} = n_{v_{i}}\) be the local degree corresponding to \(v_{i}\). We consider the S-regulator homomorphism

$$\displaystyle{\rho _{S} : R_{S}^{{\ast}}\longrightarrow \mathbb{R}^{s},\qquad \alpha \longmapsto {\bigl (n_{ 1}v_{1}(\alpha ),\ldots ,n_{s}v_{s}(\alpha )\bigr )}.}$$

Note that the image of \(\rho _{S}\) lies in the hyperplane \(H =\{ x_{1} + \cdots + x_{s} = 0\}\), and Dirichlet’s S-unit theorem says that the image of \(\rho _{S}\) spans H. Let \(\|\,\cdot \,\|_{1}\) be the sup norm on \(\mathbb{R}^{s}\) relative to the standard basis, and let \(\|\,\cdot \,\|_{2}\) be the sup norm relative to the basis

$$\displaystyle{{\bigl \{\rho _{S}(\alpha _{1}),\ldots ,\rho _{S}(\alpha _{s-1}),(1,1,\ldots ,1)\bigr \}}.}$$

Here \(\rho _{S}(\alpha _{1}),\ldots ,\rho _{S}(\alpha _{s-1})\) span H, and we have added one extra vector in order to span all of \(\mathbb{R}^{s}\). From (IX.5.3) we find positive constants c 1 and c 2 such that

$$\displaystyle{c_{1}\|x\|_{1} \leq \| x\|_{2} \leq c_{2}\|x\|_{1}\qquad \text{for all}\ x \in \mathbb{R}^{s}.}$$

Now let \(\alpha \in R_{S}^{{\ast}}\) and write \(\rho _{S}(\alpha ) =\sum m_{i}\rho _{S}(\alpha _{i})\). Then directly from the definitions we have

$$\displaystyle\begin{array}{rcl} {\bigl \|\rho _{S}(\alpha )\bigr \|}_{2}& =& \max {\bigl \{\vert m_{i}\vert \bigr \}} = m(\alpha ), {}\\ {\bigl \|\rho _{S}(\alpha )\bigr \|}_{1}& =& \max {\bigl \{n_{i}\vert v_{i}(\alpha )\vert \bigr \}}, {}\\ h_{K}(\alpha )& =& \sum \max {\bigl \{0,-n_{i}v_{i}(\alpha )\bigr \}}. {}\\ \end{array}$$

(Note that the sum for \(h_{K}(\alpha )\) needs to include only the absolute values in S, since by assumption \(v(\alpha ) = 0\) for all \(v\notin S\).) It remains to compare \({\bigl \|\rho _{S}(\alpha )\bigr \|}_{1}\) and \(h_{K}(\alpha )\).

In general, for any \(x = (x_{1},\ldots ,x_{s}) \in H\), we can compare \(\|x\|_{1}\) to the height \(h(x) =\sum \max \{ 0,-x_{i}\}\). First, since \(\max \{0,-x_{i}\} \leq \vert x_{i}\vert \), we have the obvious estimate

$$\displaystyle{h(x) \leq s\|x\|_{1}.}$$

On the other hand, if we sum the identity

$$\displaystyle{x_{i} =\max \{ 0,x_{i}\} -\max \{ 0,-x_{i}\}}$$

for \(1 \leq i \leq s\) and use the fact that x ∈ H, i.e., \(\sum x_{i} = 0\), we obtain

$$\displaystyle{0 = h(-x) - h(x),}$$

and hence \(h(-x) = h(x)\). This allows us to compute

$$\displaystyle\begin{array}{rcl} 2h(x)& =& h(x) + h(-x) {}\\ & =& \sum {\bigl (\max \{0,-x_{i}\} +\max \{ 0,x_{i}\}\bigr )} {}\\ & =& \sum \vert x_{i}\vert {}\\ &\geq & \max {\bigl \{\vert x_{i}\vert \bigr \}} {}\\ & =& \|x\|_{1}. {}\\ \end{array}$$

Thus \(\frac{1} {2}\|x\|_{1} \leq h(x) \leq s\|x\|_{1}\), and combining this with the earlier estimates gives the desired result,

$$\displaystyle{(c_{1}/s)h_{K}(\alpha ) \leq m(\alpha ) \leq 2c_{2}h_{K}(\alpha ).}$$

 □
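The comparison in (IX.5.4) can be checked by hand in a small case. The following Python sketch takes \(K = \mathbb{Q}\), \(S =\{ \infty ,2,3\}\), and the basis \(\alpha _{1} = 2\), \(\alpha _{2} = 3\), so that every S-unit is \(\pm 2^{a}3^{b}\) with size \(m =\max \{\vert a\vert ,\vert b\vert \}\). These choices, and the explicit constants \(\log 2\) and \(\log 6\), are assumptions worked out for this example only. The normalization \(v(x) = -\log \vert x\vert _{v}\) is used, with all local degrees equal to 1.

```python
import math
from itertools import product

LOG2, LOG3 = math.log(2), math.log(3)


def rho(a, b):
    """S-regulator image of alpha = 2^a * 3^b, listed as (v_inf, v_2, v_3)
    with v(x) = -log|x|_v. The three coordinates sum to zero."""
    return (-(a * LOG2 + b * LOG3), a * LOG2, b * LOG3)


def height(x):
    """h(x) = sum of max{0, -x_i}; for x = rho(a, b) this is log max(|num|, |den|)."""
    return sum(max(0.0, -xi) for xi in x)


def check_lemma(bound=6, tol=1e-9):
    """Check (1/2)*||x||_1 <= h(x) <= s*||x||_1 and log(2)*m <= h <= log(6)*m
    for every nontrivial exponent pair (a, b) with |a|, |b| <= bound."""
    s = 3
    for a, b in product(range(-bound, bound + 1), repeat=2):
        if a == 0 and b == 0:
            continue
        x = rho(a, b)
        sup = max(abs(xi) for xi in x)
        h = height(x)
        m = max(abs(a), abs(b))
        if not (0.5 * sup - tol <= h <= s * sup + tol):
            return False
        if not (LOG2 * m - tol <= h <= math.log(6) * m + tol):
            return False
    return True


if __name__ == "__main__":
    print(check_lemma())
```

In the notation of (IX.5.4), this example corresponds to \(c_{1} = 1/\log 6\) and \(c_{2} = 1/\log 2\).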

We now have the tools needed to show how solving the S-unit equation can be reduced to the problem of giving bounds for linear forms in logarithms.

Theorem 5.5.

Fix  \(a,b \in K^{{\ast}}\) . There exists an effectively computable constant \(C = C(K,S,a,b)\) such that any solution  \((\alpha ,\beta ) \in R_{S}^{{\ast}}\times R_{S}^{{\ast}}\) to the S-unit equation

$$\displaystyle{a\alpha + b\beta = 1}$$

satisfies H(α) < C.

Proof.

Let \((\alpha ,\beta )\) be a solution and choose the absolute value v in S for which \(\vert \alpha \vert _{v}\) is largest. Then, since \(\vert \alpha \vert _{w} = 1\) for all \(w\notin S\), we have

$$\displaystyle{\vert \alpha \vert _{v}^{[K:\mathbb{Q}]s} \geq \prod _{ w\in S}\max {\bigl \{1,\vert \alpha \vert _{w}^{n_{w} }\bigr \}} = H_{K}(\alpha ),}$$

and hence

$$\displaystyle{\vert \alpha \vert _{v} \geq H(\alpha )^{1/s}.}$$

(Here, as usual, s = # S.)

To simplify our discussion, we will assume that v is archimedean, which is certainly true if, for example, \(S = M_{K}^{\infty }\). (For arbitrary S, see the discussion in (IX.5.6).) The mean value theorem applied to the function \(\log (x)\) yields

$$\displaystyle{\left \vert \frac{\log x -\log y} {x - y}\right \vert \leq \frac{1} {\min {\bigl \{\vert x\vert ,\vert y\vert \bigr \}}}.}$$

We apply this inequality with \(x = a\alpha\) and \(y = -b\beta\), so \(x - y = 1\), and we obtain

$$\displaystyle\begin{array}{rcl} \vert \log a\alpha -\log b\beta \vert & \leq & \min {\bigl \{\vert a\alpha \vert ,\vert a\alpha - 1\vert \bigr \}}^{-1} {}\\ & \leq & 2{\bigl (\vert a\vert H(\alpha )^{1/s}\bigr )}^{-1}. {}\\ \end{array}$$

(For the last line, we have assumed that \(\vert \alpha \vert > 2/\vert a\vert \), since otherwise we have the excellent bound \(H(\alpha ) \leq \vert \alpha \vert ^{s} \leq (2/\vert a\vert )^{s}\).)

Let \(\alpha _{1},\ldots ,\alpha _{s-1}\) be a basis for the free part of \(R_{S}^{{\ast}}\), and write

$$\displaystyle{\alpha =\zeta \alpha _{ 1}^{m_{1} }\cdots \alpha _{s-1}^{m_{s-1} }\qquad \text{and}\qquad \beta =\zeta '\alpha _{1}^{m'_{1} }\cdots \alpha _{s-1}^{m'_{s-1} }.}$$

Substituting this into the previous inequality yields

$$\displaystyle{\left \vert \sum _{i=1}^{s-1}(m_{ i} - m_{i}')\log \alpha _{i} +\log \left (\frac{a\zeta } {b\zeta '}\right )\right \vert \leq \frac{c_{1}} {H(\alpha )^{1/s}},}$$

where here and in what follows, the constants \(c_{1},c_{2},\ldots\) are effectively computable and depend only on K, S, a, and b.

From the equality \(a\alpha + b\beta = 1\), it is easy to obtain an estimate

$$\displaystyle{{\bigl |h(\alpha ) - h(\beta )\bigr |} \leq c_{2},}$$

and applying (IX.5.4) yields

$$\displaystyle{c_{3}m(\alpha ) \leq m(\beta ) \leq c_{4}m(\alpha ).}$$

(Clearly we may assume that \(m(\alpha ) \geq 1\) and \(m(\beta ) \geq 1\).) In particular,

$$\displaystyle{\vert m_{i} - m_{i}'\vert \leq m(\alpha ) + m(\beta ) \leq c_{5}h(\alpha ).}$$

Letting \(q_{i} = m_{i} - m_{i}'\) and \(\gamma = a\zeta /b\zeta '\) to ease notation, we have the inequality

$$\displaystyle{\vert q_{1}\log \alpha _{1} + \cdots + q_{s-1}\log \alpha _{s-1} +\log \gamma \vert \leq c_{1}H(\alpha )^{-1/s}.}$$

We now apply Baker’s theorem (IX.5.2). This gives a lower bound of the form

$$\displaystyle{\vert q_{1}\log \alpha _{1} + \cdots + q_{s-1}\log \alpha _{s-1} +\log \gamma \vert \geq c_{6}^{-\tau },}$$

where

$$\displaystyle{\tau = h{\bigl ([1,q_{1},\ldots ,q_{s-1}]\bigr )}h{\bigl ([1,\alpha _{1},\ldots ,\alpha _{s-1},\gamma ]\bigr )}^{\kappa }}$$

and κ is a constant depending only on K and s. But from above,

$$\displaystyle{h{\bigl ([1,q_{1},\ldots ,q_{s-1}]\bigr )} =\log \max {\bigl \{ 1,\vert q_{1}\vert ,\ldots ,\vert q_{s-1}\vert \bigr \}}\leq \log {\bigl ( c_{5}h(\alpha )\bigr )}.}$$

Combining the upper and lower bounds for the linear form in logarithms and using this estimate yields

$$\displaystyle{c_{7}^{-\log (c_{5}h(\alpha ))} \leq c_{1}H(\alpha )^{-1/s}.}$$

(Note that the basis \(\alpha _{1},\ldots ,\alpha _{s-1}\) depends only on the field K and the set S, so we have absorbed the \(h{\bigl ([1,\alpha _{1},\ldots ,\alpha _{s-1},\gamma ]\bigr )}^{\kappa }\) into \(c_{7}\).) Now a little bit of algebra gives

$$\displaystyle{H(\alpha ) \leq c_{8}h(\alpha )^{c_{9} },}$$

and since \(h(\alpha ) =\log H(\alpha )\), this implies the desired bound for \(H(\alpha )\). □ 
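The "little bit of algebra" can be sketched as follows. We write \(c_{7} = c_{6}^{h([1,\alpha _{1},\ldots ,\alpha _{s-1},\gamma ])^{\kappa }}\) and assume \(c_{7} > 1\); the explicit formulas for \(c_{8}\) and \(c_{9}\) below are our own bookkeeping and are not taken from the text. Comparing the lower bound \(c_{6}^{-\tau }\) with the upper bound \(c_{1}H(\alpha )^{-1/s}\) and using \(\tau \leq \log {\bigl (c_{5}h(\alpha )\bigr )}\,h{\bigl ([1,\alpha _{1},\ldots ,\alpha _{s-1},\gamma ]\bigr )}^{\kappa }\), we get

$$\displaystyle{H(\alpha )^{1/s} \leq c_{1}c_{7}^{\log (c_{5}h(\alpha ))} = c_{1}{\bigl (c_{5}h(\alpha )\bigr )}^{\log c_{7}}.}$$

Raising both sides to the power s yields

$$\displaystyle{H(\alpha ) \leq c_{1}^{s}c_{5}^{s\log c_{7}}h(\alpha )^{s\log c_{7}} = c_{8}h(\alpha )^{c_{9}}\qquad \text{with}\quad c_{8} = c_{1}^{s}c_{5}^{s\log c_{7}},\quad c_{9} = s\log c_{7}.}$$

Since \(h(\alpha ) =\log H(\alpha )\), this reads \(H(\alpha ) \leq c_{8}{\bigl (\log H(\alpha )\bigr )}^{c_{9}}\), and such an inequality can hold only for bounded \(H(\alpha )\), because any fixed power of \(\log H\) grows more slowly than H.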

Remark 5.6.

In order to apply the argument given in (IX.5.5) to a nonarchimedean absolute value, it is necessary to make some minor technical alterations. The main difficulty is that the logarithm function in the \(\mathfrak{p}\)-adic setting converges only in a neighborhood of 1. What one does is to take a subgroup of finite index in \(R_{S}^{{\ast}}\) that is generated by S-units that are \(\mathfrak{p}\)-adically close to 1, together with a uniformizer at \(\mathfrak{p}\). Then, assuming that \(\vert \alpha \vert _{\mathfrak{p}}\) is sufficiently large, one shows that \(a\alpha /b\beta\) is \(\mathfrak{p}\)-adically close to 1. Now applying the above argument to some power of \(a\alpha /b\beta\) gives a well-defined linear form in \(\mathfrak{p}\)-adic logarithms, and from then on the argument goes just the same. For the final step, of course, one must use a \(\mathfrak{p}\)-adic analogue of Baker’s theorem. For further details on the reduction step, see for example [135, VI §1].

Remark 5.7.

In order to obtain an effective bound for the points on an elliptic curve satisfying \(f(P) \in R_{S}\), where f is an arbitrary nonconstant function, it is necessary to make the reduction step given in (IX.3.2.2) effective. This essentially involves giving an effective version of the Riemann–Roch theorem, which has been done by Coates [48]. As the reader might guess from the number of reduction steps involved, the effective bounds that come out of the proofs are quite large. To indicate the magnitudes involved, we quote two results; see also (IX.7.2) and (IX.7.4).

Theorem 5.8.

  1. (a)

    (Baker [11, page 45]) Let  \(A,B,C,D \in \mathbb{Z}\) satisfy

    $$\displaystyle{\max {\bigl \{\vert A\vert ,\vert B\vert ,\vert C\vert ,\vert D\vert \bigr \}}\leq H,}$$

    and assume that

    $$\displaystyle{E : Y ^{2} = AX^{3} + BX^{2} + CX + D}$$

    is an elliptic curve. Then any point  \(P = (x,y) \in E(\mathbb{Q})\) with  \(x,y \in \mathbb{Z}\) satisfies

    $$\displaystyle{\max {\bigl \{\vert x\vert ,\vert y\vert \bigr \}} <\exp \left ((10^{6}H)^{10^{6} }\right ).}$$
  2. (b)

    (Baker–Coates [12]) Let  \(F(X,Y ) \in \mathbb{Z}[X,Y ]\) be an absolutely irreducible polynomial such that the curve F(X,Y ) = 0 has genus one. Let n be the degree of F, and assume that the coefficients of F all have absolute value at most H. Then any solution to F(x,y) = 0 with  \(x,y \in \mathbb{Z}\) satisfies

    $$\displaystyle{\max {\bigl \{\vert x\vert ,\vert y\vert \bigr \}} <\exp \exp \exp \left ((2H)^{10^{n^{10} } }\right ).}$$

Remark 5.8.1.

There is an extensive literature on effective bounds for S-integral solutions to equations of the form \(y^{m} = f(x)\); see for example [32, 96, 131, 268, 279, 301]. To quote one instance, we mention that [301] improves (IX.5.8a) to

$$\displaystyle{\max {\bigl \{\vert x\vert ,\vert y\vert \bigr \}}\leq \exp \left (cH^{270}(\log H)^{54}\right )}$$

for an absolute constant c.

Linear Forms in Elliptic Logarithms

Rather than reducing the problem of integral points on an elliptic curve to the question of solutions to the S-unit equation, and thence as above to bounds for linear forms in logarithms, one can instead work directly with the analytic parametrization of the elliptic curve. We briefly indicate how this is done in the simplest case.

Let \(E/\mathbb{Q}\) be an elliptic curve given by a Weierstrass equation

$$\displaystyle{E : y^{2} = 4x^{3} - g_{ 2}x - g_{3}\qquad \text{with}\quad g_{2},g_{3} \in \mathbb{Z}.}$$

We are interested in bounding the height of points \(P \in E(\mathbb{Q})\) that satisfy \(x(P) \in \mathbb{Z}\). Let

$$\displaystyle{\phi : \mathbb{C}/\varLambda \longrightarrow E(\mathbb{C})}$$

be the analytic parametrization of \(E(\mathbb{C})\) given by the Weierstrass \(\wp \)-function and its derivative (VI.5.1.1). We fix a basis \(\{\omega _{1},\omega _{2}\}\) for the lattice \(\varLambda\). Let

$$\displaystyle{\psi : E(\mathbb{C})\longrightarrow \mathbb{C}}$$

be the map that is inverse to \(\phi\) and takes values in the fundamental parallelogram spanned by \(\omega _{1}\) and \(\omega _{2}\), shifted to be centered at 0. The map \(\phi\) is the elliptic exponential map, and choosing a fundamental domain for the elliptic logarithm map ψ is analogous to choosing a principal value for the ordinary logarithm function \(\log : \mathbb{C}^{{\ast}}\rightarrow \mathbb{C}\). (The analogy becomes even clearer if we identify \(\mathbb{C}^{{\ast}}\) with \(\mathbb{C}/\mathbb{Z}\).)

Fix a basis \(P_{1},\ldots ,P_{r}\) for the free part of \(E(\mathbb{Q})\). Given any point \(P \in E(\mathbb{Q})\), we can write

$$\displaystyle{P = q_{1}P_{1} + \cdots + q_{r}P_{r} + T}$$

with integers \(q_{1},\ldots ,q_{r}\) and a torsion point \(T \in E_{\text{tors}}(\mathbb{Q})\). It follows that

$$\displaystyle{\psi (P) = q_{1}\psi (P_{1}) + \cdots + q_{r}\psi (P_{r}) +\psi (T)\pmod \varLambda ,}$$

so there are integers \(m_{1}\) and \(m_{2}\) such that

$$\displaystyle{\psi (P) = q_{1}\psi (P_{1}) + \cdots + q_{r}\psi (P_{r}) +\psi (T) + m_{1}\omega _{1} + m_{2}\omega _{2}.}$$

Suppose now that P is a large integral point, i.e., \(x(P) \in \mathbb{Z}\) and \({\bigl |x(P)\bigr |}\) is large. Then P is close to O in the complex topology on \(E(\mathbb{C})\), so ψ(P) is close to 0. More precisely, since \(\wp (z) = x{\bigl (\phi (z)\bigr )}\) behaves like \(z^{-2}\) for z close to 0, we see that

$$\displaystyle{{\bigl |\psi (P)\bigr |}^{2} \leq c_{ 1}{\bigl |x(P)\bigr |}^{-1} = c_{ 1}H{\bigl (x(P)\bigr )}^{-1}.}$$

We are using the fact that if \(x \in \mathbb{Z}\) with \(x\neq 0\), then \(H(x) = \vert x\vert \). The constant \(c_{1}\) depends on \(g_{2}\) and \(g_{3}\), but not on P.

On the other hand, since the canonical height is quadratic and positive definite from (VIII.9.3) and (VIII.9.6), we can estimate

$$\displaystyle\begin{array}{rcl} \log H{\bigl (x(P)\bigr )}& =& h_{x}(P) = 2\hat{h}(P) + O(1) {}\\ & =& 2\hat{h}{\Bigl (\sum q_{i}P_{i} + T\Bigr )} + O(1) {}\\ & \geq & c_{2}\max {\bigl \{\vert q_{i}\vert \bigr \}}^{2}, {}\\ \end{array}$$

where \(c_{2}\) depends on E and the choice of the basis \(P_{1},\ldots ,P_{r}\). (See Exercise 9.8.) Substituting this above, we obtain an upper bound for our linear form in elliptic logarithms,

$$\displaystyle{{\bigl |q_{1}\psi (P_{1}) + \cdots + q_{r}\psi (P_{r}) +\psi (T) + m_{1}\omega _{1} + m_{2}\omega _{2}\bigr |} \leq c_{3}^{-\max \{\vert q_{i}\vert \}^{2} }.}$$
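The substitution can be sketched as follows, writing \(M =\max \{\vert q_{i}\vert \}\). The specific choice of \(c_{3}\) below is one possibility that we supply for illustration; it is valid once M is large enough that \(c_{1}^{1/2} \leq e^{c_{2}M^{2}/4}\), and the finitely many smaller values of M do not affect the finiteness argument.

$$\displaystyle{{\bigl |\psi (P)\bigr |} \leq c_{1}^{1/2}H{\bigl (x(P)\bigr )}^{-1/2} \leq c_{1}^{1/2}e^{-c_{2}M^{2}/2} \leq e^{-c_{2}M^{2}/4} = c_{3}^{-M^{2}}\qquad \text{with}\quad c_{3} = e^{c_{2}/4} > 1.}$$

The first inequality is the estimate for \({\bigl |\psi (P)\bigr |}^{2}\), and the second uses \(\log H{\bigl (x(P)\bigr )} \geq c_{2}M^{2}\).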

Further, since \(\omega _{1}\) and \(\omega _{2}\) are \(\mathbb{R}\)-linearly independent, it is easy to see that

$$\displaystyle{\max {\bigl \{\vert m_{1}\vert ,\vert m_{2}\vert \bigr \}}\leq c_{4}\max {\bigl \{\vert q_{i}\vert \bigr \}},}$$

where \(c_{4}\) depends on E, \(\{P_{i}\}\), \(\omega _{1}\), and \(\omega _{2}\). Thus, if we let

$$\displaystyle{q =\max {\bigl \{ \vert q_{1}\vert ,\ldots ,\vert q_{r}\vert ,\vert m_{1}\vert ,\vert m_{2}\vert \bigr \}},}$$

then we obtain the estimate

$$\displaystyle{{\bigl |q_{1}\psi (P_{1}) + \cdots + q_{r}\psi (P_{r}) +\psi (T) + m_{1}\omega _{1} + m_{2}\omega _{2}\bigr |} \leq c_{5}^{-q^{2} }.}$$

Now the desired finiteness result follows if we can find a lower bound for the left-hand side having the form \(C^{-\tau (q)}\) with \(\tau (q)/q^{2} \rightarrow 0\) as \(q \rightarrow \infty \). The first effective estimate of this sort was proven by Masser [159] in the case that E has complex multiplication. The general case was proven by Wüstholz [313, 314], who had to overcome significant technical difficulties associated with the necessary zero and multiplicity estimates.

It remains to discuss the question of effectivity. The reduction to linear forms in ordinary logarithms via the S-unit equation is fully effective. It is possible to give an explicit upper bound for the height of any S-integral point of E(K) in easily computed quantities associated to K, S, and E. One of these quantities, for example, is a bound for the heights of generators of the unit group \(R_{S}^{{\ast}}\). In the analogous reduction to linear forms in elliptic logarithms, we similarly use a set of generators of the Mordell–Weil group E(K), and the bound for the integral points depends on the heights of these generators. Unfortunately, as we have noted in (VIII.3.2) (see also Chapter X), the proof of the Mordell–Weil theorem is not effective. Thus although the approach to integral points on elliptic curves via elliptic logarithms is more natural than the roundabout route through the S-unit equation, it is likely to remain ineffective until an effective proof of the Mordell–Weil theorem is found. On the other hand, we should mention that if one is able to find a basis for the Mordell–Weil group, for example using the techniques in Chapter X, then the method of elliptic logarithms often provides the best known algorithm for finding the integral points on a given elliptic curve. See for example [58, 59, 96, 268, 279, 315].

IX.6 Shafarevich’s Theorem

Recall that an elliptic curve \(E/K\) has good reduction at a finite place \(v \in M_{K}\) if it has a Weierstrass equation whose coefficients are v-integral and whose discriminant is a v-adic unit (VII §5).

Theorem 6.1.

(Shafarevich [242]) Let  \(S \subset M_{K}\) be a finite set of places containing  \(M_{K}^{\infty }\) . Then up to isomorphism over K, there are only finitely many elliptic curves E∕K having good reduction at all primes not in S.

Proof.

Clearly we are proving something stronger if we enlarge S, so we may assume that S contains all primes of K lying over 2 and 3. Enlarging S further, we may also assume that the ring of S-integers \(R_{S}\) has class number one.

Under these assumptions, we see from (VIII.8.7) that any elliptic curve \(E/K\) has a Weierstrass equation of the form

$$\displaystyle{E : y^{2} = x^{3} + Ax + B,\qquad A,B \in R_{ S},}$$

with discriminant \(\varDelta = -16(4A^{3} + 27B^{2})\) satisfying

$$\displaystyle{\varDelta R_{S} = \mathcal{D}_{E/K}R_{S}.}$$

Here \(\mathcal{D}_{E/K}\) is the minimal discriminant of \(E/K\); see (VIII §8). If we further assume that E has good reduction outside S, then \(\mathop{\mathrm{ord}}\nolimits _{v}(\mathcal{D}_{E/K}) = 0\) for all places \(v\notin S\), so \(\varDelta \in R_{S}^{{\ast}}\).

Assume now that we are given a list of elliptic curves \(E_{1}/K,E_{2}/K,\ldots \,\), each of which has good reduction outside of S. We associate to each \(E_{i}\) a Weierstrass equation as above, say with coefficients \(A_{i},B_{i} \in R_{S}\) and discriminant \(\varDelta _{i} \in R_{S}^{{\ast}}\). Breaking the sequence of \(E_{i}\) into finitely many subsequences according to the residue class of \(\varDelta _{i}\) in the finite group \(R_{S}^{{\ast}}/(R_{S}^{{\ast}})^{12}\), we may replace the original sequence with an infinite subsequence satisfying \(\varDelta _{i} = CD_{i}^{12}\) for a fixed C and with \(D_{i} \in R_{S}^{{\ast}}\).

The relation \(\varDelta = -16(4A^{3} + 27B^{2})\) implies that for each i, the point

$$\displaystyle{\left (-\frac{12A_{i}} {D_{i}^{4}} , \frac{108B_{i}} {D_{i}^{6}} \right )}$$

is an S-integral point on the elliptic curve

$$\displaystyle{Y ^{2} = X^{3} - 27C.}$$

Siegel’s theorem (IX.3.2.1) says that there are only finitely many such points, and thus only finitely many possibilities for \(A_{i}/D_{i}^{4}\) and \(B_{i}/D_{i}^{6}\). However, if

$$\displaystyle{ \frac{A_{i}} {D_{i}^{4}} = \frac{A_{j}} {D_{j}^{4}}\qquad \text{and}\qquad \frac{B_{i}} {D_{i}^{6}} = \frac{B_{j}} {D_{j}^{6}},}$$

then the change of variables

$$\displaystyle{x = (D_{i}/D_{j})^{2}x',\qquad y = (D_{ i}/D_{j})^{3}y',}$$

gives a K-isomorphism from \(E_{i}\) to \(E_{j}\). Hence the sequence \(E_{1},E_{2},\ldots\) contains only finitely many K-isomorphism classes of elliptic curves. □
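The identity behind the auxiliary point in the proof can be checked with exact rational arithmetic. The following Python sketch is only an illustration over \(\mathbb{Q}\) with arbitrary integer test values; the function names are hypothetical and do not come from the text.

```python
from fractions import Fraction


def discriminant(A, B):
    """Discriminant of y^2 = x^3 + A*x + B."""
    return -16 * (4 * A**3 + 27 * B**2)


def auxiliary_point(A, B, D):
    """The point (-12A/D^4, 108B/D^6) from the proof, as exact fractions."""
    return Fraction(-12 * A, D**4), Fraction(108 * B, D**6)


def lies_on_mordell_curve(A, B, D):
    """Check Y^2 = X^3 - 27C, where C is defined by Delta = C * D^12."""
    C = Fraction(discriminant(A, B), D**12)
    X, Y = auxiliary_point(A, B, D)
    return Y**2 == X**3 - 27 * C


if __name__ == "__main__":
    # Arbitrary sample values (assumptions for illustration); D must be nonzero.
    for A, B, D in [(1, 2, 3), (-7, 5, 2), (0, 1, -4)]:
        print(A, B, D, lies_on_mordell_curve(A, B, D))
```

The check works for any nonzero D because it is a polynomial identity: \(Y^{2} - X^{3} = 432(4A^{3} + 27B^{2})/D^{12} = -27\varDelta /D^{12}\).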

Example 6.1.1.

There are no elliptic curves \(E/\mathbb{Q}\) having everywhere good reduction; see Exercise 8.15. There are 24 curves \(E/\mathbb{Q}\) having good reduction outside of \(\{2\}\) and 784 curves \(E/\mathbb{Q}\) having good reduction outside of {2, 3}; for the complete list, see [19, Table 4]. Similar lists have been compiled for various quadratic fields; see for example [147] or [204].

Shafarevich’s theorem (IX.6.1) has a number of important applications. We content ourselves with the following two corollaries.

Corollary 6.2.

Fix an elliptic curve E∕K. Then there are only finitely many elliptic curves E′∕K that are K-isogenous to E.

Proof.

If E and E′ are isogenous over K, then (VII.7.2) says that E and E′ have the same set of primes of bad reduction. Now apply (IX.6.1). □ 

Corollary 6.3.

(Serre) Let E∕K be an elliptic curve with no complex multiplication. Then for all but finitely many primes  \(\ell\) , the group of  \(\ell\) -torsion points  \(E[\ell]\) has no nontrivial  \(G_{\bar{K}/K}\) -invariant subgroups. ( In other words, the representation of  \(G_{\bar{K}/K}\) on  \(E[\ell]\) is irreducible. )

Proof.

Suppose that \(\varPhi _{\ell} \subset E[\ell]\) is a nontrivial \(G_{\bar{K}/K}\)-invariant subgroup of \(E[\ell]\). We know that \(E[\ell]\mathop{\cong}(\mathbb{Z}/\ell\mathbb{Z})^{2}\), so \(\varPhi _{\ell}\) is necessarily cyclic of order \(\ell\). We apply (III.4.12) to produce an elliptic curve \(E_{\ell}/K\) and an isogeny \(\phi _{\ell} : E \rightarrow E_{\ell}\) with \(\ker (\phi _{\ell}) =\varPhi _{\ell}\). The Galois invariance of \(\varPhi _{\ell}\) ensures that the curve \(E_{\ell}\) and the isogeny \(\phi _{\ell}\) are defined over K.

Each \(E_{\ell}\) is K-isogenous to E, so (IX.6.2) says that the \(E_{\ell}\) fall into finitely many K-isomorphism classes. Suppose that \(E_{\ell}\mathop{\cong}E_{\ell'}\) for two primes \(\ell\) and \(\ell'\). Then the composition

$$\displaystyle{E\mathop{\longrightarrow}\limits_{}^{\quad \phi _{\ell}\quad }E_{\ell}\mathop{\cong}E_{\ell'}\mathop{\longrightarrow}\limits_{}^{\quad \hat{\phi }_{\ell'}\quad }E}$$

defines an endomorphism of E of degree

$$\displaystyle{(\deg \phi _{\ell})(\deg \hat{\phi }_{\ell'}) =\ell\ell '.}$$

By assumption, \(\mathop{\mathrm{End}}\nolimits (E) = \mathbb{Z}\), so every endomorphism of E has degree \(n^{2}\) for some \(n \in \mathbb{Z}\). This shows that \(\ell=\ell '\), and thus that \(E_{\ell}\not\cong E_{\ell'}\) for \(\ell\neq \ell'\). Therefore there are only finitely many primes \(\ell\) for which such a subgroup \(\varPhi _{\ell}\) and curve \(E_{\ell}\) can exist. □

Example 6.4.

For \(K = \mathbb{Q}\), results of Mazur [166] and Kenku [125] give a statement that is far more precise than (IX.6.2). They show that for a given elliptic curve \(E/\mathbb{Q}\), there are at most eight \(\mathbb{Q}\)-isomorphism classes of elliptic curves \(E'/\mathbb{Q}\) that are \(\mathbb{Q}\)-isogenous to E. Further, if \(\phi : E \rightarrow E'\) is a \(\mathbb{Q}\)-isogeny whose kernel is a cyclic group, then either

$$\displaystyle{1 \leq \deg \phi \leq 19\qquad \text{or}\qquad \deg \phi \in \{ 21,25,27,37,43,67,163\}.}$$

It is no coincidence that the three largest possibilities for \(\deg \phi\), namely 43, 67, and 163, are values of d for which \(\mathbb{Q}(\sqrt{-d}\,)\) has class number one. The class number one condition means that the elliptic curve corresponding to the lattice

$$\displaystyle{\mathbb{Z} + \mathbb{Z}\left (\tfrac{1} {2} + \tfrac{1} {2}\sqrt{-d}\,\right )}$$

via (VI.5.1.1) is isomorphic to an elliptic curve E defined over \(\mathbb{Q}\). (See (C.11.3.1) for details.) Now we need merely observe that multiplication by \(\sqrt{-d}\) gives an isogeny from E to itself whose kernel \(\varPhi\) is invariant under the action of \(G_{\bar{\mathbb{Q}}/\mathbb{Q}}\). Then \(E \rightarrow E/\varPhi\) is a cyclic isogeny of degree d between elliptic curves defined over \(\mathbb{Q}\).
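The class number one condition for d = 43, 67, 163 can be confirmed by a classical computation: the class number of an imaginary quadratic field equals the number of primitive reduced binary quadratic forms of its discriminant. The Python sketch below counts such forms. It assumes the standard fact that the discriminant of \(\mathbb{Q}(\sqrt{-d}\,)\) is \(-d\) when \(d \equiv 3\ (\text{mod}\ 4)\) is squarefree, and the function name is hypothetical.

```python
from math import gcd, isqrt


def class_number(disc):
    """Count primitive reduced forms a*x^2 + b*x*y + c*y^2 with b^2 - 4ac = disc < 0.

    A form is reduced if -a < b <= a <= c, with b >= 0 whenever a == c.
    """
    assert disc < 0 and disc % 4 in (0, 1)
    count = 0
    # For a reduced form, 3*a^2 <= |disc|, which bounds the search.
    for a in range(1, isqrt(-disc // 3) + 1):
        for b in range(-a + 1, a + 1):
            numerator = b * b - disc
            if numerator % (4 * a) != 0:
                continue
            c = numerator // (4 * a)
            if c < a or (a == c and b < 0):
                continue
            if gcd(gcd(a, b), c) == 1:
                count += 1
    return count


if __name__ == "__main__":
    for d in (43, 67, 163):
        print(d, class_number(-d))
```

For comparison, the same function applied to discriminant \(-23\) counts three reduced forms, so the count is not trivially one.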

Remark 6.5.

An examination of the proof of (IX.6.1) reveals an interesting possibility. If we had some other proof of (IX.6.1) that did not use either Siegel’s theorem or Diophantine approximation techniques, then we could deduce that the equation

$$\displaystyle{Y ^{2} = X^{3} + D}$$

has only finitely many solutions \(X,Y \in R_{S}\). For given such a solution, the equation

$$\displaystyle{y^{2} = x^{3} - Xx - Y }$$

defines an elliptic curve with good reduction outside of the set

$$\displaystyle{S \cup \{\text{primes dividing}\ 2\ \text{and}\ 3\}.}$$

Hence, assuming (IX.6.1), there can be only finitely many such curves, and we could argue back to the finiteness of the number of pairs (X, Y ). Building on this idea, Parshin [203] showed how a generalization of (IX.6.1) to curves of higher genus (which had already been conjectured by Shafarevich [242]) could be used to prove Mordell’s conjecture that curves of genus at least 2 have only finitely many rational points. The subsequent proof of Shafarevich’s conjecture by Faltings [82, 84] completed this chain of reasoning. Faltings’ proof, together with Parshin’s idea, also gives a proof of Siegel’s theorem (IX.3.2) that does not involve the use of Diophantine approximation. Subsequent to Faltings’ proof of the Mordell conjecture, Vojta [299] gave a somewhat more natural proof based on Diophantine approximation methods. For an exposition of this latter proof, see for example [114, Part E].

IX.7 The Curve \(\boldsymbol{Y ^{2} = X^{3} + D}\)

Many of the general results known and conjectured about the arithmetic of elliptic curves were originally noticed and tested on various special sorts of equations, such as the one given in the title of this section. For example, long before the work of Mordell and Siegel led to general finiteness results such as (IX.3.2.1), many special cases had been proven by a variety of methods. (See, e.g., [185, Chapter 26].) The next result gives two examples in which the complete set of integral solutions can be obtained by relatively elementary means.

Proposition 7.1.

  1. (a)

    (V.A. Lebesgue) The equation

    $$\displaystyle{y^{2} = x^{3} + 7}$$

    has no solutions in integers  \(x,y \in \mathbb{Z}\) .

  2. (b)

    (Fermat) The only integral solutions to the equation

    $$\displaystyle{y^{2} = x^{3} - 2}$$

    are (x,y) = (3,±5).

Proof.

(a) Suppose that \(x,y \in \mathbb{Z}\) satisfy \(y^{2} = x^{3} + 7\). We first observe that x must be odd, since no integer of the form 8k + 7 is a square. Next we rewrite the equation as

$$\displaystyle{y^{2} + 1 = x^{3} + 8 = (x + 2)(x^{2} - 2x + 4).}$$

Since x is odd,

$$\displaystyle{x^{2} - 2x + 4 = (x - 1)^{2} + 3 \equiv 3\pmod 4,}$$

so there exists at least one prime \(p \equiv 3\ (\text{ mod}\ 4)\) that divides \(x^{2} - 2x + 4\). But then \(y^{2} + 1 \equiv 0\ (\text{ mod}\ p)\), which is not possible.

(b) Suppose that we have a solution \(x, y \in \mathbb{Z}\) to \(y^{2} = x^{3} - 2\). We factor the equation as

$$\displaystyle{(y + \sqrt{-2}\,)(y -\sqrt{-2}\,) = x^{3}.}$$

The ring \(R = \mathbb{Z}[\sqrt{-2}\,]\) is a principal ideal domain, and the greatest common divisor of \(y + \sqrt{-2}\) and \(y -\sqrt{-2}\) in R divides \(2\sqrt{-2}\), so we see that \(y + \sqrt{-2}\) has one of the following forms:

$$\displaystyle{y + \sqrt{-2} =\zeta ^{3}\quad \text{or}\quad \sqrt{-2}\,\zeta ^{3}\quad \text{or}\quad 2\zeta ^{3}\qquad \text{for some}\ \zeta \in R.}$$

Applying complex conjugation gives

$$\displaystyle{y -\sqrt{-2} =\bar{\zeta }^{3}\quad \text{or}\quad -\sqrt{-2}\,\bar{\zeta }^{3}\quad \text{or}\quad 2\bar{\zeta }^{3},}$$

and taking the product yields

$$\displaystyle{x^{3} = y^{2} + 2 = (\zeta \bar{\zeta })^{3}\quad \text{or}\quad 2(\zeta \bar{\zeta })^{3}\quad \text{or}\quad 4(\zeta \bar{\zeta })^{3}.}$$

Since \(x \in \mathbb{Z}\) and \(\zeta \bar{\zeta }\in \mathbb{Z}\), only the first case is possible, so

$$\displaystyle{y + \sqrt{-2} =\zeta ^{3}\qquad \text{and}\qquad y -\sqrt{-2} =\bar{\zeta } ^{3}.}$$

Subtracting these two equations gives

$$\displaystyle{2\sqrt{-2} =\zeta ^{3} -\bar{\zeta }^{3} = (\zeta -\bar{\zeta })(\zeta ^{2} +\zeta \bar{\zeta } +\bar{\zeta }^{2}).}$$

We write \(\zeta = a + b\sqrt{-2}\) with \(a,b \in \mathbb{Z}\) and substitute to obtain

$$\displaystyle{2\sqrt{-2} = 2\sqrt{-2}\,b(3a^{2} - 2b^{2}).}$$

Since a and b are in \(\mathbb{Z}\), we must have

$$\displaystyle{b = \pm 1\qquad \text{and}\qquad 3a^{2} - 2b^{2} = \pm 1,}$$

where the signs are the same. It follows that \((a,b) = (\pm 1,1)\), and working back through the various substitutions yields the values \((x,y) = (3,\pm 5)\). □ 
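Both parts of the proposition can be sanity-checked numerically. The Python sketch below searches a finite box for integral points and also cubes \(a + b\sqrt{-2}\) to confirm the last step of part (b). A bounded search is of course no substitute for the proof, and the function names and the search bound are assumptions chosen for illustration.

```python
from math import isqrt


def integral_points(D, x_bound):
    """All (x, y) in Z^2 with y^2 = x^3 + D and |x| <= x_bound."""
    points = []
    for x in range(-x_bound, x_bound + 1):
        rhs = x**3 + D
        if rhs < 0:
            continue  # a negative integer is never a square
        y = isqrt(rhs)
        if y * y == rhs:
            points.append((x, y))
            if y != 0:
                points.append((x, -y))
    return points


def cube_in_Z_sqrt_minus2(a, b):
    """Return (u, v) with (a + b*sqrt(-2))^3 = u + v*sqrt(-2)."""
    return a**3 - 6 * a * b**2, 3 * a**2 * b - 2 * b**3


if __name__ == "__main__":
    print(integral_points(7, 2000))    # part (a): y^2 = x^3 + 7
    print(integral_points(-2, 2000))   # part (b): y^2 = x^3 - 2
    for a in (1, -1):
        print((a, 1), cube_in_Z_sqrt_minus2(a, 1))
```

In the second function, the coefficient of \(\sqrt{-2}\) is \(b(3a^{2} - 2b^{2})\), which is exactly the quantity that the proof forces to equal 1.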

Remark 7.1.1.

It is worth remarking that the result in (IX.7.1b) is far more interesting than that in (IX.7.1a). The reason is that the Mordell–Weil group over \(\mathbb{Q}\) of the elliptic curve \(y^{2} = x^{3} + 7\) is trivial, so (IX.7.1a) reflects the fact that the equation has no rational solutions. On the other hand, the Mordell–Weil group of \(y^{2} = x^{3} - 2\) is infinite cyclic (see Exercise 10.19), so (IX.7.1b) says that among the infinitely many rational points, only two have integer coordinates.

Baker applied his effective estimate for linear forms in logarithms to give an explicit upper bound, in terms of D, for the integral solutions to \(y^{2} = x^{3} + D\). This bound was refined by Stark, who proved the following result.

Theorem 7.2.

(Stark [273]) For every ε > 0 there is an effectively computable constant  \(C_{\epsilon }\) , depending only on  \(\epsilon\) , such that if  \(D \in \mathbb{Z}\) with  \(D\neq 0\) and if  \(x,y \in \mathbb{Z}\) are solutions to the equation

$$\displaystyle{y^{2} = x^{3} + D,}$$

then

$$\displaystyle{\log \max {\bigl \{\vert x\vert ,\vert y\vert \bigr \}}\leq C_{\epsilon }\vert D\vert ^{1+\epsilon }.}$$

Example 7.3.

Stark’s estimate (IX.7.2) gives a bound for x and y that is slightly worse than exponential in D. It is natural to ask whether this bound is of the correct order of magnitude. Various people have conducted computer searches for large solutions, see for example [75, 106, 134]. Among the interesting examples found, we mention:

$$\displaystyle\begin{array}{rcl} 378,661^{2}& =& 5234^{3} + 17,\\ 911,054,064^{2}& =& 939,787^{3} - 307,\\ 149,651,610,621^{2}& =& 28,187,351^{3} + 1090,\\ 447,884,928,428,402,042,307,918^{2}& =& 5,853,886,516,781,223^{3} - 1,641,843. \end{array}$$
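Identities of this size are easy to confirm with exact integer arithmetic. The Python sketch below recomputes \(D = y^{2} - x^{3}\) for each listed pair and prints the ratio \(\sqrt{\vert x\vert }/\vert D\vert\) as a rough measure of how large x is relative to D. The pairs are copied from the display above; the function names and the choice of ratio are assumptions made for illustration.

```python
from math import sqrt

# (x, y) pairs copied from the examples in the text.
EXAMPLES = [
    (5234, 378661),
    (939787, 911054064),
    (28187351, 149651610621),
    (5853886516781223, 447884928428402042307918),
]


def mordell_D(x, y):
    """Return D such that y^2 = x^3 + D (exact integer arithmetic)."""
    return y * y - x**3


def size_ratio(x, y):
    """sqrt(|x|) / |D|, computed in floating point for display only."""
    return sqrt(abs(x)) / abs(mordell_D(x, y))


if __name__ == "__main__":
    for x, y in EXAMPLES:
        print(x, mordell_D(x, y), round(size_ratio(x, y), 3))
```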

Although these examples show that x and y may be quite large in comparison to D, a close examination of the data led M. Hall to make the following conjecture, which was partly generalized by Lang.

Conjecture 7.4.

  1. (a)

    (Hall [106]) For every  \(\epsilon > 0\) there is a constant  \(C_{\epsilon }\) , depending only on  \(\epsilon\) , such that for all  \(D \in \mathbb{Z}\) with  \(D\neq 0\) and for all  \(x,y \in \mathbb{Z}\) satisfying

    $$\displaystyle{y^{2} = x^{3} + D,}$$

    we have

    $$\displaystyle{\vert x\vert \leq C_{\epsilon }\vert D\vert ^{2+\epsilon }.}$$
  2. (b)

    (Hall–Lang [138]) There are absolute constants C and  \(\kappa\) such that for every elliptic curve  \(E/\mathbb{Q}\) given by a Weierstrass equation

    $$\displaystyle{y^{2} = x^{3} + Ax + B\qquad \text{with}\ A,B \in \mathbb{Z}}$$

    and for every integral point  \(P \in E(\mathbb{Q})\) , i.e., satisfying  \(x(P) \in \mathbb{Z}\) , we have

    $$\displaystyle{{\bigl |x(P)\bigr |} \leq C\max {\bigl \{\vert A\vert ,\vert B\vert \bigr \}}^{\kappa }.}$$

The evidence for these conjectures is fragmentary. They are true for function fields, for which Davenport [57] proved (IX.7.4a) and Schmidt proved (IX.7.4b). Vojta [298, 4 §4] has shown that (IX.7.4a) over number fields is a consequence of his very general Nevanlinna-type conjectures for algebraic varieties. It is also easy to deduce (IX.7.4a) from the ABC conjecture; see Exercise 9.17. However, both Vojta’s conjectures and the ABC conjecture are well beyond the reach of current techniques. (See also Exercise 9.10 for a proof that the exponent in (IX.7.4a) cannot be improved.) Aside from these few facts, very little is known. It is worth pointing out that the effective techniques from (IX §5) seem intrinsically incapable of leading to estimates as strong as those described in (IX.7.4). We briefly explain the problem for the equation \(y^{2} = x^{3} + D\).

When performing the reduction to the S-unit equation, we use a number field K whose discriminant looks like a power of D. The Brauer–Siegel theorem says that \(\log (h_{K}\mathcal{R}_{K}) \sim \frac{1} {2}\log d_{K}\) as \([K : \mathbb{Q}]/\log d_{K} \rightarrow 0\), where \(h_{K}\) is the class number, \(\mathcal{R}_{K}\) the regulator, and \(d_{K}\) the absolute discriminant of K. (See, e.g., [142, Chapter XVI].) In general there is no reason to expect the class number of K to be large, so the best that we can hope for is to find a bound for the regulator that is a power of \(\vert D\vert\). Since the regulator is the determinant of the logarithms of a basis for the unit group \(R^{{\ast}}\), the resulting bounds for the heights \(H(\alpha _{i})\) of generators \(\alpha _{i} \in R^{{\ast}}\) will be exponential in \(\vert D\vert\). This eventually leads to an exponential bound for x and y as in (IX.7.2).

There is a similar problem if we try to prove (IX.7.4) using linear forms in elliptic logarithms or by following Siegel’s method of proof as in (IX.3.1), even assuming that we could prove strong effective versions of Roth’s theorem and the Mordell–Weil theorem. The difficulty is that it is likely that the best possible upper bound for generators of the Mordell–Weil group of \(y^{2} = x^{3} + D\) has the form \(\hat{h}(P) \leq C\vert D\vert ^{\kappa }\), cf. (VIII.10.2). Here \(\hat{h}\) is a logarithmic height, so this again leads to a bound for the x-coordinate of integral points that is exponential in D.

The problem in both cases can be explained most clearly by the analogy given in (IX.4.2.1). When solving the S-unit equation or when finding integral points on elliptic curves, one is initially given a finitely generated group (\(R_{S}^{{\ast}}\times R_{S}^{{\ast}}\), respectively E(K)) and a certain exceptional subset (solutions to \(ax + by = 1\), respectively points with \(x(P) \in R_{S}\)). The first step is to choose a basis for the finitely generated group and express the exceptional points in terms of the basis. The difficulty that arises in trying to prove (IX.7.4) or the analogous estimate for the S-unit equation is that in general, the best (conjectural) upper bound for the heights of the basis elements is exponentially larger than the desired upper bound for the exceptional points! The moral of this story, assuming the validity of various conjectures, is that a randomly chosen elliptic curve \(E/\mathbb{Q}\) is unlikely to have any integral points at all.

IX.8 Roth’s Theorem—An Overview

In this section we give a brief sketch of the principal steps that go into the proof of Roth’s theorem (IX.1.4). None of the steps are tremendously deep, but the details required to make them rigorous are quite lengthy. For the full proof, see for example [114, Part D], [139, Chapter 7], or [221].

We assume that we are given an \(\alpha \in \bar{K}\), an absolute value \(v \in M_{K}\), and positive real numbers \(\epsilon\) and C. We then want to prove that there are only finitely many \(x \in K\) satisfying

$$\displaystyle{\vert x -\alpha \vert _{v} \leq CH_{K}(x)^{-2-\epsilon }.}$$
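A small numerical experiment shows why the exponent \(2+\epsilon\) is the critical one. The Python sketch below takes \(K = \mathbb{Q}\), \(\alpha = \sqrt{2}\), the usual archimedean absolute value, and the height \(H(p/q) =\max \{\vert p\vert ,\vert q\vert \}\); these choices, and the function names, are assumptions made for illustration. Along the continued fraction convergents p/q of \(\sqrt{2}\), the quantity \(\vert p/q -\sqrt{2}\vert \,H(p/q)^{2}\) stays bounded, whereas \(\vert p/q -\sqrt{2}\vert \,H(p/q)^{2+\epsilon }\) grows without bound.

```python
from math import sqrt


def sqrt2_convergents(n):
    """First n continued fraction convergents p/q of sqrt(2), as (p, q) pairs."""
    p, q = 1, 1
    convergents = []
    for _ in range(n):
        convergents.append((p, q))
        p, q = p + 2 * q, p + q
    return convergents


def scaled_error(p, q, eps):
    """|p/q - sqrt(2)| * H(p/q)^(2 + eps), with H(p/q) = max(|p|, |q|).

    The error is rewritten as |p^2 - 2q^2| / (q * (p + q*sqrt(2))) for p, q > 0,
    which avoids subtracting two nearly equal floating point numbers.
    """
    error = abs(p * p - 2 * q * q) / (q * (p + q * sqrt(2)))
    height = max(abs(p), abs(q))
    return error * height ** (2 + eps)


if __name__ == "__main__":
    for p, q in sqrt2_convergents(12):
        print(p, q, round(scaled_error(p, q, 0.0), 4), round(scaled_error(p, q, 0.5), 2))
```

So for a fixed C, only finitely many of these convergents can satisfy the inequality with exponent \(-2-\epsilon\), in line with what the theorem asserts for all of K.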

Step I: An Auxiliary Polynomial

For any given integers \(m,d_{1},\ldots ,d_{m}\), one uses elementary estimates and the pigeonhole principle to construct a polynomial

$$\displaystyle{P(X_{1},\ldots ,X_{m}) \in R[X_{1},\ldots ,X_{m}]}$$

of degree \(d_{i}\) in \(X_{i}\) such that P vanishes to fairly high order (in terms of m and the \(d_{i}\)) at the point \((\alpha ,\ldots ,\alpha )\). Further, one shows that P may be chosen with coefficients having fairly small heights, the bound for the heights being given explicitly in terms of \(\alpha\), m, and the \(d_{i}\).

Step II: An Upper Bound for P

Suppose now that we are given elements \(x_{1},\ldots ,x_{m} \in K\) satisfying

$$\displaystyle{\vert x_{i} -\alpha \vert _{v} \leq CH_{K}(x_{i})^{-2-\epsilon }\qquad \text{for}\ 1 \leq i \leq m.}$$

Using the Taylor series expansion for \(P(X_{1},\ldots ,X_{m})\) around \((\alpha ,\ldots ,\alpha )\) and the fact that P vanishes to high order at \((\alpha ,\ldots ,\alpha )\), one shows that \({\bigl |P(x_{1},\ldots ,x_{m})\bigr |}_{v}\) is fairly small.

Step III: A Nonvanishing Result (Roth’s Lemma)

Suppose that the degrees \(d_{1},\ldots ,d_{m}\) are fairly rapidly decreasing, where the rate of decrease depends on m, and suppose that \(x_{1},\ldots ,x_{m} \in K\) have the property that their heights are fairly rapidly increasing, the rate of increase depending on m and \(d_{1},\ldots ,d_{m}\). Suppose further that \(P(X_{1},\ldots ,X_{m}) \in R[X_{1},\ldots ,X_{m}]\) has degree \(d_{i}\) in \(X_{i}\) and coefficients whose heights are bounded in terms of \(d_{1}\) and \(h(x_{1})\). Then one shows that P does not vanish to too high an order at \((x_{1},\ldots ,x_{m})\).

This is the hardest step in Roth’s theorem. In Thue’s original theorem, he used a polynomial of the form \(P(X,Y ) = f(X) + g(X)Y\) and obtained an approximation exponent \(\tau (d) = \frac{1} {2}d + 1+\epsilon\). The improvements of Siegel, Gelfond, and Dyson used a general polynomial in two variables. It was clear at that time that the way to obtain \(\tau (d) = 2+\epsilon\) was to use polynomials in more variables; the only stumbling block was the lack of a nonvanishing result such as the one that we have just described.

The proof of Roth’s lemma is by induction on m, the number of variables in the polynomial P. If P factors as

$$\displaystyle{P(X_{1},\ldots ,X_{m}) = F(X_{1})G(X_{2},\ldots ,X_{m}),}$$

then the induction proceeds fairly smoothly. Of course, such a factorization is unlikely to happen. What one does is to construct differential operators \(\mathcal{D}_{ij}\) such that the generalized Wronskian determinant \(\det (\mathcal{D}_{ij}P)\) is a nonzero polynomial that does factor in the above fashion. It is then a delicate matter to estimate the degrees and heights of the coefficients of the resulting polynomial and to show that they have not grown too large to allow the inductive hypothesis to be applied.

Step IV: The Final Estimate

Suppose that the inequality

$$\displaystyle{\vert x -\alpha \vert _{v} \leq CH_{K}(x)^{-2-\epsilon }}$$

has infinitely many solutions x ∈ K. We derive a contradiction as follows.

First choose a value for m, depending on \(\epsilon\), C, and \({\bigl [K(\alpha ) : K\bigr ]}\). Second, choose \(x_{1},\ldots ,x_{m} \in K\) in succession satisfying

$$\displaystyle{\vert x_{i} -\alpha \vert _{v} \leq CH_{K}(x_{i})^{-2-\epsilon },}$$

such that \(H_{K}(x_{1})\) is large, depending on m, and such that \(H_{K}(x_{i+1}) > H_{K}(x_{i})^{\kappa }\) for some constant κ depending on m. Third, choose a large integer \(d_{1}\), depending on m and the \(H_{K}(x_{i})\), and then choose \(d_{2},\ldots ,d_{m}\) in terms of \(d_{1}\) and the \(H_{K}(x_{i})\). We are now ready to apply the initial three steps.

Using Step I, choose a polynomial \(P(X_{1},\ldots ,X_{m})\) of degree \(d_{i}\) in \(X_{i}\) such that P vanishes to high order at \((\alpha ,\ldots ,\alpha )\). The order of vanishing depends on m and \(d_{1},\ldots ,d_{m}\). From Step III, we know that P does not vanish to too high an order at \((x_{1},\ldots ,x_{m})\), so we can choose a low-order partial derivative that does not vanish,

$$\displaystyle{z = \frac{\partial ^{i_{1}+\cdots +i_{m}}} {\partial X_{1}^{i_{1}}\cdots \partial X_{m}^{i_{m}}}P(x_{1},\ldots ,x_{m})\neq 0.}$$

From Step II, we know that \(\vert z\vert _{v}\) is fairly small. On the other hand, since \(z\neq 0\), we can use the product formula to show that \(\vert z\vert _{v}\) cannot be too small. Specifically, we have \(\vert z\vert _{v} \geq H_{K}(z)^{-1}\); see Exercise 9.9. Next, using elementary triangle inequality estimates, we find a lower bound for \(H_{K}(z)^{-1}\). Combining this lower bound with the earlier upper bound, some algebra gives a contradiction. It follows that the inequality

$$\displaystyle{\vert x -\alpha \vert _{v} \leq CH_{K}(x)^{-2-\epsilon }}$$

has only finitely many solutions.

Remark 8.1.

In examining the proof sketch of Roth’s theorem, especially the sequence of choices in Step IV, it is clear why we do not obtain an effective procedure for finding all x ∈ K satisfying \(\vert x -\alpha \vert _{v} \leq CH_{K}(x)^{-2-\epsilon }\). What the proof shows is that we cannot find a long sequence of \(x_{i}\) whose heights grow sufficiently rapidly, where the terms “long sequence” and “sufficiently rapidly” can be made completely explicit in terms of K, \(\alpha\), \(\epsilon\), and C. The difficulty is that the required growth of the height of each \(x_{i}\) is given in terms of its predecessor. What this boils down to is that if we can find a large number of good approximations to \(\alpha\) whose heights are sufficiently large, then we can obtain a bound for all other good approximations to \(\alpha\) in terms of the approximations that we already know. Unfortunately, the bounds that come out of Roth’s theorem are so large that it is highly unlikely that there exists even a single good approximation to \(\alpha\) having the requisite height.

Using an elaboration of the above argument, one can prove quantitative versions of Roth’s theorem such as in the following result.

Theorem 8.2.

([103, 173]) Let \(K/\mathbb{Q}\) be a number field, let \(\alpha \in \bar{K}\setminus K\), and let \(S \subset M_{K}\) be a finite set of absolute values, each of which is extended in some way to \(K(\alpha )\). Let \(\epsilon > 0\). There are constants \(C_{1}\) and \(C_{2}\), depending only on \(\epsilon\) and \({\bigl [K(\alpha ) : K\bigr ]}\), such that the inequality

$$\displaystyle{\prod _{v\in S}\min {\bigl \{\vert x -\alpha \vert _{v}^{n_{v} },1\bigr \}} \leq CH_{K}(x)^{-2-\epsilon }}$$

has at most \(4^{\#S}C_{1}\) solutions x ∈ K satisfying \(H_{K}(x) >{\bigl ( 2H_{K}(\alpha )\bigr )}^{C_{2}}\).

Of course, the constant \(C_{2}\) in (IX.8.2) turns out to be sufficiently large that it is highly unlikely that there are any x ∈ K satisfying the two conditions of the theorem. But the proof of Roth’s theorem does not preclude the existence of large solutions, and it provides no tools with which to find them if they do exist!

Exercises

9.1. Let \({\bigl (\phi (n)\bigr )}_{n=1,2,\ldots }\) be a sequence of positive numbers. We say that a number \(\alpha \in \mathbb{R}\) is ϕ-approximable (over  \(\mathbb{Q}\)) if there are infinitely many \(p/q \in \mathbb{Q}\) satisfying

$$\displaystyle{\left \vert \alpha -\frac{p} {q}\right \vert < \frac{1} {q\phi (q)}.}$$

For example, Roth’s theorem says that no element of \(\bar{\mathbb{Q}}\) is \(n^{1+\epsilon }\)-approximable.

  1. (a)

    Prove that for any \(\epsilon > 0\), the set

    $$\displaystyle{\{\alpha \in \mathbb{R} : \mbox{ $\alpha $ is $n^{1+\epsilon }$-approximable}\}}$$

    is a set of measure 0.

  2. (b)

    More generally, prove that if the series \(\sum _{n\geq 1}1/\phi (n)\) converges, then the set

    $$\displaystyle{\{\alpha \in \mathbb{R} : \mbox{ $\alpha $ is $\phi $-approximable}\}}$$

    is a set of measure 0.

9.2.

  1. (a)

    Use Liouville’s theorem (IX.1.3) to prove that the number \(\sum _{n\geq 1}2^{-n!}\) is transcendental.

  2. (b)

    More generally, let \({\bigl (e(n)\bigr )}_{n=1,2,\ldots }\) be a sequence of real numbers with the property that for every d > 0 there is a constant \(C_{d} > 0\) such that

    $$\displaystyle{e(n) \geq C_{d}n^{d}\qquad \text{for all}\ n = 1, 2,\ldots \,.}$$

    (In complexity theory terminology, one says that the growth rate of the function e(n) is faster than polynomial.) Let \(b \geq 2\) be an integer. Prove that the number \(\sum _{n\geq 1}b^{-e(n)}\) is transcendental.

  3. (c)

    Use (b) to prove that there are uncountably many transcendental numbers.

9.3. For each integer \(m\neq 0\), let

$$\displaystyle{N(m) = \#{\bigl \{(x,y) \in \mathbb{Z}^{2} : y^{2} = x^{3} + m\bigr \}}.}$$

Note that (IX.3.2) tells us that N(m) is finite.

  1. (a)

    Prove that N(m) can be arbitrarily large. (Hint. Choose an \(m_{0}\) such that \(y^{2} = x^{3} + m_{0}\) has infinitely many rational solutions. Then clear the denominators of a lot of them.)

  2. (b)

    More precisely, prove that there is an absolute constant c > 0 such that

    $$\displaystyle{N(m) > c{\bigl (\log \vert m\vert \bigr )}^{1/3}}$$

    for infinitely many \(m \in \mathbb{Z}\). (Hint. Use height functions to estimate the size of the denominators cleared in (a).)

  3. (c)

    ** Prove or disprove that N(m) is unbounded as m ranges over sixth-power-free integers, i.e., integers divisible by no nontrivial sixth power.

  4. (d)

    Suppose that there is a value of \(m_{0}\) such that the Mordell–Weil group \(E_{0}(\mathbb{Q})\) of the elliptic curve \(E_{0} : y^{2} = x^{3} + m_{0}\) has rank r. Using an elaboration of the argument in (b), prove that there is an absolute constant c > 0 such that

    $$\displaystyle{N(m) > c{\bigl (\log \vert m\vert \bigr )}^{r/(r+2)}}$$

    for infinitely many \(m \in \mathbb{Z}\).

  5. (e)

    ** Let \(\epsilon > 0\). Prove or disprove that

    $$\displaystyle{\lim _{\vert m\vert \rightarrow \infty }\frac{N(m)} {{\bigl (\log \vert m\vert \bigr )}^{1+\epsilon }} = 0.}$$

9.4. Let \(E/\mathbb{Q}\) be an elliptic curve and let \(P \in E(\mathbb{Q})\) be a point of infinite order.

  1. (a)

    For each prime \(p \in \mathbb{Z}\) at which E has good reduction, let \(n_{p}\) be the order of the reduced point \(\tilde{P}\) in the finite group \(\tilde{E}(\mathbb{F}_{p})\). Prove that the set

    $$\displaystyle{\{n_{p} : \mbox{ $p$ prime}\}}$$

    contains all but finitely many positive integers. (Hint. You will need the strong form of Siegel’s theorem; see (IX.3.3).)

  2. (b)

    An alternative formulation for (a) is to write \(x(nP) = a_{n}/d_{n}^{2}\) as a fraction in lowest terms. The sequence \((d_{n})_{n\geq 1}\) is an elliptic divisibility sequence. A prime p is called a primitive divisor of \(d_{n}\) if \(p\mid d_{n}\) and \(p \nmid d_{m}\) for all m < n. Prove that all but finitely many terms in the sequence \((d_{n})_{n\geq 1}\) have a primitive divisor. (This is an analogue for elliptic curves of a classical result for the multiplicative group that is due to Bang and Zsigmondy [317].)

9.5.

  1. (a)

    Let \(f(T) = a_{0}T^{n} + \cdots + a_{n} \in \mathbb{Z}[T]\) be a polynomial with \(a_{0}a_{n}\neq 0\) and with distinct roots \(\xi _{1},\ldots ,\xi _{n} \in \mathbb{C}\). Let \(A =\max {\bigl \{ \vert a_{0}\vert ,\ldots ,\vert a_{n}\vert \bigr \}}\). Prove that for every rational number \(t \in \mathbb{Q}\),

    $$\displaystyle{{\bigl |f(t)\bigr |} \geq (2n^{2}A)^{-n}\min {\bigl \{\vert t -\xi _{ 1}\vert ,\ldots ,\vert t -\xi _{n}\vert \bigr \}}.}$$
  2. (b)

    Let \(f(T) = a_{0}T^{n} + \cdots + a_{n} \in K[T]\) be a polynomial with \(a_{0}a_{n}\neq 0\) and with distinct roots \(\xi _{1},\ldots ,\xi _{n} \in \bar{K}\). Let \(S \subset M_{K}\) be a finite set of places of K, each extended in some fashion to \(\bar{K}\). Prove that there is a constant \(C_{f} > 0\), depending only on f, such that for every \(t \in K\),

    $$\displaystyle{\prod _{v\in S}\min {\bigl \{1,{\bigl |f(t)\bigr |}_{v}^{n_{v}}\bigr \}} \geq C_{ f}\prod _{v\in S}\min _{1\leq i\leq n}{\bigl \{1,\vert t -\xi _{i}\vert _{v}^{n_{v}}\bigr \}}.}$$
  3. (c)

    Find an explicit expression for the constant \(C_{f}\) appearing in (b), where your value for \(C_{f}\) should depend only on n and \(H_{K}{\bigl ([a_{0},\ldots ,a_{n}]\bigr )}\).

9.6.

  1. (a)

    Let \(F(X,Y ) \in \mathbb{Z}[X,Y ]\) be a homogeneous polynomial of degree \(d \geq 3\) with nonzero discriminant. Prove that for every nonzero integer b, Thue’s equation

    $$\displaystyle{F(X,Y ) = b}$$

    has only finitely many solutions \((x,y) \in \mathbb{Z}^{2}\). (Hint. Let \(f(T) = F(T, 1)\), and write \(b = F(x,y) = y^{d}f(x/y)\). Now use Exercise 9.5a and (IX.1.4).)

  2. (b)

    More generally, let \(F(X,Y ) \in K[X,Y ]\) be a homogeneous polynomial of degree \(d \geq 3\) with nonzero discriminant, and let \(S \subset M_{K}\) be a finite set of places containing \(M_{K}^{\infty }\). Prove that for every \(b \in K^{{\ast}}\), the equation

    $$\displaystyle{F(X,Y ) = b}$$

    has only finitely many solutions \((x,y) \in R_{S} \times R_{S}\).

  3. (c)

    Let \(f(X) \in K[X]\) be a polynomial with at least two distinct roots in \(\bar{K}\), let \(S \subset M_{K}\) be as in (b), and let \(n \geq 3\) be an integer. Prove that the equation

    $$\displaystyle{Y ^{n} = f(X)}$$

    has only finitely many solutions \((x,y) \in R_{S} \times R_{S}\). (Hint. Mimic the proof of (IX.4.3) until you end up with a number of equations of the form \(aW^{n} + bZ^{n} = c\), and then use (b).)

9.7. Let \(E/K\) be an elliptic curve without complex multiplication. Prove that for every prime \(\ell\), the representation of \(G_{\bar{K}/K}\) on the \(\mathbb{Q}_{\ell}\)-vector space \(T_{\ell}(E) \otimes \mathbb{Q}_{\ell}\) is irreducible.

9.8.

  1. (a)

    Let \(\|\,\cdot \,\|\) be the usual Euclidean norm on \(\mathbb{R}^{n}\), and let \(\{v_{1},\ldots ,v_{n}\}\) be a basis for \(\mathbb{R}^{n}\). Prove that there is a constant c > 0, depending only on n and \(\{v_{1},\ldots ,v_{n}\}\), such that

    $$\displaystyle{{\Biggl \|\sum _{i=1}^{n}a_{ i}v_{i}\Biggr \|} \geq c\max {\bigl \{\vert a_{i}\vert \bigr \}}\qquad \text{ for all}\ a_{1},\ldots ,a_{n} \in \mathbb{R}.}$$
  2. (b)

    Let \(\varLambda \subset \mathbb{R}^{n}\) be a lattice. Prove that there exist a basis \(\{v_{1},\ldots ,v_{n}\}\) for \(\varLambda\) and a constant \(c_{n} > 0\) depending only on n such that

    $$\displaystyle{{\Biggl \|\sum _{i=1}^{n}a_{ i}v_{i}\Biggr \|} \geq c_{n}\sum _{i=1}^{n}\|a_{ i}v_{i}\|\qquad \text{ for all}\ a_{1},\ldots ,a_{n} \in \mathbb{R}.}$$

    (Hint. Ideally, one would like to choose an orthogonal basis for \(\varLambda\). This is not generally possible, but mimic the Gram–Schmidt process to find a basis that is reasonably orthogonal.)

  3. (c)

    Let \(\|\,\cdot \,\|_{1}\) and \(\|\,\cdot \,\|_{2}\) be norms on \(\mathbb{R}^{n}\), i.e., they satisfy \(\|v\| \geq 0\), \(\|v\| = 0\) if and only if \(v = 0\), \(\|av\| = \vert a\vert \,\|v\|\), and \(\|v + w\| \leq \| v\| +\| w\|\). Prove that there are constants \(c_{1},c_{2} > 0\) such that

    $$\displaystyle{c_{1}\|v\|_{1} \leq \| v\|_{2} \leq c_{2}\|v\|_{1}\qquad \text{for all}\ v \in \mathbb{R}^{n}.}$$
  4. (d)

    Let Q be a positive definite quadratic form on \(\mathbb{R}^{n}\). Prove that there is a constant c > 0, depending on n and Q, such that for any integral lattice point \((a_{1},\ldots ,a_{n}) \in \mathbb{Z}^{n} \subset \mathbb{R}^{n}\),

    $$\displaystyle{Q(a_{1},\ldots ,a_{n}) \geq c\max {\bigl \{\vert a_{1}\vert ,\ldots ,\vert a_{n}\vert \bigr \}}^{2}.}$$
  5. (e)

    Let \(E/K\) be an elliptic curve and let \(P_{1},\ldots ,P_{r}\) be a basis for the free part of E(K). Prove that there is a constant c > 0, depending on E and \(P_{1},\ldots ,P_{r}\), such that for all integers \(m_{1},\ldots ,m_{r}\),

    $$\displaystyle{\hat{h}(m_{1}P_{1} + \cdots + m_{r}P_{r}) \geq c\max {\bigl \{\vert m_{1}\vert ,\ldots ,\vert m_{r}\vert \bigr \}}^{2}.}$$

9.9. Let \(z \in K\) with \(z\neq 0\).

  1. (a)

    Prove that for any \(v \in M_{K}\),

    $$\displaystyle{\vert z\vert _{v}^{n_{v}} \geq H_{K}(z)^{-1}.}$$
  2. (b)

    More generally, prove that for any (not necessarily finite) set of absolute values \(S \subset M_{K}\),

    $$\displaystyle{\prod _{v\in S}\min {\bigl \{1,\vert z\vert _{v}^{n_{v}}\bigr \}} \geq H_{ K}(z)^{-1}.}$$

    (This lemma, as trivial as it appears, lies at the heart of all known proofs in Diophantine approximation and transcendence theory. In its simplest guise, namely for \(K = \mathbb{Q}\), it asserts nothing more than the fact that there are no positive integers smaller than one!)
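
For \(K = \mathbb{Q}\) the inequality in (b) can be checked by direct computation. The following is only an illustrative sketch under that assumption: all local degrees \(n_{v}\) equal 1, the height of a fraction a/b in lowest terms is \(\max\{\vert a\vert ,\vert b\vert \}\), and the set S is passed as a list of primes plus (optionally) the real place. The function names are hypothetical.

```python
from fractions import Fraction

def p_adic_abs(z, p):
    """p-adic absolute value |z|_p of a nonzero rational z, as a Fraction."""
    num, den, order = z.numerator, z.denominator, 0
    while num % p == 0:
        num //= p
        order += 1
    while den % p == 0:
        den //= p
        order -= 1
    return Fraction(1, p) ** order

def height(z):
    """H(a/b) = max(|a|, |b|) for a/b in lowest terms (the case K = Q)."""
    return max(abs(z.numerator), z.denominator)

def truncated_product(z, primes, with_infinity=True):
    """Product of min(1, |z|_v) over the places v in S."""
    result = Fraction(1)
    if with_infinity:
        result *= min(Fraction(1), abs(z))
    for p in primes:
        result *= min(Fraction(1), p_adic_abs(z, p))
    return result

z = Fraction(-12, 35)
for primes in ([], [2], [2, 3], [2, 3, 5, 7]):
    # Each product should be at least 1 / H(z).
    print(primes, truncated_product(z, primes), Fraction(1, height(z)))
```

Exact rational arithmetic is used so that the comparison with \(H(z)^{-1}\) is not affected by rounding.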

9.10. Prove that there is an (absolute) constant C > 0 such that the inequality

$$\displaystyle{0 < \vert y^{2} - x^{3}\vert < C\sqrt{\vert x\vert }}$$

has infinitely many solutions \((x,y) \in \mathbb{Z}^{2}\). (Hint. Verify the identity

$$\displaystyle{(t^{2} - 5)^{2}{\bigl ((t + 9)^{2} + 4\bigr )} - (t^{2} + 6t - 11)^{3} = -1728(t - 2).}$$

Take solutions to \(u^{2} - 2v^{2} = -1\) and set \(t = 2u - 9\). Show that this leads to a value \(C = 432\sqrt{2}+\epsilon\) for any \(\epsilon > 0\).)
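
The identity in the hint and the resulting small values of \(\vert y^{2} - x^{3}\vert\) can be explored numerically. The sketch below makes two assumptions that are not part of the hint: solutions of \(u^{2} - 2v^{2} = -1\) are generated by the recurrence \((u,v) \mapsto (3u + 4v,\, 2u + 3v)\) starting from (1, 1), and the pair (x, y) is normalized as \(x = (t^{2} + 6t - 11)/2\), \(y = v(t^{2} - 5)\), which is one possible rescaling. The function names are hypothetical.

```python
def lhs(t):
    return (t * t - 5) ** 2 * ((t + 9) ** 2 + 4) - (t * t + 6 * t - 11) ** 3

def rhs(t):
    return -1728 * (t - 2)

# Both sides are polynomials in t of degree at most 6, so agreement at
# seven or more integers establishes the identity.
identity_ok = all(lhs(t) == rhs(t) for t in range(-10, 11))

def negative_pell(count):
    """First `count` positive solutions (u, v) of u^2 - 2 v^2 = -1."""
    u, v = 1, 1
    sols = []
    for _ in range(count):
        sols.append((u, v))
        u, v = 3 * u + 4 * v, 2 * u + 3 * v
    return sols

def hint_points(count):
    """Pairs (x, y) built from the hint with t = 2u - 9 (assumed scaling)."""
    pts = []
    for u, v in negative_pell(count):
        t = 2 * u - 9
        x = (t * t + 6 * t - 11) // 2   # t is odd, so the division is exact
        y = v * (t * t - 5)
        pts.append((x, y))
    return pts

print("identity holds:", identity_ok)
for x, y in hint_points(6):
    gap = y * y - x ** 3
    # Compare gap^2 with (432 * sqrt(2))^2 * |x| using integers only.
    print(x, y, gap, gap * gap < 2 * 432 ** 2 * abs(x))
```

The comparison is done on squares so that no floating-point square roots are needed. The smallest solution need not satisfy the bound; the exercise concerns the behavior for large solutions.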

9.11.

  1. (a)

    Let \(d \equiv 2\ (\text{ mod}\ 4)\) and let \(D = d^{3} - 1\). Prove that the equation

    $$\displaystyle{y^{2} = x^{3} + D}$$

    has no solution in integers \(x,y \in \mathbb{Z}\).

  2. (b)

    For each of the primes p in the set \(\{11, 19, 43, 67, 163\}\), find all solutions \(x,y \in \mathbb{Z}\) to the equation

    $$\displaystyle{y^{2} = x^{3} - p.}$$

    (Hint. Work in the ring \(R = \mathbb{Z}\left [\frac{1} {2}(1 + \sqrt{-p}\,)\right ]\). Note that R is a principal ideal domain and that 2 does not split in R.)
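
Before attempting part (b) it can help to see which solutions a naive search turns up. The sketch below simply tests whether \(x^{3} - p\) is a perfect square for x up to a bound; the bound and the function name are arbitrary choices made here, and a finite search of this kind cannot show that the list of solutions is complete.

```python
from math import isqrt

def integer_points(p, x_max):
    """Pairs (x, y) with y >= 0, y^2 = x^3 - p, and 1 <= x <= x_max."""
    pts = []
    for x in range(1, x_max + 1):
        rhs = x ** 3 - p
        if rhs < 0:
            continue
        y = isqrt(rhs)
        if y * y == rhs:
            pts.append((x, y))
    return pts

for p in (11, 19, 43, 67, 163):
    print(p, integer_points(p, 10_000))
```

Each solution (x, y) with y > 0 gives a second solution (x, -y), which the search does not list separately.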

9.12. Let \(E/\mathbb{Q}\) be an elliptic curve given by a Weierstrass equation

$$\displaystyle{E : y^{2} + a_{ 1}xy + a_{3}y = x^{3} + a_{ 2}x^{2} + a_{ 4}x + a_{6}}$$

with \(a_{1},\ldots ,a_{6} \in \mathbb{Z}\), and let \(P \in E(\mathbb{Q})\) be a point of infinite order.

  1. (a)

    Suppose that \(x{\bigl ([m]P\bigr )} \in \mathbb{Z}\) for some integer \(m \geq 1\). Prove that \(x(P) \in \mathbb{Z}\). (This result is often useful when one is searching for integral points on elliptic curves of rank 1. See Exercise 9.13 for an example.)

  2. (b)

    More generally, for any \(m \geq 1\), write \(x{\bigl ([m]P\bigr )} = a_{m}/d_{m}^{2} \in \mathbb{Q}\) as a fraction in lowest terms. Prove that

    $$\displaystyle{m\mid n\Longrightarrow d_{m}\mid d_{n}.}$$

    Thus the sequence \((d_{m})_{m\geq 1}\) is a divisibility sequence; see Exercise 3.36.
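
The divisibility property in (b) can be observed experimentally. The sketch below is an illustration only: it uses the standard addition formulas for a general Weierstrass equation with exact rational arithmetic, and as sample data it takes the curve \(y^{2} + y = x^{3} - x\) of Exercise 9.13 with \(P = (0, 0)\), assuming that P has infinite order. The function names and the representation of O as `None` are choices made here.

```python
from fractions import Fraction
from math import isqrt

def add(P, Q, a):
    """Add points on y^2 + a1 xy + a3 y = x^3 + a2 x^2 + a4 x + a6.

    Points are pairs of Fractions; None stands for the point O.
    """
    a1, a2, a3, a4, a6 = a
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2:
        if y1 + y2 + a1 * x2 + a3 == 0:
            return None  # Q = -P
        # Otherwise P = Q: use the tangent line.
        den = 2 * y1 + a1 * x1 + a3
        lam = (3 * x1 * x1 + 2 * a2 * x1 + a4 - a1 * y1) / den
        nu = (-x1 ** 3 + a4 * x1 + 2 * a6 - a3 * y1) / den
    else:
        lam = (y2 - y1) / (x2 - x1)
        nu = (y1 * x2 - y2 * x1) / (x2 - x1)
    x3 = lam * lam + a1 * lam - a2 - x1 - x2
    y3 = -(lam + a1) * x3 - nu - a3
    return (x3, y3)

def denominators(P, a, count):
    """Return [d_1, ..., d_count], where x([m]P) = a_m / d_m^2.

    Assumes P has infinite order, so that [m]P is never O.
    """
    ds = []
    Q = None
    for _ in range(count):
        Q = add(Q, P, a)
        ds.append(isqrt(Q[0].denominator))
    return ds

curve = (0, 0, 1, -1, 0)            # y^2 + y = x^3 - x
P = (Fraction(0), Fraction(0))
d = denominators(P, curve, 12)
print(d)
print(all(d[n - 1] % d[m - 1] == 0
          for m in range(1, 13) for n in range(m, 13, m)))
```

`Fraction` keeps every coordinate in lowest terms automatically, so the denominator of the x-coordinate can be read off directly.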

9.13. Let \(E/\mathbb{Q}\) be the elliptic curve

$$\displaystyle{E : y^{2} + y = x^{3} - x.}$$

For this exercise you may assume that \(E(\mathbb{Q})\) has rank 1. (For a proof that \(\mathop{\mathrm{rank}}\nolimits E(\mathbb{Q}) = 1\), see Exercise 10.9.)

  1. (a)

    Prove that \(E_{\text{tors}}(\mathbb{Q}) =\{ O\}\), and hence that \(E(\mathbb{Q})\mathop{\cong}\mathbb{Z}\).

  2. (b)

    Prove that (0, 0) is a generator for \(E(\mathbb{Q})\). (Hint. Make a sketch of \(E(\mathbb{R})\) and show that (0, 0) is not on the identity component. Use Exercise 9.12 to conclude that a generator for \(E(\mathbb{Q})\) must be a point with integer coordinates on the nonidentity component, and find all such points.)

  3. (c)

    Find all of the integer points in \(E(\mathbb{Q})\). (Hint. Let \(P = (0, 0)\). Suppose that \([m]P\) is integral. Write \(m = 2^{a}n\) with n odd and use Exercise 9.12 to show that \([n]P\) is integral. Use an argument as in (b) to find all possible values of n, and then do some computations to find the possible values of a.)

  4. (d)

    Solve the following classical number theory problem: Find all positive integers that are simultaneously the product of two consecutive integers and the product of three consecutive integers.
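
For parts (c) and (d), a finite search can suggest what the answer should be, although it proves nothing about completeness. The sketch below completes the square, \((2y + 1)^{2} = 4(x^{3} - x) + 1\), and looks for perfect squares on the right-hand side; the search bound and the function name are arbitrary choices made here. The connection with (d) is that \(y^{2} + y = y(y + 1)\) and \(x^{3} - x = (x - 1)x(x + 1)\).

```python
from math import isqrt

def integral_points(x_max):
    """Integer points on y^2 + y = x^3 - x with -1 <= x <= x_max.

    For x <= -2 the quantity 4(x^3 - x) + 1 is negative, so no real
    points exist there and the search can start at x = -1.
    """
    pts = []
    for x in range(-1, x_max + 1):
        rhs = 4 * (x ** 3 - x) + 1      # equals (2y + 1)^2
        if rhs < 0:
            continue
        s = isqrt(rhs)
        if s * s == rhs:
            # rhs is odd, hence s is odd and both roots are integers.
            pts.append((x, (s - 1) // 2))
            pts.append((x, (-s - 1) // 2))
    return pts

pts = integral_points(1000)
print(pts)
# Positive integers of the form y(y + 1) = (x - 1) x (x + 1) found so far:
print(sorted({x ** 3 - x for x, y in pts if x ** 3 - x > 0}))
```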

9.14. Let \(C/K\) be a curve and let \(f,g \in K(C)\) be nonconstant functions.

  1. (a)

    * Prove that

    $$\displaystyle{\lim _{ \begin{array}{c}P\in C(\bar{K}) \\ h_{f}(P)\rightarrow \infty \\ \end{array}}\frac{h_{f}(P)} {h_{g}(P)} = \frac{\deg f} {\deg g}.}$$
  2. (b)

    Prove that for every \(\epsilon > 0\) there exists a constant \(c = c(f,g,\epsilon )\) such that

    $$\displaystyle{\left \vert \frac{1} {\deg f}h_{f}(P) -\frac{1} {\deg g}h_{g}(P)\right \vert <\epsilon h_{f}(P) + c\qquad \text{for all}\ P \in C(\bar{K}).}$$
  3. (c)

    Let C be an elliptic curve. Prove that there is a constant \(c = c(f,m,\epsilon )\) such that

    $$\displaystyle{{\bigl |h_{f}{\bigl ([m]P\bigr )} - m^{2}h_{ f}(P)\bigr |} <\epsilon h_{f}(P) + c\qquad \text{for all}\ P \in C(\bar{K}).}$$
  4. (d)

    Prove that (IX.3.1) is true for all nonconstant functions f ∈ K(E). Use this to prove the finiteness result (IX.3.2.2) directly, without first reducing to (IX.3.2.1).

9.15. For a given \(Q \in C(K_{v})\), let \(d_{v}\) be the distance function defined in (IX §2), and let \(d_{v}^{\text{alt}}\) denote the distance function given by the alternative definition in (IX.2.2.1). Prove that the ratio \(d_{v}^{\text{alt}}(P,Q)/d_{v}(P,Q)\) is bounded for \(P \in C(K_{v})\).

9.16. Let \(C/K\) be a curve, let \(f \in K(C)\) be a nonconstant function, and write the divisor of zeros of f as

$$\displaystyle{\mathop{\mathrm{div}}\nolimits _{0}(f) =\sum _{ \begin{array}{c}Q\in C(\bar{K}) \\ \mathop{\mathrm{ord}}\nolimits _{Q}(f)>0 \\ \end{array}}\mathop{ \mathrm{ord}}\nolimits _{Q}(f)(Q) = n_{1}(Q_{1}) + n_{2}(Q_{2}) + \cdots + n_{r}(Q_{r}).}$$

Replacing K by an extension field, we assume that \(Q_{1},\ldots ,Q_{r} \in C(K)\). Let \(v \in M_{K}\). Prove that

$$\displaystyle\begin{array}{rcl} \log \min {\bigl \{{\bigl |f(P)\bigr |}_{v}, 1\bigr \}}& =& n_{1}\log d_{v}(P,Q_{1}) + \cdots + n_{r}\log d_{v}(P,Q_{r}) + O(1) {}\\ & & \qquad \qquad \qquad \qquad \qquad \qquad \text{for all}\ P \in C(K_{v}), {}\\ \end{array}$$

where the O(1) depends on f and the choice of distance functions, but is independent of P.

9.17. Let \(\epsilon > 0\), and let m and n be positive integers satisfying nm > n + m. Assuming that the ABC conjecture (VIII.11.4) is true, prove the following assertions (see also Exercise 8.22):

  1. (a)

    There is a constant \(C = C(\epsilon ,m,n)\) such that if

    $$\displaystyle{y^{m} = x^{n} + D\qquad \text{with}\ x,y,D \in \mathbb{Z}\ \text{and}\ D\neq 0,}$$

    then

    $$\displaystyle{\vert x\vert ^{nm-n-m} \leq C\vert D\vert ^{m+\epsilon }\qquad \text{and}\qquad \vert y\vert ^{nm-n-m} \leq C\vert D\vert ^{n+\epsilon }.}$$

    (This is a generalized version of Hall’s conjecture (IX.7.4).)

  2. (b)

    Suppose now that \(D\neq 0\) is fixed. If \(\max \{m,n\}\) is sufficiently large, then the equation \(y^{m} = x^{n} + D\) has no solutions \(x,y \in \mathbb{Z}\) with \(x,y\notin \{0,\pm 1\}\). (Hint. You’ll need to keep track of how the constant in (a) depends on m and n.)

9.18. Let E be the elliptic curve \(y^{2} = x^{3} + 2089\).

  1. (a)

    Prove that the points

    $$\displaystyle{P_{1} = (-12, 19), P_{2} = (-10, 33), P_{3} = (-4, 45), P_{4} = (3, 46),}$$

    are independent points in \(E(\mathbb{Q})\).

  2. (b)

    * Prove that \(E(\mathbb{Q})\mathop{\cong}\mathbb{Z}^{4}\) and that \(P_{1},P_{2},P_{3},P_{4}\) are a basis for \(E(\mathbb{Q})\).

  3. (c)

    Find 10 more points (x, y) in \(E(\mathbb{Q})\) with \(x,y \in \mathbb{Z}\) and y > 0. Express these integral points in terms of the basis listed in (a).
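
The search in part (c) is easy to mechanize. The following sketch first checks that the four points of (a) lie on the curve and then lists integer points with y > 0 up to a bound on x; the bound and the function names are arbitrary choices made here. It does not express the points in terms of the basis, which is the substantive part of (c).

```python
from math import isqrt

D = 2089

def on_curve(x, y):
    """True if (x, y) satisfies y^2 = x^3 + 2089."""
    return y * y == x ** 3 + D

def integral_points(x_max):
    """Integer points (x, y) with y > 0 and -12 <= x <= x_max.

    The lower limit is sharp: x^3 + 2089 < 0 as soon as x <= -13.
    """
    pts = []
    for x in range(-12, x_max + 1):
        rhs = x ** 3 + D
        y = isqrt(rhs)
        if y > 0 and y * y == rhs:
            pts.append((x, y))
    return pts

given = [(-12, 19), (-10, 33), (-4, 45), (3, 46)]
print(all(on_curve(x, y) for x, y in given))
print(integral_points(10_000))
```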