
1 Introduction

Notation.

For any real-valued function f and positive function g, we write f = O(g) to indicate that there exists a positive constant c such that \(\vert f\vert < cg\), and write f = o(g) to indicate that \(f/g \rightarrow 0\). We write \(\Vert z\Vert\) to denote the distance of a real number z to the nearest integer. Furthermore, \(c_{0},c_{1},c_{2},\ldots\) denote positive constants which may depend on some of the parameters that arise from our discussion.

1.1 Pell’s Equation: Bounded Fluctuations

Our starting point is the well-known Pell’s equation, a standard part of any introductory course on number theory. The theory of Pell’s equation, while mostly elementary, is nevertheless one of the most beautiful chapters in the whole of mathematics. Also, it is very important, since the concept of units plays a key role in algebraic number theory.

We briefly recall the main results. Consider, for simplicity, the concrete equation \(x^{2} - 2y^{2} = \pm 1\). This equation has infinitely many integral solutions; in fact, the set of all integral solutions \((x_{k},y_{k}) \in \mathbf{Z}^{2}\) forms a cyclic group generated by the least positive solution. More precisely, we have

$$\displaystyle{x_{k} + y_{k}\sqrt{2} = \pm (1 + \sqrt{2})^{k},\quad k \in \mathbf{Z}.}$$

All integral solutions of \(x^{2} - 2y^{2} = 1\) are given by \(x_{k} + y_{k}\sqrt{2} = \pm (1 + \sqrt{2})^{2k}\), while all integral solutions of \(x^{2} - 2y^{2} = -1\) are given by \(x_{k} + y_{k}\sqrt{2} = \pm (1 + \sqrt{2})^{2k+1}\). In particular, all positive integer solutions of \(x^{2} - 2y^{2} = 1\) are given by

$$\displaystyle{x_{k} + y_{k}\sqrt{2} = (1 + \sqrt{2})^{2k} = (3 + 2\sqrt{2})^{k},\quad k = 1,2,3,\ldots.}$$

Taking the algebraic conjugate \(x_{k} - y_{k}\sqrt{2} = (3 - 2\sqrt{2})^{k}\), and combining these two equations, we obtain the explicit formulas

$$\displaystyle{x_{k} = \frac{(3 + 2\sqrt{2})^{k} + (3 - 2\sqrt{2})^{k}} {2} \quad \mbox{ and}\quad y_{k} = \frac{(3 + 2\sqrt{2})^{k} - (3 - 2\sqrt{2})^{k}} {2\sqrt{2}}.}$$

Since \(0 < 3 - 2\sqrt{2} < \frac{1} {5}\), we have

$$\displaystyle{x_{k} = \mbox{ the nearest integer to}\ \frac{1} {2}(3 + 2\sqrt{2})^{k}}$$

and

$$\displaystyle{y_{k} = \mbox{ the nearest integer to}\ \frac{1} {2\sqrt{2}}(3 + 2\sqrt{2})^{k}.}$$

If k is large, the error is very small. For example, the 10-th solution of \(x^{2} - 2y^{2} = 1\) in positive integers is the pair \(x_{10} = 22619537\) and \(y_{10} = 15994428\). Here we find

$$\displaystyle{\frac{1} {2}(3 + 2\sqrt{2})^{10} = 22619536.99999998895\ldots }$$

and

$$\displaystyle{ \frac{1} {2\sqrt{2}}(3 + 2\sqrt{2})^{10} = 15994428.000000007815\ldots.}$$
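The nearest-integer formulas are easy to check by machine. The following is a minimal sketch, not part of the original argument: the function names are ours, and the working precision of 60 decimal digits is an assumption that is ample for k ≤ 10. It generates the solutions exactly with the integer recurrence that corresponds to multiplication by \(3 + 2\sqrt{2}\), and evaluates the two approximations with Python's `decimal` module.

```python
from decimal import Decimal, getcontext

getcontext().prec = 60  # assumed working precision, ample for k <= 10


def pell_solutions(count):
    """Return the first `count` positive solutions (x_k, y_k) of x^2 - 2y^2 = 1."""
    solutions = []
    x, y = 3, 2  # least positive solution, k = 1
    for _ in range(count):
        solutions.append((x, y))
        # (x + y*sqrt(2)) * (3 + 2*sqrt(2)) = (3x + 4y) + (2x + 3y)*sqrt(2)
        x, y = 3 * x + 4 * y, 2 * x + 3 * y
    return solutions


def nearest_integer_formulas(k):
    """Return (1/2)(3 + 2*sqrt(2))^k and (1/(2*sqrt(2)))(3 + 2*sqrt(2))^k."""
    root2 = Decimal(2).sqrt()
    power = (3 + 2 * root2) ** k
    return power / 2, power / (2 * root2)


solutions = pell_solutions(10)
x10, y10 = solutions[-1]
approx_x, approx_y = nearest_integer_formulas(10)
print("exact:  ", x10, y10)
print("approx: ", approx_x, approx_y)
```

Rounding `approx_x` and `approx_y` to the nearest integer recovers the exact pair produced by the recurrence.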

Let \(F(N) = F(\sqrt{2};1;N)\) denote the number of positive integer solutions of the Pell equation \(x^{2} - 2y^{2} = 1\) up to N, in the sense that x ≥ 1 and 1 ≤ y ≤ N. We have

$$\displaystyle{k \leq F(N)\quad \mbox{ if and only if}\quad \frac{(3 + 2\sqrt{2})^{k} - (3 - 2\sqrt{2})^{k}} {2\sqrt{2}} \leq N,}$$

which implies the asymptotic formula

$$\displaystyle{ F(N) = F(\sqrt{2};1;N) = \frac{\log N} {\log (3 + 2\sqrt{2})} + O(1). }$$
(4.1)

The formula (4.1) says that the counting function \(F(N) = F(\sqrt{2};1;N)\) has an extremely predictable, almost deterministic behavior: it is \(c_{2}\log N\) plus some bounded error term.
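The bounded error in (4.1) can be observed directly. The sketch below is only an illustration under the assumption, stated above, that all positive solutions are the powers of \(3 + 2\sqrt{2}\); the names `count_pell_solutions` and `main_term` are ours.

```python
import math


def count_pell_solutions(N):
    """Count pairs (x, y) with x^2 - 2y^2 = 1, x >= 1 and 1 <= y <= N."""
    count = 0
    x, y = 3, 2  # least positive solution
    while y <= N:
        count += 1
        # next power of 3 + 2*sqrt(2)
        x, y = 3 * x + 4 * y, 2 * x + 3 * y
    return count


def main_term(N):
    """Main term log N / log(3 + 2*sqrt(2)) of formula (4.1)."""
    return math.log(N) / math.log(3 + 2 * math.sqrt(2))


for N in (10, 10**3, 10**6, 10**12):
    print(N, count_pell_solutions(N), round(main_term(N), 3))
```

For each N in the loop, the count and the main term differ by less than 1, in agreement with the O(1) term.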

Note that (4.1) has some far-reaching generalizations. Let \([\gamma _{1},\gamma _{2}]\) be an arbitrary interval, and let \(F(\sqrt{2};[\gamma _{1},\gamma _{2}];N)\) denote the number of positive integer solutions of the Pell inequality \(\gamma _{1} \leq x^{2} - 2y^{2} \leq \gamma _{2}\), with x ≥ 1 and 1 ≤ y ≤ N. By using the theory of indefinite binary quadratic forms, it is easy to prove the following analog of (4.1). We have

$$\displaystyle{ F(\sqrt{2};[\gamma _{1},\gamma _{2}];N) = c_{0}\log N + O(1), }$$
(4.2)

where the constant factor \(c_{0} = c_{0}(\sqrt{2};\gamma _{1},\gamma _{2})\) is independent of N.

Furthermore, we can switch from \(\sqrt{2}\) to any other quadratic irrational α. This means that α is a root of a quadratic equation \(Ax^{2} + Bx + C = 0\) with integral coefficients such that the discriminant \(B^{2} - 4AC \geq 2\) is not a perfect square. An equivalent definition is that \(\alpha = (a + \sqrt{d})/b\) for some integers a, b, d such that b ≠ 0 and d ≥ 2 is not a perfect square. Note that the quadratic irrationals are characterized by their continued fractions: the continued fraction of α is eventually periodic if and only if α is a quadratic irrational. For example,

$$\displaystyle{\frac{24 -\sqrt{15}} {17} = 1 + \frac{1} {5+} \frac{1} {2+} \frac{1} {3+} \frac{1} {2+} \frac{1} {3+}\ldots = [1;5,2,3,2,3,2,3,\ldots ] = [1;5,\overline{2,3}].}$$
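The expansion above can be reproduced with exact integer arithmetic. The following sketch uses the standard recurrence for the continued fraction of a number written as \((P + \sqrt{D})/Q\); the function name `quadratic_cf` and the normalization (Q must divide \(D - P^{2}\)) are our assumptions for this illustration, not notation from the text.

```python
import math


def quadratic_cf(P, Q, D, terms):
    """Partial quotients of (P + sqrt(D)) / Q.

    Assumes D is a positive non-square integer and Q divides D - P*P.
    """
    if (D - P * P) % Q != 0:
        raise ValueError("need Q | D - P^2; rescale P, Q, D first")
    root = math.isqrt(D)  # floor(sqrt(D))
    quotients = []
    for _ in range(terms):
        if Q > 0:
            a = (P + root) // Q
        else:
            # floor((P + sqrt(D)) / Q) = floor((-P - sqrt(D)) / |Q|), and
            # floor(-sqrt(D)) = -root - 1 because D is not a square
            a = (-P - root - 1) // (-Q)
        quotients.append(a)
        P = a * Q - P
        Q = (D - P * P) // Q
    return quotients


# (24 - sqrt(15)) / 17 = (-24 + sqrt(15)) / (-17)
print(quadratic_cf(-24, -17, 15, 8))
```

The same routine with P = 0, Q = 1, D = 2 gives the expansion of \(\sqrt{2}\).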

Let us go back to (4.2) and to the special case \(\alpha = \sqrt{2}\). If \(-2 <\gamma _{1} \leq -1\) and \(1 \leq \gamma _{2} < 2\), then

$$\displaystyle{ c_{0}(\sqrt{2};\gamma _{1},\gamma _{2}) = \frac{1} {\log (1 + \sqrt{2})} = \frac{2} {\log (3 + 2\sqrt{2})}. }$$
(4.3)

If \(-1 <\gamma _{1} \leq 1 \leq \gamma _{2} < 2\), then

$$\displaystyle{ c_{0}(\sqrt{2};\gamma _{1},\gamma _{2}) = \frac{1} {\log (3 + 2\sqrt{2})}. }$$
(4.4)

Finally, if \(-1 <\gamma _{1} \leq \gamma _{2} < 1\), then of course

$$\displaystyle{ c_{0}(\sqrt{2};\gamma _{1},\gamma _{2}) = 0. }$$
(4.5)

1.2 The Naive Area Principle

It is very interesting to compare these well-known asymptotic results about the number of solutions of the Pell equation/inequality to what we like to call the Naive Area Principle, a natural guiding intuition in lattice point theory. It goes roughly as follows. If a nice region has a large area, then it should contain a large number of lattice points, and the number of lattice points is close to the area.

Of course, the heart of the matter is how we define a nice region precisely. Consider, for example, the infinite open horizontal strip of height one, given by \(0 < y < 1\), \(-\infty < x < \infty \). It has infinite area, but it does not contain any lattice point. The reader is likely to agree that the infinite strip is a nice region, so the Naive Area Principle is clearly violated here.

A less trivial example comes from the Pell inequality

$$\displaystyle{ -\frac{1} {2} \leq x^{2} - 2y^{2} \leq \frac{1} {2}. }$$
(4.6)

This is a hyperbolic region of infinite area, and contains no lattice point except the origin. The reader is again likely to agree that the hyperbolic region (4.6) is also nice, so this is again a violation of the Naive Area Principle.

Next we switch from (4.6) to the general Pell inequality

$$\displaystyle{ \gamma _{1} \leq x^{2} - 2y^{2} \leq \gamma _{ 2}, }$$
(4.7)

where \(-\infty <\gamma _{1} <\gamma _{2} < \infty \) are arbitrary real numbers. Of course, the hyperbolic region (4.7) has infinite area. What we want to compute is the area of a finite segment. Consider the finite region

$$\displaystyle\begin{array}{rcl} H(\sqrt{2};[\gamma _{1},\gamma _{2}];N) = \left \{(x,y) \in \mathbf{R}^{2}:\gamma _{ 1} \leq x^{2} - 2y^{2} \leq \gamma _{ 2},\ x \geq 1,\ 1 \leq y \leq N\right \}.& &{}\end{array}$$
(4.8)

If N is very large compared to the pair of constants \(\gamma _{1}\) and \(\gamma _{2}\), then the finite region \(H(\sqrt{2};[\gamma _{1},\gamma _{2}];N)\) looks like a hyperbolic needle. It is easy to give a good estimate for the area of this hyperbolic needle. We have

$$\displaystyle{ \mathrm{area}(H(\sqrt{2};[\gamma _{1},\gamma _{2}];N)) = \frac{\gamma _{2} -\gamma _{1}} {2\sqrt{2}} \log N + O(1), }$$
(4.9)

where the implicit constant in the term O(1) is independent of N, but may depend on \(\gamma _{1}\) and \(\gamma _{2}\).

The proof of (4.9) is based on the familiar factorization

$$\displaystyle{ x^{2} - 2y^{2} = (x + y\sqrt{2})(x - y\sqrt{2}), }$$
(4.10)

and on the computation of the Jacobian of the corresponding substitution; this explains the factor \(2\sqrt{2}\) in the denominator in (4.9). The details are easy, and go as follows. In view of the factorization (4.10), it is more convenient to compute the area of the following slight variant of the region (4.8). Let

$$\displaystyle\begin{array}{rcl} & & H^{{\ast}}(\sqrt{2};[\gamma _{ 1},\gamma _{2}];N) \\ & & \quad =\{ (x,y) \in \mathbf{R}^{2}:\gamma _{ 1} \leq x^{2} - 2y^{2} \leq \gamma _{ 2},\ 1 \leq x + y\sqrt{2} \leq 2\sqrt{2}N\}.\qquad {}\end{array}$$
(4.11)

Consider the substitution

$$\displaystyle{ u_{1} = x + y\sqrt{2},\quad u_{2} = x - y\sqrt{2}, }$$
(4.12)

which is equivalent to

$$\displaystyle{x = \frac{u_{1} + u_{2}} {2},\quad y = \frac{u_{1} - u_{2}} {2\sqrt{2}}.}$$

The absolute value of the corresponding Jacobian determinant is

$$\displaystyle{\left \vert \frac{\partial (u_{1},u_{2})} {\partial (x,y)} \right \vert = \left \vert \det \left (\begin{array}{cc} 1& \sqrt{2}\\ 1 & -\sqrt{2} \end{array} \right )\right \vert = 2\sqrt{2}.}$$

Applying the substitution (4.12), we have

$$\displaystyle\begin{array}{rcl} & & \mathrm{area}(H^{{\ast}}(\sqrt{2};[\gamma _{ 1},\gamma _{2}];N)) = \frac{1} {2\sqrt{2}}\int _{1}^{2\sqrt{2}N}\left (\int _{\gamma _{ 1}/u_{1}}^{\gamma _{2}/u_{1} }\,\mathrm{d}u_{2}\right )\mathrm{d}u_{1} \\ & & \quad = \frac{1} {2\sqrt{2}}\int _{1}^{2\sqrt{2}N}\frac{\gamma _{2} -\gamma _{1}} {u_{1}} \,\mathrm{d}u_{1} = \frac{\gamma _{2} -\gamma _{1}} {2\sqrt{2}} \log N + O(1). {}\end{array}$$
(4.13)
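For the reader's convenience, we spell out the last equality in (4.13); this only unpacks the O(1) term and adds nothing new. Evaluating the integral gives

$$\displaystyle{\frac{\gamma _{2} -\gamma _{1}} {2\sqrt{2}} \int _{1}^{2\sqrt{2}N}\frac{\mathrm{d}u_{1}} {u_{1}} = \frac{\gamma _{2} -\gamma _{1}} {2\sqrt{2}} \left (\log N +\log (2\sqrt{2})\right ),}$$

so the bounded term equals \((\gamma _{2} -\gamma _{1})\log (2\sqrt{2})/(2\sqrt{2})\), which is independent of N.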

A simple geometric consideration shows that

$$\displaystyle{\mathrm{area}(H(\sqrt{2};[\gamma _{1},\gamma _{2}];N)) =\mathrm{ area}(H^{{\ast}}(\sqrt{2};[\gamma _{ 1},\gamma _{2}];N)) + O(1),}$$

and so (4.13) implies (4.9).

Now let us return to the Naive Area Principle. Comparing (4.2), (4.8) and (4.9), it is reasonable to expect, in view of the Naive Area Principle, that the counting function \(F(\sqrt{2};[\gamma _{1},\gamma _{2}];N)\) is close to the area of the hyperbolic needle \(H(\sqrt{2};[\gamma _{1},\gamma _{2}];N)\). In other words, it is reasonable to expect that

$$\displaystyle{ c_{0}(\sqrt{2};\gamma _{1},\gamma _{2}) = \frac{\gamma _{2} -\gamma _{1}} {2\sqrt{2}}. }$$
(4.14)

Unfortunately, the Naive Area Principle is almost always violated in the quantitative sense that (4.14) fails for the overwhelming majority of the choices \(-\infty <\gamma _{1} <\gamma _{2} < \infty \). In fact, the two sides of (4.14) have completely different behavior. The left-hand side of (4.14) has discrete jumps, whereas the right-hand side is a continuous function of \(\gamma _{1}\) and \(\gamma _{2}\). For example, as \(\gamma _{1}\) and \(\gamma _{2}\) run in the interval \(-2 <\gamma _{1} <\gamma _{2} < 2\), the constant factor \(c_{0}(\sqrt{2};\gamma _{1},\gamma _{2})\) has only 3 possible values, namely

$$\displaystyle{0,\quad \frac{1} {\log (3 + 2\sqrt{2})},\quad \frac{2} {\log (3 + 2\sqrt{2})};}$$

see (4.3)–(4.5). This shows, in a quantitative way, how the general Pell inequality (4.7) violates the Naive Area Principle.
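The discrete jumps of \(c_{0}\) can be seen in a brute-force count. The sketch below is an illustration only; the function names and the choice N = 10000 are ours. It counts the lattice points of the hyperbolic needle (4.8) exactly, using the fact that \(x^{2} - 2y^{2}\) is an integer at lattice points, and prints the ratio to log N next to the value \((\gamma _{2} -\gamma _{1})/(2\sqrt{2})\) suggested by the Naive Area Principle.

```python
import math


def count_pell_inequality(gamma1, gamma2, N):
    """Count integer pairs with gamma1 <= x^2 - 2y^2 <= gamma2, x >= 1, 1 <= y <= N."""
    # at lattice points x^2 - 2y^2 is an integer m, so only integer m matter
    m_values = range(math.ceil(gamma1), math.floor(gamma2) + 1)
    count = 0
    for y in range(1, N + 1):
        for m in m_values:
            square = 2 * y * y + m
            if square >= 1:
                x = math.isqrt(square)
                if x * x == square:  # x^2 = 2y^2 + m has a solution x >= 1
                    count += 1
    return count


def naive_area_constant(gamma1, gamma2):
    """Right-hand side of (4.14)."""
    return (gamma2 - gamma1) / (2 * math.sqrt(2))


N = 10_000
for g1, g2 in [(-1.5, 1.5), (-0.5, 1.5), (-0.5, 0.5)]:
    counted = count_pell_inequality(g1, g2, N)
    print((g1, g2), counted, counted / math.log(N), naive_area_constant(g1, g2))
```

The three intervals correspond to the cases (4.3), (4.4) and (4.5), respectively.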

1.3 The Giant Leap in the Inhomogeneous Case: Extra Large Fluctuations

Using the familiar factorization (4.10), we can rewrite the Pell equation \(x^{2} - 2y^{2} = \pm 1\), restricted to positive integers, as

$$\displaystyle\begin{array}{rcl} \vert x^{2} - 2y^{2}\vert \leq 1\quad \mbox{ or}\quad \vert y\sqrt{2} - x\vert (y\sqrt{2} + x) \leq 1\quad \mbox{ or}\quad \Vert y\sqrt{2}\Vert (y\sqrt{2} + x) \leq 1,& &{}\end{array}$$
(4.15)

where \(\Vert z\Vert\) denotes, as usual, the distance of a real number z from the nearest integer. Notice that in (4.15), x is the nearest integer to \(y\sqrt{2}\), which is an irrational number. Since \(y\sqrt{2} \approx x\), the inequality (4.15) is basically equivalent to the vague inequality

$$\displaystyle{ \Vert y\sqrt{2}\Vert \leq \frac{1 + o(1)} {2\sqrt{2}y}. }$$
(4.16)

The vagueness of (4.16) comes from the additional term o(1), which tends to 0 as \(y \rightarrow \infty \). The formula (4.16) is ambiguous, but surely every mathematician understands what we are talking about here.
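To make the o(1) term in (4.16) concrete, one can evaluate \(2\sqrt{2}\,y\,\Vert y\sqrt{2}\Vert\) along the solutions of \(x^{2} - 2y^{2} = \pm 1\). The following is a small sketch with names of our own choosing; it assumes 60 digits of decimal precision, which is far more than needed for the first twelve solutions.

```python
from decimal import Decimal, getcontext

getcontext().prec = 60  # assumed working precision


def scaled_distance(x, y):
    """Return 2*sqrt(2)*y*|y*sqrt(2) - x| for a solution of x^2 - 2y^2 = +-1."""
    root2 = Decimal(2).sqrt()
    distance = abs(y * root2 - x)  # equals ||y*sqrt(2)|| for these pairs
    return 2 * root2 * y * distance


x, y = 1, 1  # smallest positive solution of x^2 - 2y^2 = -1
ratios = []
for _ in range(12):
    ratios.append((y, scaled_distance(x, y)))
    # (x + y*sqrt(2)) * (1 + sqrt(2)) = (x + 2y) + (x + y)*sqrt(2)
    x, y = x + 2 * y, x + y

for y_value, ratio in ratios:
    print(y_value, ratio)
```

The printed ratios approach 1 from alternating sides, which is exactly the factor 1 + o(1) in (4.16).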

An expert in number theory would classify (4.16) as a typical problem in diophantine approximation. Next we give a nutshell summary of diophantine approximation.

The classical problem in the theory of diophantine approximation is to find good rational approximations of irrational numbers. More precisely, we want to decide whether an inequality

$$\displaystyle{ \Vert n\alpha \Vert < \frac{1} {n\varphi (n)}\quad \mbox{ or}\quad \left \vert \alpha -\frac{m} {n} \right \vert < \frac{1} {n^{2}\varphi (n)}, }$$
(4.17)

or in general,

$$\displaystyle{ \Vert n\alpha -\beta \Vert < \frac{1} {n\varphi (n)}, }$$
(4.18)

where α is a given irrational number and β is a given real number, has infinitely many integral solutions in n, and if this is the case, to determine the solutions, or at least the asymptotic number of integral solutions. Here \(\varphi (n)\) is a positive increasing function of n.

The diophantine inequality (4.17) is said to be homogeneous, whereas the diophantine inequality (4.18) is said to be inhomogeneous. For example, in the homogeneous case, the best possible result is Hurwitz’s well-known theorem, that for any irrational number α, the inequality

$$\displaystyle{\Vert n\alpha \Vert < \frac{1} {\sqrt{5}n}}$$

has infinitely many positive integer solutions.
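As a quick illustration of Hurwitz's inequality for \(\alpha = \sqrt{2}\), the sketch below lists the solutions n up to a limit. It is a floating-point experiment with names of our own choosing, reliable only for small n, where the rounding error in \(n\alpha\) is far below the gap between the two sides of the inequality.

```python
import math


def dist_to_nearest_integer(z):
    """The quantity ||z||: distance from z to the nearest integer."""
    return abs(z - round(z))


def hurwitz_solutions(alpha, limit):
    """All 1 <= n <= limit with ||n*alpha|| < 1 / (sqrt(5) * n)."""
    bound = 1 / math.sqrt(5)
    return [
        n
        for n in range(1, limit + 1)
        if n * dist_to_nearest_integer(n * alpha) < bound
    ]


print(hurwitz_solutions(math.sqrt(2), 1000))
```

For \(\sqrt{2}\) the solutions found are precisely the values of y in the solutions of \(x^{2} - 2y^{2} = \pm 1\), which ties the homogeneous inequality back to Pell's equation.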

In the inhomogeneous case, we can mention an old result of Kronecker, that for any irrational number α and any real number β, the inequality

$$\displaystyle{\Vert n\alpha -\beta \Vert < \frac{3} {n}}$$

has infinitely many positive integer solutions. Perhaps the strongest inhomogeneous result is Minkowski’s theorem, that for any irrational number α, the inequality

$$\displaystyle{\Vert n\alpha -\beta \Vert < \frac{1} {4\vert n\vert }}$$

has infinitely many integer, but not necessarily positive, solutions, unless 0 < β < 1 is an integer multiple of α modulo one.

The homogeneous case (4.17) has a complete theory based on the effectiveness of the tool of continued fractions. These are classical results due mostly to Euler and Lagrange. Unfortunately, we know much less about the inhomogeneous case. Very recently, the author proved some new results in this direction, and basically covered the case when α is an arbitrary quadratic irrational and β is a typical real number. These results form a large part of the forthcoming book [2]; see also the recent papers [8, 9].

Before formulating our main results, we want to first elaborate on the connection between homogeneous/inhomogeneous diophantine inequalities, such as (4.17) and (4.18), and homogeneous/inhomogeneous Pell inequalities.

1.3.1 Homogeneous and Inhomogeneous Pell Inequalities

The general form of a quadratic curve on the plane is

$$\displaystyle{ a_{11}x^{2} + a_{ 12}xy + a_{22}y^{2} + a_{ 13}x + a_{23}y + a_{33} = 0. }$$
(4.19)

We are interested in the integral solutions \((x,y) \in \mathbf{Z}^{2}\) of an arbitrary inequality

$$\displaystyle{ \gamma _{1} \leq a_{11}x^{2} + a_{ 12}xy + a_{22}y^{2} + a_{ 13}x + a_{23}y \leq \gamma _{2}, }$$
(4.20)

where \(\gamma _{1} <\gamma _{2}\) are given real numbers. Note that the inequality (4.20) defines a plane region, and the boundary consists of two curves of the type (4.19). In the case of negative discriminant \(D = a_{12}^{2} - 4a_{11}a_{22} < 0\), the inequality (4.20) defines a bounded region where the boundary curves are two ellipses. The case of positive discriminant \(D = a_{12}^{2} - 4a_{11}a_{22} > 0\) is much more interesting, because then the inequality (4.20) defines an unbounded region, where the boundary curves are two hyperbolas, and thus we have a chance for infinitely many integral solutions of (4.20).

For simplicity, assume that the coefficients \(a_{11},a_{12},a_{22}\) in (4.20) are integers and \(D = a_{12}^{2} - 4a_{11}a_{22} > 0\). We can factorize the quadratic part in the form

$$\displaystyle{ a_{11}x^{2} + a_{ 12}xy + a_{22}y^{2} = a_{ 11}(x -\alpha y)(x -\alpha ^{{\prime}}y), }$$
(4.21)

where

$$\displaystyle{ \alpha = \frac{-a_{12} + \sqrt{D}} {2a_{11}} \quad \mbox{ and}\quad \alpha ^{{\prime}} = \frac{-a_{12} -\sqrt{D}} {2a_{11}}. }$$
(4.22)

Using (4.21), we can rewrite (4.20) in the form

$$\displaystyle{ \gamma _{1} \leq (x -\alpha y +\rho _{1})(x -\alpha ^{{\prime}}y +\rho _{ 2}) \leq \gamma _{2}, }$$
(4.23)

where

$$\displaystyle{\rho _{1} +\rho _{2} = \frac{a_{13}} {a_{11}}\quad \mbox{ and}\quad \alpha ^{{\prime}}\rho _{ 1} +\alpha \rho _{2} = -\frac{a_{23}} {a_{11}}.}$$

Note that \(\gamma _{1},\gamma _{2}\) are generic numbers; the pair \(\gamma _{1},\gamma _{2}\) in (4.20) is not necessarily the same as the pair \(\gamma _{1},\gamma _{2}\) in (4.23).

Without loss of generality we can assume that \(\vert a_{12}\vert \leq a_{11} \leq \sqrt{D/3}\), and then we have \(\alpha > 0 >\alpha ^{{\prime}}\).

For simplicity, assume that the interval \([\gamma _{1},\gamma _{2}]\) is symmetric with respect to 0, so that it is of the form \([\gamma _{1},\gamma _{2}] = [-\gamma,\gamma ]\). Assume also that we are interested in the positive integral solutions of (4.23). Since \(\alpha > 0 >\alpha ^{{\prime}}\), for large positive x and y, the second factor \((x -\alpha ^{{\prime}}y +\rho _{2})\) in (4.23) is also large and positive, implying that the first factor \((x -\alpha y +\rho _{1})\) in (4.23) has to be very small. In other words, x has to be the nearest integer to \(\alpha y -\rho _{1}\). It follows that the symmetric version of (4.20), namely

$$\displaystyle{-\gamma \leq a_{11}x^{2} + a_{ 12}xy + a_{22}y^{2} + a_{ 13}x + a_{23}y \leq \gamma,}$$

where γ > 0 is a given real number, is equivalent to the diophantine inequality

$$\displaystyle{ \Vert y\alpha -\rho _{1}\Vert < \frac{c} {y + O(1)},\quad \mbox{ where}\quad c = \frac{\gamma } {\alpha -\alpha ^{{\prime}}} = \frac{\gamma a_{11}} {\sqrt{D}}. }$$
(4.24)

Let us return to the inequality (4.20). If the linear part \(a_{13}x + a_{23}y\) in the middle is missing, i.e. \(a_{13} = a_{23} = 0\), then we have a complete theory based on Pell’s equation. More precisely, write \(Q(x,y) = a_{11}x^{2} + a_{12}xy + a_{22}y^{2}\). Then \(\gamma _{1} \leq Q(x,y) \leq \gamma _{2}\) if and only if

$$\displaystyle{Q(x,y) = m\quad \mbox{ for some $m \in \mathbf{Z}$ satisfying $\gamma _{1} \leq m \leq \gamma _{2}$}.}$$

We have a complete characterization of the integral solutions of Q(x, y) = m for any integer m as follows. For any integer m, there is a finite list of primary solutions, say, \((x_{j},y_{j})\), j ∈ J, where \(\vert J\vert < \infty \), such that every solution x = u, y = v of Q(x, y) = m can be written in the form

$$\displaystyle{u -\alpha v = \pm \left (\frac{u_{0} + v_{0}\sqrt{D}} {2} \right )^{n}(x_{ j} -\alpha y_{j})}$$

for some j ∈ J and \(n \in \mathbf{Z}\), where \(x = u_{0} > 0\), \(y = v_{0} > 0\) is the least positive solution of Pell’s equation \(x^{2} - Dy^{2} = 4\). As a byproduct, we deduce that the number of positive integral solutions of the inequality

$$\displaystyle{\gamma _{1} \leq Q(x,y) \leq \gamma _{2},\quad 1 \leq x \leq N,\ 1 \leq y \leq N}$$

has the simple asymptotic form \(c\log N + O(1)\), where \(c = c(a_{11},a_{12},a_{22},\gamma _{1},\gamma _{2})\) is a constant and the error term O(1) is uniformly bounded as \(N \rightarrow \infty \).

Exactly the same holds if there is a non-zero linear part \(a_{13}x + a_{23}y\) in (4.20) whose effect cancels out, which is the case when \(\rho _{1}\) in (4.23) is an integer.

Finally, if \(\rho _{1}\) is not an integer, then we say that (4.23) is an inhomogeneous Pell inequality. In view of (4.24), an inhomogeneous Pell inequality (4.23) is basically equivalent to an inhomogeneous diophantine inequality

$$\displaystyle{ \Vert n\alpha -\beta \Vert < \frac{c} {n} }$$
(4.25)

with \(c =\gamma a_{11}/\sqrt{D}\), where α is a quadratic irrational defined in (4.22). The inequality (4.25) is a special case of (4.18) where \(\varphi (n)\) is a constant.

1.3.2 Some Results

One of the main results in the forthcoming book [2] describes the asymptotic behavior of the number of positive integral solutions of (4.20) for every non-square integer discriminant D > 0 and almost all \(a_{13},a_{23}\). The number of solutions

  • exhibits extra large fluctuations, proportional to the area,

  • satisfies an elegant Central Limit Theorem, and

  • satisfies a shockingly precise Law of the Iterated Logarithm; see Theorems 3, A and B below.

For notational simplicity, we formulate the results in the special case of discriminant D = 8, which corresponds to the most famous quadratic irrational \(\alpha = \sqrt{2}\).

Since the class number of the discriminant D = 8 is one, the general form of an inhomogeneous Pell inequality of discriminant D = 8 is

$$\displaystyle{ \gamma _{1} \leq (x +\beta _{1})^{2} - 2(y +\beta _{ 2})^{2} \leq \gamma _{ 2}, }$$
(4.26)

where \(\gamma _{1} <\gamma _{2}\) and \(\beta _{1},\beta _{2} \in [0,1)\) are fixed constants. For notational simplicity, we restrict ourselves to symmetric intervals [−γ, γ] in (4.26); note that everything works similarly for general intervals \([\gamma _{1},\gamma _{2}]\).

The factorization

$$\displaystyle{ (x +\beta _{1})^{2} - 2(y +\beta _{ 2})^{2} = (x +\beta -y\sqrt{2})(x +\beta ^{{\prime}} + y\sqrt{2}), }$$
(4.27)

where \(\beta =\beta _{1} -\beta _{2}\sqrt{2}\) and \(\beta ^{{\prime}} =\beta _{1} +\beta _{2}\sqrt{2}\), clearly indicates that the asymptotic number of integral solutions of (4.26) depends heavily on the local behavior of \(n\sqrt{2}\bmod 1\). In fact, (4.26) is essentially equivalent to the inhomogeneous diophantine inequality

$$\displaystyle{ \Vert n\sqrt{2}-\beta \Vert < \frac{c} {n}, }$$
(4.28)

with \(c =\gamma /2\sqrt{2}\).

To turn the vague term “essentially equivalent” into a precise statement, we proceed as follows. Let \(F(\sqrt{2};\beta _{1},\beta _{2};\gamma;N)\) be the number of integral solutions \((x,y) \in \mathbf{Z}^{2}\) of the inequality (4.26) with \(\gamma _{2} =\gamma\) and \(\gamma _{1} = -\gamma\) satisfying 1 ≤ y ≤ N and x ≥ 1. This means counting lattice points in a long and narrow hyperbola segment. Next let \(f(\sqrt{2};\beta;c;N)\) denote the number of integral solutions n of the inequality (4.28) satisfying 1 ≤ n ≤ N, where \(\beta =\beta _{1} -\beta _{2}\sqrt{2}\). Now “essentially equivalent” means that for almost all pairs \(\beta _{1},\beta _{2}\), we have \(F(\sqrt{2};\beta _{1},\beta _{2};\gamma;N) - f(\sqrt{2};\beta;c;N) = O(1)\) as \(N \rightarrow \infty \), where \(c =\gamma /(2\sqrt{2})\). More precisely, we have

Lemma 1.

Let γ > 0 and \(\beta _{2}\) be arbitrary real numbers. Then for almost all \(\beta _{1}\), there exists a finite \(0 < C(\beta _{1},\beta _{2},\gamma ) < \infty \) such that

$$\displaystyle{\int _{0}^{1}C(\beta _{ 1},\beta _{2},\gamma )\,\mathrm{d}\beta _{1} < \infty }$$

and

$$\displaystyle{\vert F(\sqrt{2};\beta _{1},\beta _{2};\gamma;N) - f(\sqrt{2};\beta;c;N)\vert < C(\beta _{1},\beta _{2},\gamma )}$$

for all N ≥ 1, where \(c =\gamma /(2\sqrt{2})\) and \(\beta =\beta _{1} -\beta _{2}\sqrt{2}\).

We postpone the simple proof to Sect. 4.3.

In view of Lemma 1, it suffices to study the special case \(\beta _{2} = 0\) and \(\beta _{1} =\beta\). We have

$$\displaystyle{ -\gamma \leq (x+\beta )^{2} - 2y^{2} \leq \gamma, }$$
(4.29)

where γ > 0 and β ∈ [0, 1) are fixed constants. For simplicity, let \(F(\sqrt{2};\beta;\gamma;N)\) denote the number of integral solutions \((x,y) \in \mathbf{Z}^{2}\) of (4.29) satisfying 1 ≤ y ≤ N and x ≥ 1. Note that \(F(\sqrt{2};\beta;\gamma;N)\) counts the number of lattice points in a long and narrow hyperbola segment, or hyperbolic needle, located along a line of slope \(1/\sqrt{2}\); see Fig. 4.1.

Fig. 4.1 A hyperbolic needle

In the special case γ = 1 and β = 0, the inequality (4.29) becomes the simplest Pell equation \(x^{2} - 2y^{2} = \pm 1\). The integral solutions \((x_{k},y_{k})\) form a cyclic group generated by the smallest positive solution x = y = 1 in the well-known way. We have \(x_{k} + y_{k}\sqrt{2} = (1 + \sqrt{2})^{k}\), implying the familiar asymptotic formula

$$\displaystyle{ F(\sqrt{2};\beta = 0;\gamma = 1;N) = \frac{\log N} {\log (1 + \sqrt{2})} + O(1), }$$
(4.30)

where \(1 + \sqrt{2}\) is the fundamental unit of the real quadratic field \(\mathbf{Q}(\sqrt{2})\).

In sharp contrast to the bounded fluctuation in the homogeneous case β = 0, the inhomogeneous case can exhibit extra large fluctuations proportional to the area; see Theorem 3 below. To explain this, first we have to compute the mean value of \(F(\sqrt{2};\beta;\gamma;N)\) as β runs through the unit interval 0 ≤ β < 1.

Lemma 2.

We have

$$\displaystyle{ \int _{0}^{1}F(\sqrt{2};\beta;\gamma;N)\,\mathrm{d}\beta = \frac{\gamma } {\sqrt{2}}\log N + O(1), }$$
(4.31)

where the implicit constant in the term O(1) is independent of N, but may depend on γ. Moreover, for an arbitrary subinterval 0 ≤ a < b ≤ 1, we have

$$\displaystyle{ \lim _{N\rightarrow \infty }\frac{ \frac{1} {b-a}\int _{a}^{b}F(\sqrt{2};\beta;\gamma;N)\,\mathrm{d}\beta } {\log N} = \frac{\gamma } {\sqrt{2}}. }$$
(4.32)

The estimates (4.31) and (4.32) express the almost trivial geometric fact that the average number of lattice points contained in all the translated copies of a given region, a hyperbola segment in our special case, is precisely the area of the region; see Lemma 5. We shall give a detailed proof of Lemma 2 in Sect. 4.3.
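The averaging statement of Lemma 2 can be tested numerically. The sketch below is an experiment under our own assumptions, not the proof given later: the names `count_F`, `grid_mean` and `exact_mean` are ours, the integral over β is replaced by a midpoint Riemann sum with 1000 sample points, and N = 200, γ = 1 are arbitrary illustrative choices. The function `exact_mean` sums, over y, the length of the set of real u ≥ 1 with \(-\gamma \leq u^{2} - 2y^{2} \leq \gamma\), which is what the integral in (4.31) computes.

```python
import math


def count_F(beta, gamma, N):
    """Count integers x >= 1, 1 <= y <= N with -gamma <= (x+beta)^2 - 2y^2 <= gamma."""
    total = 0
    for y in range(1, N + 1):
        t = 2 * y * y
        lo = math.sqrt(max(t - gamma, 0.0)) - beta
        hi = math.sqrt(t + gamma) - beta
        x_min = max(1, math.ceil(lo))
        x_max = math.floor(hi)
        if x_max >= x_min:
            total += x_max - x_min + 1
    return total


def grid_mean(gamma, N, samples):
    """Midpoint Riemann sum approximating the integral of F over 0 <= beta < 1."""
    values = (count_F((i + 0.5) / samples, gamma, N) for i in range(samples))
    return sum(values) / samples


def exact_mean(gamma, N):
    """Total length of {u >= 1 : -gamma <= u^2 - 2y^2 <= gamma}, summed over y."""
    total = 0.0
    for y in range(1, N + 1):
        t = 2 * y * y
        upper = math.sqrt(t + gamma)
        lower = max(1.0, math.sqrt(max(t - gamma, 0.0)))
        total += max(0.0, upper - lower)
    return total


gamma, N = 1.0, 200
mean_grid = grid_mean(gamma, N, 1000)
mean_exact = exact_mean(gamma, N)
main_term = gamma / math.sqrt(2) * math.log(N)
print(mean_grid, mean_exact, main_term)
```

The first two numbers agree up to the discretization error of the Riemann sum, and both differ from the main term \((\gamma /\sqrt{2})\log N\) by a bounded amount.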

Now we are ready to formulate our first, and weakest, extra large fluctuation result, demonstrating that the fluctuations can be proportional to the area. This result is hardly more than a warmup for, or simplest illustration of, the main results that will come later.

Theorem 3.

For \(\gamma = \frac{1} {2}\), there are continuum many divergence points \(\beta ^{{\ast}}\in [0,1)\) in the sense that

$$\displaystyle{ \limsup _{n\rightarrow \infty }\frac{F(\sqrt{2};\beta ^{{\ast}};\gamma = 1/2;n)} {\log n} >\liminf _{n\rightarrow \infty }\frac{F(\sqrt{2};\beta ^{{\ast}};\gamma = 1/2;n)} {\log n}. }$$
(4.33)

Note that the fluctuation \(c_{3}\log n\) in \(F(\sqrt{2};\beta ^{{\ast}};\gamma = 1/2;n)\) is as large as possible, apart from a constant factor. This follows from Lemma 4 in the next section. It is fair to say that Theorem 3 represents a sophisticated violation of the Naive Area Principle.

We postpone the proof of Theorem 3 to Sect. 4.3.

Note that Theorem 3 has a far-reaching generalization. It holds for every γ > 0, and we actually have the stronger inequality

$$\displaystyle{ \limsup _{n\rightarrow \infty }\frac{F(\sqrt{2};\beta ^{{\ast}};\gamma;n)} {\log n} > \frac{\gamma } {\sqrt{2}} >\liminf _{n\rightarrow \infty }\frac{F(\sqrt{2};\beta ^{{\ast}};\gamma;n)} {\log n}. }$$
(4.34)

We shall return to this in Sect. 4.4; see Theorem 12.

Another far-reaching generalization of Theorem 3 will be discussed in Sect. 4.9; see Theorem 21.

Finally, an extra large fluctuation type result for arbitrary point sets, instead of the set \(\mathbf{Z}^{2}\) of lattice points, will be discussed in Sect. 4.10; see Theorem 30.

We refer to these extra large fluctuation type results as superirregularity.

2 Defending the Naive Area Principle

The estimate (4.30) and inequality (4.33) display the two extreme cases: (1) the negligible bounded fluctuations around the main value which is a constant multiple of \(\log N\); and (2) the extra large fluctuations proportional to the area. But what kind of fluctuations do we have for a typical β satisfying 0 < β < 1? We show that for a typical β, the asymptotic number of solutions \(F(\sqrt{2};\beta;\gamma;N)\), as \(N \rightarrow \infty \), justifies the Naive Area Principle. And beyond that, a more thorough look reveals randomness.

Talking about randomness, note that the two most important parameters of a random variable are the expectation, or mean value, and the variance. For the function \(F(\sqrt{2};\beta;\gamma;N)\), the estimate (4.31) gives the expectation.

Explaining why the natural scaling is exponential. Note that for any 1 < M < N, the counting function is slowly changing in the sense that

$$\displaystyle{ F(\sqrt{2};\beta;\gamma;N) - F(\sqrt{2};\beta;\gamma;M) = O(\log (N/M)), }$$
(4.35)

where \(c_{4}\log (N/M)\) is the corresponding area. The geometric reason behind this is the exponentially sparse occurrence of lattice points in the corresponding long and narrow tilted hyperbola. The proof of (4.35) is a straightforward application of Lemma 4 below.

We have the following corollary of (4.35). If M = cN, i.e. n runs through the interval cN < n < N with some constant 0 < c < 1, then the fluctuation of \(F(\sqrt{2};\beta;\gamma;n)\) is a trivial O(1). This negligible constant size change O(1) in (4.35), as n runs through cN < n < N, explains why it is more natural to switch to the exponential scaling \(F(\sqrt{2};\beta;\gamma;\mathrm{e}^{N})\). In the rest of this discussion, we shall often prefer the exponential scaling.

The variance comes from the following non-trivial result. For any γ > 0, there is a positive effective constant σ = σ(γ) > 0 such that

$$\displaystyle{\lim _{N\rightarrow \infty } \frac{1} {N}\int _{0}^{1}\left (F(\sqrt{2};\beta;\gamma;\mathrm{e}^{N}) - \frac{\gamma } {\sqrt{2}}N\right )^{2}\,\mathrm{d}\beta =\sigma ^{2}(\gamma ).}$$

The proof of this limit formula is based on a combination of Fourier analysis (Poisson summation formula, Parseval formula) and the arithmetic of the quadratic number field \(\mathbf{Q}(\sqrt{2})\); see [2].

The first probabilistic result, nicely fitting the general scheme of determinism vs. randomness, is the following; for the proof, see [2].

Theorem A (Central Limit Theorem).

The renormalized counting function

$$\displaystyle{\frac{F(\sqrt{2};\beta;\gamma;\mathrm{e}^{N}) - (\gamma /\sqrt{2})N} {\sigma (\gamma )\sqrt{N}},\quad 0 \leq \beta < 1,}$$

has a standard normal limit distribution as N →∞.

To give at least some vague intuition behind Theorem A, we write

$$\displaystyle{G_{j}(\beta ) = F(\sqrt{2};\beta;\gamma;\mathrm{e}^{j}) - F(\sqrt{2};\beta;\gamma;\mathrm{e}^{j-1}),\quad j = 1,2,\ldots,N.}$$

In other words, \(G_{j}(\beta )\) is the number of integral solutions (x, y) of (4.29) satisfying x ≥ 1 and \(\mathrm{e}^{j-1} < y \leq \mathrm{ e}^{j}\).

Note that \(G_{j}(\beta )\) is a bounded function. This follows from Lemma 4 below, and from the obvious geometric fact that any short hyperbola segment corresponding to \(G_{j}\) is basically a rectangle. More precisely, any short hyperbola segment corresponding to \(G_{j}\) can be approximated by an inscribed rectangle \(R_{1}\) of slope \(1/\sqrt{2}\) and a circumscribed rectangle \(R_{2}\) of slope \(1/\sqrt{2}\) such that the ratio of the two areas is uniformly bounded by an absolute constant.
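The block decomposition into the increments \(G_{j}(\beta )\) is easy to tabulate. The sketch below is a floating-point illustration only: the names are ours, the value β = 0.3 is an arbitrary sample point rather than a distinguished one, and \(F(\sqrt{2};\beta;\gamma;\mathrm{e}^{j})\) is read as the count with \(1 \leq y \leq \lfloor \mathrm{e}^{j}\rfloor\).

```python
import math


def count_F(beta, gamma, N):
    """Count integers x >= 1, 1 <= y <= N with -gamma <= (x+beta)^2 - 2y^2 <= gamma."""
    total = 0
    for y in range(1, N + 1):
        t = 2 * y * y
        lo = math.sqrt(max(t - gamma, 0.0)) - beta
        hi = math.sqrt(t + gamma) - beta
        x_min = max(1, math.ceil(lo))
        x_max = math.floor(hi)
        if x_max >= x_min:
            total += x_max - x_min + 1
    return total


def increments(beta, gamma, J):
    """Return [G_1, ..., G_J], where G_j counts solutions with e^(j-1) < y <= e^j."""
    values = [count_F(beta, gamma, math.floor(math.exp(j))) for j in range(J + 1)]
    return [values[j] - values[j - 1] for j in range(1, J + 1)]


beta, gamma, J = 0.3, 1.0, 9  # arbitrary illustrative parameters
G = increments(beta, gamma, J)
print(G, sum(G), gamma / math.sqrt(2) * J)
```

Each entry of the printed list is a small non-negative integer, and the sum of the entries can be compared with the area term \((\gamma /\sqrt{2})J\).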

It is time now to formulate

Lemma 4.

Every tilted rectangle of slope \(1/\sqrt{2}\) and area \(\frac{1} {5}\) contains at most one lattice point.

We postpone the proof of this simple but important result to the next section.

Lemma 4 can be easily generalized. The same proof gives that for any quadratic irrational α, there is a positive constant \(c_{5} = c_{5}(\alpha ) > 0\) such that every tilted rectangle of slope α and area \(c_{5}\) contains at most one lattice point.

Our key intuition is that the bounded function \(G_{j}(\beta )\) resembles the j-th Rademacher function, so the sum

$$\displaystyle{F(\sqrt{2};\beta;\gamma;\mathrm{e}^{N}) - \frac{\gamma } {\sqrt{2}}N =\sum _{ j=1}^{N}\left (G_{ j}(\beta ) - \frac{\gamma } {\sqrt{2}}\right ),}$$

as a function of β ∈ [0, 1), behaves like a sum of N independent Bernoulli variables

$$\displaystyle{F(\sqrt{2};\beta;\gamma;\mathrm{e}^{N}) - \frac{\gamma } {\sqrt{2}}N \approx \underbrace{\mathop{\pm 1 \pm 1 \pm \ldots \pm 1}}\limits _{N},}$$

often referred to as an N-step random walk.

Our next result, Theorem B, can be interpreted as a variant of Khintchine’s famous Law of the Iterated Logarithm in probability theory; see [21]. We show that the number of solutions \(F(\sqrt{2};\beta;\gamma;\mathrm{e}^{n})\) of (4.29) oscillates between the sharp bounds

$$\displaystyle\begin{array}{rcl} \frac{\gamma } {\sqrt{2}}n -\sigma \sqrt{n}\sqrt{(2+\varepsilon )\log \log n}& <& F(\sqrt{2};\beta;\gamma;\mathrm{e}^{n}) \\ & <& \frac{\gamma } {\sqrt{2}}n +\sigma \sqrt{n}\sqrt{(2+\varepsilon )\log \log n},\qquad {}\end{array}$$
(4.36)

where \(\varepsilon > 0\), as \(n \rightarrow \infty \) for almost all β. Note that (4.36) fails with \(2-\varepsilon\) in place of \(2+\varepsilon\), where \(\varepsilon > 0\). Here the main term \((\gamma /\sqrt{2})n\) means the area, so (4.36) can be considered a highly sophisticated justification of the Naive Area Principle.

The estimate (4.36) is particularly interesting in view of the fact that the classical Circle Problem is unsolved, and seems to be hopeless by current techniques. What (4.36) means is that we can solve a Hyperbola Problem instead of the Circle Problem. More precisely, we can prove for long and narrow tilted hyperbola segments what nobody can prove for large concentric circles. Namely, we can show that for almost all centers, i.e. for almost all values of the translation parameter β, the number of lattice points asymptotically equals the area plus an error which, even in the worst case scenario, is about the square root of the area. For circles the corresponding maximum error should be the square root of the circumference.

The Law of the Iterated Logarithm is one of the most famous results in classical probability theory, and describes the maximum fluctuation in the infinite one-dimensional random walk. The term infinite random walk refers to an infinite sequence of random Bernoulli trials, where each trial is tossing a fair coin. Of course, coin tossing belongs to the physical world; it is not a mathematical concept. But there is a well-known pure mathematical problem, which is considered equivalent. We can study the digit distribution of a typical real number written in binary form

$$\displaystyle{\beta = \frac{b_{1}} {2} + \frac{b_{2}} {2^{2}} + \frac{b_{3}} {2^{3}}+\ldots,}$$

where each \(b_{i} = 0\) or 1; here we have assumed for simplicity that 0 < β < 1. The infinite 0-1 sequence

$$\displaystyle{b_{1} = b_{1}(\beta ),b_{2} = b_{2}(\beta ),b_{3} = b_{3}(\beta ),\ldots,}$$

i.e. the sequence of binary digits of 0 < β < 1, represents an infinite heads-and-tails sequence, say, with 1 as heads and 0 as tails. The sum

$$\displaystyle{B_{n} = B_{n}(\beta ) = b_{1} + b_{2} + b_{3} +\ldots +b_{n}}$$

counts the number of 1’s, or heads, among the first n binary digits of 0 < β < 1. Borel’s classical theorem about normal numbers asserts that

$$\displaystyle{\frac{B_{n}(\beta )} {n} \rightarrow \frac{1} {2}\quad \mbox{ for almost all $0 <\beta < 1$}.}$$

Let \(S_{n} = S_{n}(\beta )\) denote the corresponding error term

$$\displaystyle{S_{n} = S_{n}(\beta ) = 2B_{n}(\beta ) - n = \mbox{ number of heads} -\mbox{ number of tails},}$$

so that \(S_{n} = S_{n}(\beta )\) represents the number of heads minus the number of tails among the first n random trials, or coin tosses.
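The definitions of \(B_{n}(\beta )\) and \(S_{n}(\beta )\) can be made concrete with a small sketch. The function names `binary_digits` and `head_tail_excess` are hypothetical, chosen for illustration only, and the inputs are exact rationals purely so that the arithmetic is exact; the theorems above concern almost all β, not these particular values.

```python
from fractions import Fraction


def binary_digits(beta, n):
    """Return the first n binary digits b_1, ..., b_n of 0 <= beta < 1."""
    x = Fraction(beta)
    digits = []
    for _ in range(n):
        x *= 2
        bit = int(x >= 1)   # next digit is 1 exactly when doubling reaches 1
        digits.append(bit)
        x -= bit            # keep only the fractional part
    return digits


def head_tail_excess(beta, n):
    """Return S_n(beta) = 2 * B_n(beta) - n, i.e. heads minus tails."""
    heads = sum(binary_digits(beta, n))
    return 2 * heads - n
```

For instance, β = 1/3 has the periodic expansion 0.010101…, so its digit sum stays perfectly balanced, in contrast to the \(\sqrt{n\log \log n}\) size fluctuations of a typical β.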

A well-known theorem of Khintchine [21] asserts that

$$\displaystyle{\limsup _{n\rightarrow \infty } \frac{S_{n}(\beta )} {\sqrt{2n\log \log n}} = 1\quad \mbox{ for almost all $0 <\beta < 1$}.}$$

Note that Khintchine’s Theorem is a far-reaching quantitative improvement on Borel’s famous theorem on normal numbers. The long form of Khintchine’s Theorem says that for any \(\varepsilon > 0\) and almost all β, we have the following two statements:

  • \(S_{n}(\beta ) < (1+\varepsilon )\sqrt{2n\log \log n}\) for all sufficiently large values of n; and

  • \(S_{n}(\beta ) > (1-\varepsilon )\sqrt{2n\log \log n}\) for infinitely many values of n.

This strikingly elegant and precise result is the simplest form of the so-called Law of the Iterated Logarithm, usually called Khintchine’s form.

Let us return to (4.36). The fact that it is an analog of Khintchine’s Law of the Iterated Logarithm suggests the vague intuition that the lattice point counting function \(F(\sqrt{2};\beta;\gamma;\mathrm{e}^{n})\) behaves like a generalized digit sum as β runs through 0 < β < 1.

What we are going to actually formulate below are two generalizations or refinements of (4.36); see Theorem B. The first generalization is that for almost all β, (4.36) holds for all γ, or in general, for all intervals \([\gamma _{1},\gamma _{2}]\). This is a variant of the so-called Cassels’s form of the Law of the Iterated Logarithm; see [12].

The second generalization of (4.36) is the Kolmogorov–Erdős form, an ultimate convergence-divergence criterion, which contains Khintchine’s form as a simple corollary; see [14, 15, 22].

Theorem B (Law of the Iterated Logarithm).

  1. (i)

    Let \(\varepsilon > 0\) be an arbitrarily small but fixed constant. Then for almost all β,

    $$\displaystyle\begin{array}{rcl} \frac{\gamma } {\sqrt{2}}n -\sigma \sqrt{(2+\varepsilon )n\log \log n}& <& F(\sqrt{2};\beta;\gamma;\mathrm{e}^{n}) \\ & <& \frac{\gamma } {\sqrt{2}}n +\sigma \sqrt{(2+\varepsilon )n\log \log n}\qquad {}\end{array}$$
    (4.37)

    holds for all γ > 0 and for all sufficiently large n, i.e. for all \(n > n_{0}(\beta,\gamma )\).

  2. (ii)

    Let \(\varphi (n)\) be an arbitrary positive increasing function of n. Let γ > 0 be fixed. Then for almost all β,

    $$\displaystyle{F(\sqrt{2};\beta;\gamma;\mathrm{e}^{n}) > \frac{\gamma } {\sqrt{2}}n +\varphi (n)\sigma \sqrt{n}}$$

    holds for infinitely many values of n if and only if the series

    $$\displaystyle{ \sum _{n=1}^{\infty }\frac{\varphi (n)} {n} \mathrm{e}^{-\varphi ^{2}(n)/2 } }$$
    (4.38)

    diverges. The same conclusion holds for the other inequality

    $$\displaystyle{F(\sqrt{2};\beta;\gamma;\mathrm{e}^{n}) < \frac{\gamma } {\sqrt{2}}n -\varphi (n)\sigma \sqrt{n}.}$$

Note that (4.37) is sharp in the sense that \(2+\varepsilon\) cannot be replaced by \(2-\varepsilon\).

Remarks.

  1. (i)

    By Lemma 1, we have \(f(\sqrt{2};\beta;c;N) = F(\sqrt{2};\beta;\gamma;N) + O(1)\) as N → ∞, where \(c =\gamma /2\sqrt{2}\). So Lemma 1 implies that Theorems A and B remain true if \(F(\sqrt{2};\beta;\gamma;N)\) is replaced by the number of solutions \(f(\sqrt{2};\beta;c;N)\) of the inhomogeneous diophantine inequality (4.28).

  2. (ii)

    In Theorem B(i), there is a dramatic difference between rational β and almost all β. For every rational β, the counting function has the form

    $$\displaystyle{F(\sqrt{2};\beta;\gamma;N) = c(\gamma )\log N + O(1)\quad \mbox{ as $N \rightarrow \infty $}}$$

    for all γ > 0, and it remains valid if \(\sqrt{2}\) is replaced by any quadratic irrational. This bounded size fluctuation around the main term \(c(\gamma )\log N\), which is typically not the area, jumps up considerably when we pass from rational β to almost all β. By (4.37), we then have square root size fluctuations around the main term, which is the area, so the fluctuations have size the square root of the area, and this holds for almost all β and all γ > 0.

Let us return to (4.36). It is a special case of Theorem B(ii) with

$$\displaystyle{\varphi (n) = ((2\pm \varepsilon )\log \log n)^{1/2}.}$$

Indeed, the series (4.38) is convergent or divergent depending on whether we have \(2+\varepsilon\) or \(2-\varepsilon\), respectively, in the definition of \(\varphi (n)\).
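The behaviour of the series (4.38) for this choice of \(\varphi (n)\) rests on a simple algebraic identity: if \(\varphi ^{2}(n) = c\log \log n\), then the general term equals \(\sqrt{c\log \log n}/(n(\log n)^{c/2})\). By the integral test, a series of this shape converges when c > 2 and diverges when c < 2. The sketch below only checks the identity numerically; the function names are hypothetical and the constant c stands for \(2\pm \varepsilon\).

```python
import math


def lil_series_term(n, c):
    """Term phi(n)/n * exp(-phi(n)^2 / 2) of (4.38) with phi(n)^2 = c * log log n."""
    phi = math.sqrt(c * math.log(math.log(n)))
    return phi / n * math.exp(-phi * phi / 2)


def rewritten_term(n, c):
    """The same term written as sqrt(c log log n) / (n * (log n)^(c/2))."""
    log_n = math.log(n)
    return math.sqrt(c * math.log(log_n)) / (n * log_n ** (c / 2))
```

The exponent c/2 of log n is the only place where the sign of ε enters, which is why the threshold sits exactly at c = 2.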

We can obtain a much more delicate result by choosing a large integer k ≥ 4 and writing

$$\displaystyle{\varphi (n) = \left (2\log _{2}n + 2\log _{3}n + 2\log _{4}n +\ldots +2\log _{k-1}n + (2\pm \varepsilon )\log _{k}n\right )^{1/2}.}$$

Beware that here, and here only, we use the space-saving notation \(\log _{2}n =\log \log n\), i.e. it means the iterated logarithm instead of the usual meaning as base 2 logarithm, and in general, \(\log _{k}n =\log (\log _{k-1}n)\) denotes the k-times iterated logarithm of n. With this choice of \(\varphi (n)\), we have

$$\displaystyle{\sum _{n=1}^{\infty }\frac{\varphi (n)} {n} \mathrm{e}^{-\varphi ^{2}(n)/2 } \approx \sum _{n} \frac{1} {n\log n\log _{2}n\log _{3}n\ldots \log _{k-1}n(\log _{k}n)^{1\pm \varepsilon /2}},}$$

which is convergent or divergent depending on whether we have \(2+\varepsilon\) or \(2-\varepsilon\), respectively, in the definition of \(\varphi (n)\).

This example clearly illustrates the remarkable precision of Theorem B(ii).

Next we focus on a simple consequence of Theorem B. Let c > 0 be arbitrarily small but fixed. Then by Theorem B, the inhomogeneous diophantine inequality

$$\displaystyle{ \Vert n\sqrt{2}-\beta \Vert < \frac{c} {n} }$$
(4.39)

has infinitely many integer solutions n ≥ 1 for almost all β, in the sense of the Lebesgue measure.

Inequality (4.39) corresponds to the hyperbola segment

$$\displaystyle{\vert y-\beta \vert < \frac{c} {x},\quad x \geq 1,}$$

where β is fixed, and this has infinite area. But we may go further, and consider smaller regions

$$\displaystyle{\vert y-\beta \vert < \frac{1} {x\log x},\quad \vert y-\beta \vert < \frac{1} {x\log x\log \log x},}$$

and the like. They all have infinite area, since

$$\displaystyle{\int _{\mathrm{e}}^{N}\frac{\mathrm{d}x} {x\log x} =\log \log N\quad \mbox{ and}\quad \int _{\mathrm{e}^{\mathrm{e}}}^{N} \frac{\mathrm{d}x} {x\log x\log \log x} =\log \log \log N,}$$

and the rest all tend to infinity as N → ∞. It is very natural, therefore, to ask the following question.

Question.

Consider the inequalities

$$\displaystyle{ \Vert n\sqrt{2}-\beta \Vert < \frac{c} {n\log n},\quad n \geq n_{1}, }$$
(4.40)
$$\displaystyle{ \Vert n\sqrt{2}-\beta \Vert < \frac{c} {n\log n\log \log n},\quad n \geq n_{2}, }$$
(4.41)

and so on, where 0 ≤ β < 1 is a fixed constant. Is it true that for almost all β, in the sense of the Lebesgue measure, the inequalities (4.40), (4.41) and the like have infinitely many positive integer solutions n?

Well, the answer is affirmative.

Theorem C (Area Principle for \(\sqrt{\mathbf{2}}\)).

Let ψ(x) be any positive decreasing function of the real variable x satisfying

$$\displaystyle{ \sum _{n=1}^{\infty }\psi (n) = \infty. }$$
(4.42)

Then the inhomogeneous inequality

$$\displaystyle{\Vert n\sqrt{2}-\beta \Vert <\psi (n)}$$

has infinitely many integral solutions for almost all 0 ≤ β < 1, in the sense of Lebesgue measure.

Furthermore, there is an interesting generalization of Theorem C where \(\sqrt{ 2}\) is replaced by any real α.

To explain this generalization, Theorem D below, we recall the basic question of diophantine approximation. We want to decide whether an inequality

$$\displaystyle{\left \vert \alpha -\frac{p} {q}\right \vert < \frac{1} {q^{2}},\quad \mbox{ or equivalently},\quad \vert q\alpha - p\vert < \frac{1} {q},}$$

with integers p and q, or more generally, an inequality

$$\displaystyle{ \Vert q\alpha \Vert <\psi (q), }$$
(4.43)

where ψ(q) is a positive decreasing function of q, has infinitely many integral solutions in q, and if this is the case, to determine the solutions, or at least the asymptotic number of integral solutions.

It is perfectly natural to study the inhomogeneous analog of (4.43), the inequality

$$\displaystyle{ \Vert q\alpha -\beta \Vert <\psi (q), }$$
(4.44)

where β is an arbitrary fixed real number. Of course, we may assume that 0 ≤ β < 1.

Is there any connection between the solvability of the homogeneous inequality (4.43) and the inhomogeneous inequality (4.44)? Theorem C is about the special case \(\alpha = \sqrt{2}\), and it justifies the Naive Area Principle. Recall that the Naive Area Principle is a vague intuition claiming that a nice region of infinite area must contain infinitely many lattice points. We know that the Naive Area Principle is false for the hyperbolic region \(-\frac{1} {2} \leq x^{2} - 2y^{2} \leq \frac{1} {2}\), which has infinite area and contains only one lattice point, namely the origin. This Pell inequality is basically equivalent to the diophantine inequality

$$\displaystyle{ \Vert q\sqrt{2}\Vert < \frac{c} {q}, }$$
(4.45)

with \(c \leq 2^{-5/2}\), and (4.45) does not have infinitely many integral solutions in q if the constant \(c < 2^{-5/2}\).
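The contrast between the homogeneous inequality (4.45) and its inhomogeneous analog can be explored numerically. The following is a minimal sketch in floating-point arithmetic, so it is only meaningful for moderate ranges of q; the names `dist_to_nearest_int` and `count_solutions` are hypothetical helpers, not notation from the text.

```python
import math


def dist_to_nearest_int(z):
    """Return ||z||, the distance from the real number z to the nearest integer."""
    return abs(z - round(z))


def count_solutions(alpha, beta, psi, q_max):
    """Count integers 1 <= q <= q_max with ||q * alpha - beta|| < psi(q)."""
    return sum(
        1
        for q in range(1, q_max + 1)
        if dist_to_nearest_int(q * alpha - beta) < psi(q)
    )


if __name__ == "__main__":
    alpha = math.sqrt(2)
    psi = lambda q: 0.17 / q   # c = 0.17 is just below 2^(-5/2)
    print(count_solutions(alpha, 0.0, psi, 10_000))   # homogeneous case
    print(count_solutions(alpha, 0.3, psi, 10_000))   # one sample value of beta
```

With β = 0 and c = 0.17 the count stays at zero, because \(q\Vert q\sqrt{2}\Vert\) is bounded away from zero; for a nonzero β the count depends on β, and no particular value is claimed here.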

The failure of the Naive Area Principle for (4.45) is compensated by the success of the Naive Area Principle for the inhomogeneous inequality

$$\displaystyle{\Vert q\sqrt{2}-\beta \Vert <\psi (q),}$$

which has infinitely many integral solutions q for almost all β, provided that ψ(x) is any positive decreasing function of the real variable x satisfying (4.42). This is the statement of Theorem C. The next result generalizes the special case \(\alpha = \sqrt{2}\) to arbitrary real α.

Theorem D (General Area Principle).

Let ψ(x) be any positive decreasing function of the real variable x satisfying (4.42). For any real number α, at least one of the following two cases always holds:

  1. (i)

    The homogeneous inequality (4.43) has infinitely many integral solutions.

  2. (ii)

    The inhomogeneous inequality (4.44) has infinitely many integral solutions for almost all 0 ≤ β < 1, in the sense of Lebesgue measure.

Remark.

Note that divergence condition (4.42) is necessary. Indeed, if

$$\displaystyle{ \sum _{n=1}^{\infty }\psi (n) < \infty, }$$
(4.46)

then the set of pairs (α, β), for which the inequality (4.44) has infinitely many integral solutions q, has two-dimensional Lebesgue measure zero. This statement immediately follows from the statement that for every fixed β, the set of α which satisfy (4.44) for infinitely many q has Lebesgue measure zero. The second statement has an easy proof as follows. Every such α in 0 < α < 1 is contained in infinitely many intervals of the form

$$\displaystyle{\left [\frac{p+\beta } {q} -\frac{\psi (q)} {q}, \frac{p+\beta } {q} + \frac{\psi (q)} {q} \right ]}$$

with integers q ≥ N and 1 ≤ p ≤ q, and the total length of these intervals is less than

$$\displaystyle{2\sum _{q\geq N}\psi (q),}$$

which by (4.46) tends to zero as N → ∞. This means that Theorem D is a precise convergence-divergence type result, or we may call it a zero-one law, to borrow a well-known concept from probability theory.

Let us return to the inhomogeneous inequality (4.44). If α is rational and β is irrational, then (4.44) has only finitely many integral solutions for any ψ(q) → 0 as q → ∞. Well, this is trivial. It is less trivial to find an irrational α and a decreasing function ψ(x) satisfying (4.42) such that for almost all β, (4.44) has only finitely many integral solutions. We can take any irrational 0 < α < 1 with sufficiently large partial quotients in the sense that

$$\displaystyle{\alpha = \frac{1} {a_{1}+} \frac{1} {a_{2}+}\ldots = [a_{1},a_{2},a_{3},\ldots ],}$$

where

$$\displaystyle{ a_{k} \approx k^{(\log k)^{2} }, }$$
(4.47)

and take

$$\displaystyle{ \psi (q) = \frac{1} {q\log q}. }$$
(4.48)

Then the denominator \(q_{k}\) of the k-th convergent of α is roughly

$$\displaystyle{ q_{k} \approx a_{1}a_{2}\ldots a_{k} \approx k^{k(\log k)^{2} }, }$$
(4.49)

and so

$$\displaystyle{\sum _{k} \frac{1} {\log q_{k}} = O\left (\sum _{k} \frac{1} {k(\log k)^{3}}\right ) < \infty.}$$

We recall the well-known fact

$$\displaystyle{\left \vert \alpha -\frac{p_{k}} {q_{k}}\right \vert < \frac{1} {q_{k}q_{k+1}}}$$

which implies

$$\displaystyle{ \left \vert n\alpha -\frac{np_{k}} {q_{k}} \right \vert < \frac{n} {q_{k}q_{k+1}}. }$$
(4.50)
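The convergents used above follow the standard recurrences \(p_{k} = a_{k}p_{k-1} + p_{k-2}\) and \(q_{k} = a_{k}q_{k-1} + q_{k-2}\). The sketch below computes them for a finite list of partial quotients; the function name `convergents` is hypothetical, and the list is written as \([a_{0};a_{1},a_{2},\ldots ]\) with an integer part \(a_{0}\), which is 0 for the numbers 0 < α < 1 considered in the text. The usage example takes \(\sqrt{2} = [1;2,2,2,\ldots ]\) as a convenient test case, which is an illustrative choice and not the α defined by (4.47).

```python
import math


def convergents(partial_quotients):
    """Return the convergents (p_k, q_k) of [a_0; a_1, a_2, ...] as integer pairs."""
    p_prev, q_prev = 1, 0                  # conventional values p_{-1}, q_{-1}
    p, q = partial_quotients[0], 1         # p_0 = a_0, q_0 = 1
    result = [(p, q)]
    for a in partial_quotients[1:]:
        p, p_prev = a * p + p_prev, p
        q, q_prev = a * q + q_prev, q
        result.append((p, q))
    return result


if __name__ == "__main__":
    alpha = math.sqrt(2)
    pairs = convergents([1] + [2] * 9)
    for (p, q), (_, q_next) in zip(pairs, pairs[1:]):
        # Check the approximation bound |alpha - p_k/q_k| < 1/(q_k q_{k+1}).
        print(p, q, abs(alpha - p / q) < 1 / (q * q_next))
```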

If \(q_{k} \leq n < q_{k+1}k^{-2}\) and

$$\displaystyle{\Vert n\alpha -\beta \Vert < \frac{1} {n\log n},}$$

then by (4.49) and (4.50), we have

$$\displaystyle{ \left \Vert \beta -\frac{np_{k}} {q_{k}} \right \Vert < \frac{1} {k^{2}q_{k}} + \frac{1} {n\log n} < \frac{2} {k(\log k)^{3}q_{k}}. }$$
(4.51)

If \(q_{k+1}k^{-2} \leq n < q_{k+1}\), then define the set

$$\displaystyle{ A_{k} =\bigcup _{n}\left [n\alpha - \frac{1} {n\log n},n\alpha + \frac{1} {n\log n}\right ]\quad \bmod 1, }$$
(4.52)

where the union in (4.52) is extended over all n with \(q_{k+1}k^{-2} \leq n < q_{k+1}\). Motivated by (4.51), define the set

$$\displaystyle{ B_{k} =\bigcup _{0\leq j<q_{k}}\left [ \frac{j} {q_{k}} - \frac{2} {k(\log k)^{3}q_{k}}, \frac{j} {q_{k}} + \frac{2} {k(\log k)^{3}q_{k}}\right ]\quad \bmod 1. }$$
(4.53)

Clearly

$$\displaystyle{ \sum _{k}\mathrm{meas}(B_{k}) \leq \sum _{k} \frac{4} {k(\log k)^{3}} < \infty, }$$
(4.54)

where meas denotes the usual Lebesgue measure, and

$$\displaystyle{ \sum _{k}\mathrm{meas}(A_{k}) = O\left (\sum _{k} \frac{\log (k^{2})} {k(\log k)^{3}}\right ) = O\left (\sum _{k} \frac{1} {k(\log k)^{2}}\right ) < \infty. }$$
(4.55)

It follows from (4.54) and (4.55), by the Borel–Cantelli lemma, that almost all β are contained in only a finite number of \(A_{k}\) and in a finite number of \(B_{k}\). In view of (4.51)–(4.53), this implies that for almost all β, the inequality (4.44) has only finitely many integral solutions, where α and ψ are defined by (4.47) and (4.48).

For the proofs of Theorems A and B, we refer the reader to the forthcoming book [2]. For the proofs of Theorems C and D, see the recent paper [8]. This section was a detour, or rather a counterpart; the rest of the chapter is about extra large fluctuations, i.e. sophisticated violations of the Naive Area Principle.

The next section is technical, and contains the proofs of Theorem 3 and Lemmas 1–4. The truly interesting new results come later, starting in Sect. 4.4.

3 Proving Theorem 3 and the Lemmas

Proof of Lemma 2.

First we establish the estimate (4.31). Consider the hyperbolic needle \(H_{N}(\gamma ) = H_{N}(\sqrt{2};\gamma )\), defined by

$$\displaystyle{ H_{N}(\gamma ) =\{ (x,y) \in \mathbf{R}^{2}: -\gamma \leq x^{2} - 2y^{2} \leq \gamma,\ 1 \leq x + y\sqrt{2} \leq 2\sqrt{2}N\}. }$$
(4.56)

Comparing (4.11) with (4.56), we see that

$$\displaystyle{H_{N}(\gamma ) = H^{{\ast}}(\sqrt{2};[-\gamma,\gamma ];N),}$$

so by (4.13), we deduce that

$$\displaystyle{ \mathrm{area}(H_{N}(\gamma )) = \frac{\gamma } {\sqrt{2}}\log N + O(1). }$$
(4.57)

Next we need the following almost trivial result.

Lemma 5.

Let \(A \subset \mathbf{R}^{2}\) be a Lebesgue measurable set in the plane with finite measure denoted by area (A). Then

$$\displaystyle{\int _{0}^{1}\int _{ 0}^{1}\vert (A + \mathbf{x}) \cap \mathbf{Z}^{2}\vert \,\mathrm{d}\mathbf{x} =\mathrm{ area}(A),}$$

where A + x denotes the translation of the set A by the vector \(\mathbf{x} \in \mathbf{R}^{2}\).
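Lemma 5 can be checked numerically by replacing the integral over the unit square with an average over a midpoint grid of translations. The sketch below is one such approximation under stated assumptions: the set A is given by a hypothetical `indicator` function together with a bounding box, and the grid size is an arbitrary illustrative choice.

```python
import math


def average_lattice_count(indicator, bbox, grid=100):
    """Average |(A + x) ∩ Z^2| over a grid x grid midpoint mesh of x in [0,1)^2.

    indicator(u, v) says whether the point (u, v) lies in A, and
    bbox = (x_min, x_max, y_min, y_max) is a box containing A.
    """
    x_min, x_max, y_min, y_max = bbox
    # A lattice point n lies in A + x only if n - x is in A, so n stays in this range.
    n1_values = range(math.floor(x_min), math.ceil(x_max) + 2)
    n2_values = range(math.floor(y_min), math.ceil(y_max) + 2)
    total = 0
    for i in range(grid):
        x1 = (i + 0.5) / grid
        for j in range(grid):
            x2 = (j + 0.5) / grid
            for n1 in n1_values:
                for n2 in n2_values:
                    if indicator(n1 - x1, n2 - x2):
                        total += 1
    return total / grid**2


if __name__ == "__main__":
    rectangle = lambda u, v: 0 <= u < 2.5 and 0 <= v < 1.25
    print(average_lattice_count(rectangle, (0, 2.5, 0, 1.25)))  # area is 3.125
```

For an axis-parallel rectangle the grid average reproduces the area essentially exactly; for curved or tilted regions it is only an approximation whose accuracy improves with the grid size.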

Now by Lemma 5, we have

$$\displaystyle{ \int _{0}^{1}\int _{ 0}^{1}\vert (H_{ N}(\gamma ) + \mathbf{v}) \cap \mathbf{Z}^{2}\vert \,\mathrm{d}\mathbf{v} =\mathrm{ area}(H_{ N}(\gamma )). }$$
(4.58)

If \(\mathbf{v} = (v_{1},v_{2}) \in [0,1)^{2}\) is chosen in such a way that \(v_{1} - v_{2}\sqrt{2} \equiv \beta \bmod 1\) is fixed, then clearly

$$\displaystyle{ \vert F(\sqrt{2};\beta;\gamma;N) -\vert (H_{N}(\gamma ) + \mathbf{v}) \cap \mathbf{Z}^{2}\vert \vert < c_{ 6}(\gamma ), }$$
(4.59)

where \(c_{6}(\gamma )\) is a constant independent of β and N. The estimate (4.31) follows on combining (4.57)–(4.59).

Next we prove (4.32). Let 0 ≤ a < b ≤ 1 be fixed. For any M ≥ 1, consider the parallelogram

$$\displaystyle{\mathcal{P}_{M} =\{ \mathbf{v} = (v_{1},v_{2}) \in \mathbf{R}^{2}: a \leq v_{ 1} - v_{2}\sqrt{2} \leq b,\ 0 \leq v_{1} + v_{2}\sqrt{2} \leq M\}.}$$

If M is large, then \(\mathcal{P}_{M}\) is a long and narrow parallelogram, but we can then turn it into a round shape by applying an appropriate automorphism of the quadratic form \(x^{2} - 2y^{2}\). The substitution \(x_{1} = x + 2y\), \(y_{1} = x + y\) is a fundamental automorphism, and writing

$$\displaystyle{A = \left (\begin{array}{cc} 1&2\\ 1 &1 \end{array} \right ),}$$

we note that \(A^{k}\), \(k \in \mathbf{Z}\), give rise to infinitely many automorphisms preserving the lattice points and the area. The eigenvectors of the matrix A are parallel to the sides of parallelogram \(\mathcal{P}_{M}\), so on applying an appropriate power \(A^{k}\) to the long and narrow parallelogram \(\mathcal{P}_{M}\), we obtain a round parallelogram \(A^{k}\mathcal{P}_{M}\) with sides parallel to those of \(\mathcal{P}_{M}\), and

$$\displaystyle{\mathrm{area}(A^{k}\mathcal{P}_{ M}) =\mathrm{ area}(\mathcal{P}_{M}) = c_{7}M.}$$

Here round means that the diameter of parallelogram \(A^{k}\mathcal{P}_{M}\) is \(O(\sqrt{M})\), so the number of unit squares \([0,1)^{2} + \mathbf{n}\), \(\mathbf{n} \in \mathbf{Z}^{2}\), intersecting the boundary of \(A^{k}\mathcal{P}_{M}\) is \(O(\sqrt{M})\).
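The action of the matrix A on lattice points is easy to verify by direct computation. A short calculation gives \((x + 2y)^{2} - 2(x + y)^{2} = -(x^{2} - 2y^{2})\), so a single application of A preserves \(\vert x^{2} - 2y^{2}\vert\) and even powers preserve the form itself; moreover the linear forms \(x \pm y\sqrt{2}\) are multiplied by \(1 \pm \sqrt{2}\), which is the stretching and shrinking along the eigendirections used above. The sketch below checks these facts; the function names are hypothetical.

```python
import math


def apply_power(x, y, k):
    """Apply A^k, k >= 0, to the integer point (x, y), where A = [[1, 2], [1, 1]]."""
    for _ in range(k):
        x, y = x + 2 * y, x + y
    return x, y


def norm_form(x, y):
    """Value of the quadratic form x^2 - 2 y^2."""
    return x * x - 2 * y * y


def linear_forms(x, y):
    """Return (x + y*sqrt(2), x - y*sqrt(2)), the coordinates along the eigendirections."""
    r = math.sqrt(2)
    return x + y * r, x - y * r
```

Starting from (1, 0), repeated application produces (1, 1), (3, 2), (7, 5), and so on, which are the solutions of \(x^{2} - 2y^{2} = \pm 1\) with alternating signs.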

Combining this geometric fact with (4.58), we have

$$\displaystyle{ \frac{1} {\mathrm{area}(\mathcal{P}_{M})}\int _{\mathcal{P}_{M}}\vert (H_{N}(\gamma ) + \mathbf{v}) \cap \mathbf{Z}^{2}\vert \,\mathrm{d}\mathbf{v} =\mathrm{ area}(H_{ N}(\gamma ))(1 + O(M^{-1/2})). }$$
(4.60)

If \(\mathbf{v} = (v_{1},v_{2}) \in [0,1)^{2}\) is chosen in such a way that \(v_{1} - v_{2}\sqrt{2} \equiv \beta \bmod 1\) is fixed, then clearly

$$\displaystyle{ \vert F(\sqrt{2};\beta;\gamma;N) -\vert (H_{N}(\gamma ) + \mathbf{v}) \cap \mathbf{Z}^{2}\vert \vert < c_{ 8}(\gamma,M), }$$
(4.61)

where \(c_{8}(\gamma,M)\) is a constant independent of β and N. Combining (4.57), (4.60) and (4.61), we have

$$\displaystyle\begin{array}{rcl} & & \frac{ \frac{1} {b-a}\int _{a}^{b}F(\sqrt{2};\beta;\gamma;N)\,\mathrm{d}\beta } {\log N} \\ & & \quad = \left ( \frac{\gamma } {\sqrt{2}} + O\left ( \frac{1} {\log N}\right )\right )(1 + O(M^{-1/2})) + \frac{c_{8}(\gamma,M)} {\log N}.{}\end{array}$$
(4.62)

Since M can be arbitrarily large, (4.62) implies (4.32). The proof of Lemma 2 is now complete. □ 

Proof of Lemma 5.

First assume that A is bounded. Let N be a large integer. In view of the periodicity of \(\mathbf{Z}^{2}\), we have

$$\displaystyle{\int _{0}^{N}\int _{ 0}^{N}\vert (A + \mathbf{x}) \cap \mathbf{Z}^{2}\vert \,\mathrm{d}\mathbf{x} = N^{2}\int _{ 0}^{1}\int _{ 0}^{1}\vert (A + \mathbf{x}) \cap \mathbf{Z}^{2}\vert \,\mathrm{d}\mathbf{x}.}$$

On the other hand,

$$\displaystyle\begin{array}{rcl} \int _{0}^{N}\int _{ 0}^{N}\vert (A + \mathbf{x}) \cap \mathbf{Z}^{2}\vert \,\mathrm{d}\mathbf{x}& =& \sum _{\mathbf{ n}\in \mathbf{Z}^{2}}\mathrm{area}\{\mathbf{x} \in [0,N]^{2}: \mathbf{n} \in A + \mathbf{x}\} {}\\ & =& \sum _{\mathbf{n}\in \mathbf{Z}^{2}}\mathrm{area}\{(\mathbf{n} - A) \cap [0,N]^{2}\}. {}\\ \end{array}$$

Without loss of generality, we can assume that the origin is inside A. Let d(A) denote the diameter of A. Then \((\mathbf{n} - A) \subset [0,N]^{2}\) if \(\mathbf{n} \in [d(A),N - d(A)]^{2}\). On the other hand, \((\mathbf{n} - A) \cap [0,N]^{2} = \varnothing \) if \(\mathbf{n}\not\in [-d(A),N + d(A)]^{2}\). Thus we have

$$\displaystyle{(N +2d(A))^{2} \cdot \mathrm{area}(A) \geq \sum _{\mathbf{ n}\in \mathbf{Z}^{2}}\mathrm{area}\{(\mathbf{n}-A)\cap [0,N]^{2}\} \geq (N -2d(A))^{2} \cdot \mathrm{area}(A).}$$

Dividing the last inequalities by \(N^{2}\), and combining with the equations above, we see that Lemma 5 follows as N tends to infinity. If A is unbounded, then we approximate A by an increasing sequence \(A_{1} \subset A_{2} \subset A_{3} \subset \ldots\) of subsets of A such that each \(A_{k}\) is bounded and \(\mathrm{area}(A\setminus A_{k}) \rightarrow 0\). The last step is then to use the continuity of the Lebesgue measure. □ 

Proof of Lemma 1.

For notational simplicity, we restrict our proof to the special case \(\beta _{2} = 0\); the general case is the same. Again the key step is to apply Lemma 5. For 1 ≤ K < L ≤ ∞, consider the four regions

$$\displaystyle\begin{array}{rcl} H_{K,L}(\beta;\gamma )& =& \{(x,y) \in \mathbf{R}^{2}: -\gamma \leq (x+\beta )^{2} - 2y^{2} \leq \gamma,\ K \leq y \leq L,\ x > 0\}, {}\\ \tilde{H}_{K,L}(\beta;\gamma )& =& \{(x,y) \in \mathbf{R}^{2}: 2\sqrt{2}y\vert x +\beta -y\sqrt{2}\vert <\gamma,\ K \leq y \leq L,\ x > 0\}, {}\\ \tilde{H}_{K,L}^{+}(\beta;\gamma )& =& \{(x,y) \in \mathbf{R}^{2}: (2\sqrt{2}y+1)\vert x+\beta -y\sqrt{2}\vert <\gamma,\ K \leq y \leq L,\ x > 0\}, {}\\ \tilde{H}_{K,L}^{-}(\beta;\gamma )& =& \{(x,y) \in \mathbf{R}^{2}: (2\sqrt{2}y - 1)\vert x+\beta - y\sqrt{2}\vert <\gamma,\ K \leq y \leq L,\ x>0\}. {}\\ \end{array}$$

In view of the factorization (4.27), the condition \((x,y) \in H_{K,L}(\beta;\gamma )\) gives the estimate \(x+\beta = y\sqrt{2} + o(1)\). In fact, we have the stronger form \(x+\beta = y\sqrt{2} + O(1/y)\). Thus there is a threshold \(c_{9} = c_{9}(\gamma )\) such that

$$\displaystyle{\tilde{H}_{K,L}^{+}(\beta;\gamma ) \subset H_{ K,L}(\beta;\gamma ) \subset \tilde{ H}_{K,L}^{-}(\beta;\gamma )}$$

for all \(L > K > c_{9}(\gamma )\). On the other hand, it is trivial that

$$\displaystyle{\tilde{H}_{K,L}^{+}(\beta;\gamma ) \subset \tilde{ H}_{ K,L}(\beta;\gamma ) \subset \tilde{ H}_{K,L}^{-}(\beta;\gamma ).}$$

Consider now the special case K = 1, L = ∞, β = 0, and study the difference set

$$\displaystyle{D(\gamma ) =\tilde{ H}_{1,\infty }^{-}(0;\gamma )\setminus \tilde{H}_{ 1,\infty }^{+}(0;\gamma ).}$$

The area of this difference set can be estimated by

$$\displaystyle\begin{array}{rcl} \mathrm{area}(D(\gamma ))& =& O\left (\int _{1}^{\infty }\left ( \frac{1} {2\sqrt{2}y - 1} - \frac{1} {2\sqrt{2}y + 1}\right )\mathrm{d}y\right ) {}\\ & =& O\left (\int _{1}^{\infty } \frac{\mathrm{d}y} {8y^{2} - 1}\right ) = O(1). {}\\ \end{array}$$

Combining this with Lemma 5, we have

$$\displaystyle{ \int _{0}^{1}\int _{ 0}^{1}\vert (D(\gamma ) + \mathbf{v}) \cap \mathbf{Z}^{2}\vert \,\mathrm{d}\mathbf{v} =\mathrm{ area}(D(\gamma )) < \infty. }$$
(4.63)

If \(\mathbf{v} = (v_{1},v_{2}) \in [0,1)^{2}\) is chosen in such a way that \(v_{1} - v_{2}\sqrt{2} \equiv \beta \bmod 1\) is fixed, then

$$\displaystyle{ D(\gamma ) + \mathbf{v} \supset H_{K,L}(\beta;\gamma )\varDelta \tilde{H}_{K,L}^{+}(\beta;\gamma ), }$$
(4.64)

where \(A\varDelta B = (A\setminus B) \cup (B\setminus A)\) denotes the symmetric difference of the sets A and B. Combining (4.63) and (4.64), Lemma 1 follows easily. □ 

Proof of Lemma 4.

Consider a rectangle of slope \(1/\sqrt{2}\) which contains two lattice points P = (k, ℓ) and Q = (m, n); in fact, assume that P, Q are two opposite vertices of the rectangle. We denote the vector from P to Q by v = (m − k, n − ℓ), and consider the two perpendicular unit vectors

$$\displaystyle{\mathbf{e}_{1} = \left (\frac{\sqrt{2}} {\sqrt{3}}, \frac{1} {\sqrt{3}}\right )\quad \mbox{ and}\quad \mathbf{e}_{2} = \left ( \frac{1} {\sqrt{3}},-\frac{\sqrt{2}} {\sqrt{3}}\right ).}$$

Then the two side lengths a and b of the rectangle can be expressed in terms of the inner products

$$\displaystyle{a =\vert \mathbf{e}_{1} \cdot \mathbf{v}\vert = \frac{\vert p\sqrt{2} + q\vert } {\sqrt{3}} \quad \mbox{ and}\quad b =\vert \mathbf{e}_{2} \cdot \mathbf{v}\vert = \frac{\vert p - q\sqrt{2}\vert } {\sqrt{3}},}$$

where p = m − k and q = n − ℓ. Thus we have

$$\displaystyle{\mathrm{area} = ab = \frac{\vert (p\sqrt{2} + q)(p - q\sqrt{2})\vert } {3}.}$$

Without loss of generality, we can assume that p ≥ 0 and q ≥ 0. Since (p, q) ≠ (0, 0) and \(\sqrt{2}\) is irrational, the integer \(p^{2} - 2q^{2}\) is non-zero, so \(\vert p^{2} - 2q^{2}\vert \geq 1\) and hence \(\vert p - q\sqrt{2}\vert \geq 1/(p + q\sqrt{2})\). Therefore

$$\displaystyle{\mathrm{area} = \frac{\vert (p\sqrt{2}+q)(p - q\sqrt{2})\vert } {3} \geq \frac{p\sqrt{2} + q} {3(p+q\sqrt{2})} \geq \frac{p + q} {3(p\sqrt{2}+q\sqrt{2})} = \frac{1} {3\sqrt{2}} > \frac{1} {5},}$$

proving Lemma 4. □ 
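The lower bound in this proof can be checked by brute force over small integer vectors. The sketch below evaluates the product ab of the two projections for every non-zero (p, q) in a square window, including vectors with mixed signs; the function names and the window size are hypothetical choices for illustration, and floating-point arithmetic is adequate only because the window is small.

```python
import math


def spanned_rectangle_area(p, q):
    """Area ab of the slope-1/sqrt(2) rectangle whose opposite vertices differ by (p, q)."""
    a = abs(p * math.sqrt(2) + q) / math.sqrt(3)   # projection onto e_1
    b = abs(p - q * math.sqrt(2)) / math.sqrt(3)   # projection onto e_2
    return a * b


def min_area(bound):
    """Smallest spanned area over all non-zero integer vectors with |p|, |q| <= bound."""
    return min(
        spanned_rectangle_area(p, q)
        for p in range(-bound, bound + 1)
        for q in range(-bound, bound + 1)
        if (p, q) != (0, 0)
    )


if __name__ == "__main__":
    print(min_area(50), 1 / (3 * math.sqrt(2)))
```

Within the window the minimum stays above \(1/(3\sqrt{2})\), in agreement with the estimate above, so no rectangle of area \(\frac{1} {5}\) can accommodate two lattice points.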

Proof of Theorem 3.

We shall show that the set of numbers β in question, the set of divergence points, contains a Cantor set. This guarantees that the cardinality of the set is continuum.

We make a standard Cantor set construction, i.e. we apply the method of nested intervals. For notational convenience, we write \(F(\sqrt{2};\beta;\gamma;N) = F(\beta;\gamma;N)\). By (4.31), we have

$$\displaystyle{\int _{0}^{1}F(\beta;\gamma;N)\,\mathrm{d}\beta = \frac{\gamma } {\sqrt{2}}\log N + O(1).}$$

Applying this with \(\gamma = \frac{1} {4}\), we obtain the existence of \(0 <\beta _{1} < 1\) and an arbitrarily large integer \(N_{1}\) such that

$$\displaystyle{F(\beta _{1};\gamma = 1/4;N_{1}) > \frac{1} {8}\log N_{1}.}$$

Since \(\frac{1} {4} < \frac{1} {2}\), there exists an interval \(I_{1} = [a,b]\) with 0 < a < b < 1 such that \(\beta _{1} \in I_{1}\) and

$$\displaystyle{ F(\beta;\gamma = 1/2;N_{1}) > \frac{1} {8}\log N_{1}\quad \mbox{ for all $\beta \in I_{1}$}. }$$
(4.65)

Next let \(\mathbf{n} = (n_{1},n_{2}) \in \mathbf{Z}^{2}\) be a lattice point such that \(\beta _{2} = n_{1} - n_{2}\sqrt{2} \in I_{1}\). Since the inequality \(\vert x^{2} - 2y^{2}\vert \leq \frac{3} {4}\) does not have a non-zero integral solution, trivially

$$\displaystyle{F(\beta _{2};\gamma = 3/4;N) < \frac{1} {100}\log N\quad \mbox{ for all $N \geq N_{2}$},}$$

where \(N_{2}\) is a sufficiently large threshold. We can clearly assume that \(N_{2} > N_{1}\). Since \(\frac{3} {4} > \frac{1} {2}\), there exists an interval \(I_{2} = [a,b]\) with some 0 < a < b < 1 such that \(\beta _{2} \in I_{2}\) and

$$\displaystyle{ F(\beta;\gamma = 1/2;N_{2}) < \frac{1} {100}\log N_{2}\quad \mbox{ for all $\beta \in I_{2}$}. }$$
(4.66)

We can clearly assume that \(I_{2}\) is a proper subinterval of \(I_{1}\). Let \(I(0) = I_{2}\). Repeating the second argument, we deduce that there exists another closed subinterval I(1) such that I(0) and I(1) are disjoint, \(I(0) \cup I(1) \subset I_{1}\) and

$$\displaystyle{ F(\beta;\gamma = 1/2;N_{2}^{(1)}) < \frac{1} {100}\log N_{2}^{(1)}\quad \mbox{ for all $\beta \in I(1)$}. }$$
(4.67)

We can clearly assume that \(N_{2}^{(1)} > N_{1}\).

By (4.32), we have

$$\displaystyle{ \frac{1} {\vert I(0)\vert }\int _{I(0)}F(\beta;\gamma;N)\,\mathrm{d}\beta = (1 + o(1)) \frac{\gamma } {\sqrt{2}}\log N,}$$

and applying this with \(\gamma = \frac{1} {4}\), we obtain the existence of \(0 <\beta _{3} < 1\) and a large integer \(N_{3}\) such that

$$\displaystyle{F(\beta _{3};\gamma = 1/4;N_{3}) > \frac{1} {8}\log N_{3}.}$$

Since \(\frac{1} {4} < \frac{1} {2}\), there exists an interval \(I_{3} = [a,b]\) with 0 < a < b < 1 such that \(\beta _{3} \in I_{3}\) and

$$\displaystyle{ F(\beta;\gamma = 1/2;N_{3}) > \frac{1} {8}\log N_{3}\quad \mbox{ for all $\beta \in I_{3}$}. }$$
(4.68)

We can clearly assume that \(I_{3}\) is a proper subinterval of I(0). Write \(I(0,0) = I_{3}\). Similarly, there exists another subinterval I(0, 1) such that I(0, 0) and I(0, 1) are disjoint, I(0, 0) ∪ I(0, 1) ⊂ I(0) and

$$\displaystyle{ F(\beta;\gamma = 1/2;N_{3}^{(1)}) > \frac{1} {8}\log N_{3}^{(1)}\quad \mbox{ for all $\beta \in I(0,1)$}. }$$
(4.69)

There are similar disjoint subintervals I(1, 0) and I(1, 1) of I(1).

Next, let \(\mathbf{n} = (n_{1},n_{2}) \in \mathbf{Z}^{2}\) be a lattice point such that \(\beta _{4} = n_{1} - n_{2}\sqrt{2} \in I(0,0)\). Since the inequality \(\vert x^{2} - 2y^{2}\vert \leq \frac{3} {4}\) does not have a non-trivial integral solution,

$$\displaystyle{F(\beta _{4};\gamma = 3/4;N) < \frac{1} {100}\log N\quad \mbox{ for all $N \geq N_{4}$},}$$

where \(N_{4} < \infty\) is a sufficiently large threshold. We can clearly assume that \(N_{4} > N_{3}\). Since \(\frac{3} {4} > \frac{1} {2}\), there exists an interval \(I_{4} = [a,b]\) with 0 < a < b < 1 such that \(\beta _{4} \in I_{4}\) and

$$\displaystyle{ F(\beta;\gamma = 1/2;N_{4}) < \frac{1} {100}\log N_{4}\quad \mbox{ for all $\beta \in I_{4}$}. }$$
(4.70)

We can clearly assume that \(I_{4}\) is a proper subinterval of I(0, 0). Let \(I(0,0,0) = I_{4}\). Repeating the last argument, there exists another closed subinterval I(0, 0, 1) such that I(0, 0, 0) and I(0, 0, 1) are disjoint, I(0, 0, 0) ∪ I(0, 0, 1) ⊂ I(0, 0) and

$$\displaystyle{ F(\beta;\gamma = 1/2;N_{4}^{(1)}) < \frac{1} {100}\log N_{4}^{(1)}\quad \mbox{ for all $\beta \in I(0,0,1)$}, }$$
(4.71)

and so on. Repeating this argument, we build an infinite binary tree

$$\displaystyle{I_{1} \supset I_{\varepsilon _{1}} \supset I_{\varepsilon _{1},\varepsilon _{2}} \supset I_{\varepsilon _{1},\varepsilon _{2},\varepsilon _{3}} \supset \ldots,}$$

where \(\varepsilon _{1},\varepsilon _{2},\varepsilon _{3},\ldots \in \{ 0,1\}\).

For an arbitrary infinite 0-1 sequence \(\varepsilon _{1},\varepsilon _{2},\varepsilon _{3},\ldots\), let

$$\displaystyle{\beta \in I_{1} \cap I_{\varepsilon _{1}} \cap I_{\varepsilon _{1},\varepsilon _{2}} \cap I_{\varepsilon _{1},\varepsilon _{2},\varepsilon _{3}} \cap \ldots.}$$

Then by (4.65)–(4.71), there exists an infinite sequence \(1 < M_{1} < M_{2} < M_{3} < M_{4} <\ldots\) of integers such that

$$\displaystyle{F(\beta;\gamma = 1/2;M_{2k-1}) > \frac{1} {8}\log M_{2k-1}\ \ \mbox{ and}\ \ F(\beta;\gamma = 1/2;M_{2k}) < \frac{1} {100}\log M_{2k},}$$

where k = 1, 2, 3, …. This proves Theorem 3. □ 

4 The Riesz Product and Theorem 12

4.1 The Method of Nested Intervals vs. the Riesz Product

At the end of Sect. 4.1, we formulated a far-reaching generalization of Theorem 3; see (4.34). It states that Theorem 3 actually holds for every γ > 0, and we have the stronger inequality

$$\displaystyle{ \limsup _{n\rightarrow \infty }\frac{F(\sqrt{2};\beta ^{{\ast}};\gamma;n)} {\log n} > \frac{\gamma } {\sqrt{2}} >\liminf _{n\rightarrow \infty }\frac{F(\sqrt{2};\beta ^{{\ast}};\gamma;n)} {\log n}, }$$
(4.72)

where \((\gamma /\sqrt{2})\log n + O(1)\) is the area of the corresponding hyperbolic region. Indeed, (4.72) holds for continuum many divergence points \(\beta ^{{\ast}} =\beta ^{{\ast}}(\gamma ) \in [0,1)\).

The proof of Theorem 3 was based on an elementary argument that we may call the method of nested intervals. To prove (4.72), we need a new idea, and apply a more sophisticated Riesz product argument. The Riesz product is a powerful tool in Fourier analysis. A typical application is to prove large fluctuations for lacunary trigonometric series. To compare the method of nested intervals to the method of Riesz products, we give a simple illustration; see Facts 1 and 2 below.

Consider a finite cosine sum

$$\displaystyle{ F(x) =\sum _{ j=1}^{N}a_{ j}\cos (2\pi n_{j}x),\quad \mbox{ where $a_{j} = \pm 1$ for all $1 \leq j \leq N$}, }$$
(4.73)

and \(1 \leq n_{1} < n_{2} <\ldots < n_{N}\) are integers. We study the following question: what can we say about \(\max _{0\leq x\leq 1}F(x)\)? Under different extra conditions, we have different results. We begin with

Fact 6.

If the strong gap condition \(n_{j+1}/n_{j} \geq 8\) holds for every 1 ≤ j ≤ N − 1, then

$$\displaystyle{\max _{0\leq x\leq 1}F(x) \geq \frac{N} {2}.}$$

Proof.

The proof is almost trivial. Let

$$\displaystyle{J_{1} = \left \{x \in [0,1]:\cos (2\pi n_{1}x)\mbox{ lies between }\frac{a_{1}} {2} \mbox{ and }a_{1}\right \}.}$$

Since \(a_{1} = \pm 1\), the set \(J_{1}\) contains a closed subinterval \(I_{1}\) of length \(\vert I_{1}\vert \geq 1/4n_{1}\). Next let

$$\displaystyle{J_{2} = \left \{x \in I_{1}:\cos (2\pi n_{2}x)\mbox{ lies between }\frac{a_{2}} {2} \mbox{ and }a_{2}\right \}.}$$

Since \(a_{2} = \pm 1\), the set \(J_{2}\) contains a closed subinterval \(I_{2}\) of length \(\vert I_{2}\vert \geq 1/4n_{2}\). Next let

$$\displaystyle{J_{3} = \left \{x \in I_{2}:\cos (2\pi n_{3}x)\mbox{ lies between }\frac{a_{3}} {2} \mbox{ and }a_{3}\right \},}$$

and so on. Repeating this process N times, we obtain a nested sequence of closed intervals

$$\displaystyle{[0,1] \supset I_{1} \supset I_{2} \supset \ldots \supset I_{N}}$$

such that \(a_{k}\cos (2\pi n_{k}x) \geq \frac{1} {2}\) for all \(x \in I_{k}\), \(k = 1,2,\ldots,N\). Then clearly \(F(x) \geq N/2\) for every \(x \in I_{N}\). □
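The nested-interval construction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are ours, the intervals are taken of length \(1/3n_{j}\) (slightly longer than the \(1/4n_{j}\) used above), and, since F is 1-periodic, the search starts from two periods and the final point is reduced modulo 1 to avoid edge effects at the endpoints of [0, 1].

```python
import math
from fractions import Fraction


def nested_interval_point(n, a):
    """Return x in [0, 1) with a_j*cos(2*pi*n_j*x) >= 1/2 for every j.

    Assumes a_j = +-1 and the strong gap condition n_{j+1}/n_j >= 8.
    """
    # F is 1-periodic: start from two periods, reduce modulo 1 at the end.
    lo, hi = Fraction(0), Fraction(2)
    for nj, aj in zip(n, a):
        # a_j*cos(2*pi*n_j*x) >= 1/2 holds exactly on the intervals centred
        # at (k + c)/n_j with radius 1/(6*n_j), where c = 0 for a_j = +1
        # and c = 1/2 for a_j = -1.
        c = Fraction(0) if aj == 1 else Fraction(1, 2)
        k = math.ceil(nj * lo + Fraction(1, 6) - c)  # first such interval right of lo
        new_lo = (k + c - Fraction(1, 6)) / nj
        new_hi = (k + c + Fraction(1, 6)) / nj
        assert lo <= new_lo and new_hi <= hi, "gap condition too weak"
        lo, hi = new_lo, new_hi
    return ((lo + hi) / 2) % 1


def cosine_sum(n, a, x):
    """Evaluate F(x); the phase n_j*x is reduced modulo 1 exactly first."""
    return sum(aj * math.cos(2 * math.pi * float((nj * x) % 1))
               for nj, aj in zip(n, a))


n = [8 ** j for j in range(12)]
a = [1, -1, -1, 1, 1, -1, 1, 1, -1, -1, 1, -1]
x = nested_interval_point(n, a)
print(cosine_sum(n, a, x) >= len(n) / 2)
```

Exact rational arithmetic is used for the interval endpoints because the frequencies grow geometrically and floating-point phases would lose accuracy quickly.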

This is a typical application of the method of nested intervals. Next comes the Riesz product argument. The problem that we study is the following. What will happen if the strong gap condition \(n_{j+1}/n_{j} \geq 8\) is replaced by the weaker condition \(n_{j+1}/n_{j} \geq 1+\varepsilon > 1\), where \(\varepsilon > 0\) is an arbitrarily small but fixed constant? Can we still prove a linear lower bound like \(\max _{0\leq x\leq 1}F(x) \geq cN\) with some constant \(c = c(\varepsilon ) > 0\) depending only on the value of \(\varepsilon\)? Unfortunately, the method of nested intervals hopelessly collapses. Our new approach is the Riesz product argument. The following result, a well-known theorem of Sidon in Fourier analysis, is much deeper than Fact 6.

Fact 7 (Sidon’s Theorem).

If the weak gap condition

$$\displaystyle{ \frac{n_{j+1}} {n_{j}} \geq 1+\varepsilon > 1 }$$
(4.74)

holds for every 1 ≤ j ≤ N − 1, where \(0 <\varepsilon < \frac{1} {2}\) is a fixed constant, then for \(F(x)\) defined in (4.73), we have

$$\displaystyle{\max _{0\leq x\leq 1}F(x) \geq cN\quad \mbox{ with}\quad c = \frac{1} {4\varepsilon ^{-1}\log (2\varepsilon ^{-1})}.}$$

Proof.

Let \(1 = i(1) < i(2) <\ldots < i(M)\) be a subsequence of \(1,2,3,\ldots,N\) such that

$$\displaystyle{ \frac{n_{i(j+1)}} {n_{i(j)}} \geq \frac{2} {\varepsilon },\quad j = 1,2,\ldots,M - 1, }$$
(4.75)

and consider the Riesz product

$$\displaystyle{R(x) =\prod _{ j=1}^{M}(1 + a_{ i(j)}\cos (2\pi n_{i(j)}x)).}$$

Since \(a_{i(j)} = \pm 1\), we have R(x) ≥ 0. We shall use this Riesz product R(x) as a test function. First we evaluate the integral

$$\displaystyle{ \int _{0}^{1}F(x)R(x)\,\mathrm{d}x =\sum _{ j=1}^{M}a_{ i(j)}^{2}\int _{ 0}^{1}\cos ^{2}(2\pi n_{ i(j)}x)\,\mathrm{d}x = \frac{M} {2}. }$$
(4.76)

Indeed, multiplying out the Riesz product R(x), and then using Euler’s formula \(2\cos y =\mathrm{ e}^{\mathrm{i}y} +\mathrm{ e}^{-\mathrm{i}y}\), we obtain terms like

$$\displaystyle{ a_{i(j_{1})}a_{i(j_{2})}a_{i(j_{3})}\ldots a_{i(j_{k})}\mathrm{e}^{2\pi \mathrm{i}(\pm n_{i(j_{1})}\pm n_{i(j_{2})}\pm n_{i(j_{3})}\pm \ldots \pm n_{i(j_{k})})x}, }$$
(4.77)

where we shall call (4.77) a product of length k ≥ 1. We distinguish two cases.

Case 8 (short products).

k = 1. Multiplying the corresponding terms with F(x) and integrating from 0 to 1, we obtain

$$\displaystyle{\sum _{j=1}^{M}a_{ i(j)}^{2}\int _{ 0}^{1}\cos ^{2}(2\pi n_{ i(j)}x)\,\mathrm{d}x = \frac{M} {2},}$$

which is precisely (4.76).

Case 9 (long products).

k ≥ 2. We can clearly write \(1 \leq j_{1} < j_{2} <\ldots < j_{k}\). Then using the elementary inequalities

$$\displaystyle{1 + \frac{\varepsilon } {2} + \left ( \frac{\varepsilon } {2}\right )^{2} + \left ( \frac{\varepsilon } {2}\right )^{3}+\ldots < 1 +\varepsilon \quad \mbox{ and}\quad 1 - \frac{\varepsilon } {2} -\left ( \frac{\varepsilon } {2}\right )^{2} -\left ( \frac{\varepsilon } {2}\right )^{3}-\ldots > \frac{1} {1+\varepsilon }}$$

if \(0 <\varepsilon < \frac{1} {2}\), we deduce that

$$\displaystyle{\vert \pm n_{i(j_{1})} \pm n_{i(j_{2})} \pm n_{i(j_{3})} \pm \ldots \pm n_{i(j_{k})}\vert \mbox{ lies between }(1+\varepsilon )n_{i(j_{k})}\mbox{ and } \frac{1} {1+\varepsilon }n_{i(j_{k})}.}$$

Comparing this to the gap condition (4.74), we see that F(x) and the long products of R(x) represent disjoint sets of exponential functions

$$\displaystyle{\mathrm{e}^{2\pi \mathrm{i}\ell x},\quad \ell \in \mathbf{Z}.}$$

Using the orthogonality of these functions, the contribution of Case 9 to the integral \(\int _{0}^{1}F(x)R(x)\,\mathrm{d}x\) is zero. This proves (4.76).

The same argument shows that

$$\displaystyle{ \int _{0}^{1}R(x)\,\mathrm{d}x = 1. }$$
(4.78)

Since R(x) ≥ 0, the condition (4.78) means that the integral \(\int _{0}^{1}F(x)R(x)\,\mathrm{d}x\) is a weighted average of F(x), with non-negative weights. It follows from (4.76) that

$$\displaystyle{ \max _{0\leq x\leq 1}F(x) \geq \int _{0}^{1}F(x)R(x)\,\mathrm{d}x = \frac{M} {2}. }$$
(4.79)

The inequality \((1+\varepsilon )^{r} > 2/\varepsilon\) clearly holds with \(r = 2\varepsilon ^{-1}\log (2\varepsilon ^{-1})\). Thus by (4.74) and (4.75), we can choose

$$\displaystyle{ M \geq \frac{N} {r} = \frac{N} {2\varepsilon ^{-1}\log (2\varepsilon ^{-1})}. }$$
(4.80)

Sidon’s theorem then follows from (4.79) and (4.80). □ 
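The identities (4.76) and (4.78) can be checked numerically on a small example. The following Python sketch is only an illustration: the function names, the greedy choice of the subsequence i(j), and the sample frequencies are our assumptions. It uses the fact that the average of a trigonometric polynomial over D equally spaced points equals its integral over [0, 1] whenever D exceeds the largest frequency that occurs.

```python
import math
from fractions import Fraction


def greedy_subsequence(n, ratio):
    """Indices i(1) < i(2) < ... with n[i(j+1)] >= ratio * n[i(j)], cf. (4.75)."""
    chosen = [0]
    for j in range(1, len(n)):
        if n[j] >= ratio * n[chosen[-1]]:
            chosen.append(j)
    return chosen


def riesz_check(n, a, eps):
    """Return (M, integral of R, integral of F*R, max of F on the grid).

    eps should be a Fraction so that the ratio 2/eps is exact.
    """
    idx = greedy_subsequence(n, 2 / eps)
    # F*R has frequencies of absolute value at most n[-1] + sum of chosen n's,
    # so a grid with more points than that integrates F*R and R exactly.
    D = n[-1] + sum(n[i] for i in idx) + 1

    def F(x):
        return sum(aj * math.cos(2 * math.pi * nj * x) for aj, nj in zip(a, n))

    def R(x):
        return math.prod(1 + a[i] * math.cos(2 * math.pi * n[i] * x) for i in idx)

    xs = [t / D for t in range(D)]
    int_R = sum(R(x) for x in xs) / D
    int_FR = sum(F(x) * R(x) for x in xs) / D
    max_F = max(F(x) for x in xs)
    return len(idx), int_R, int_FR, max_F


# Sample frequencies with n_{j+1}/n_j >= 1.4, i.e. eps = 2/5 in (4.74).
n = [1, 2, 3, 5, 7, 10, 14, 20, 28, 40]
a = [1, -1, 1, 1, -1, -1, 1, -1, 1, 1]
M, int_R, int_FR, max_F = riesz_check(n, a, Fraction(2, 5))
print(M, round(int_R, 9), round(int_FR, 9), max_F >= M / 2)
```

In this example the greedy subsequence picks the frequencies 1, 5 and 28, so M = 3, and the two averages agree with (4.78) and (4.76) up to rounding error.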

4.2 The Rectangle Property and Theorem 12

Let us return now to Theorem 3 and (4.72). We restate Theorem 3 in a slightly different form. Recall the notation in (4.56). We have

$$\displaystyle\begin{array}{rcl} H_{N}(\sqrt{2};\gamma ) =\{ (x,y) \in \mathbf{R}^{2}: -\gamma \leq x^{2} - 2y^{2} \leq \gamma,\ 1 \leq x + y\sqrt{2} \leq 2\sqrt{2}N\},& &{}\end{array}$$
(4.81)

that is, \(H_{N}(\sqrt{2};\gamma )\) is a long, narrow, tilted hyperbolic needle of slope \(1/\sqrt{2}\). Its area is \((\gamma /\sqrt{2})\log N + O(1)\); see (4.57). Theorem 3 states, roughly speaking, that in the special case \(\gamma = \frac{1} {2}\), there are two translated copies of the same tilted hyperbolic needle \(H_{N}(\sqrt{2};\gamma = 1/2)\) such that one is substantially richer in lattice points than the other. The discrepancy is proportional to the area, and we have an extra large deviation. More precisely, there is a positive absolute constant \(c_{10} > 0\) such that for infinitely many integers \(N_{i}\), where \(N_{i} \rightarrow \infty\), there are translated copies \(\mathbf{x}_{1}^{(i)} + H_{N_{i}}(\sqrt{2};\gamma )\) and \(\mathbf{x}_{2}^{(i)} + H_{N_{i}}(\sqrt{2};\gamma )\) of the tilted hyperbolic needle \(H_{N_{i}}(\sqrt{2};\gamma = 1/2)\) such that

$$\displaystyle\begin{array}{rcl} & & \vert \mathbf{Z}^{2} \cap (\mathbf{x}_{ 1}^{(i)} + H_{ N_{i}}(\sqrt{2};\gamma = 1/2))\vert -\vert \mathbf{Z}^{2} \cap (\mathbf{x}_{ 2}^{(i)} + H_{ N_{i}}(\sqrt{2};\gamma = 1/2))\vert \\ & & \quad > c_{10}\log N_{i}. {}\end{array}$$
(4.82)

In view of the periodicity of the lattice points, we can clearly assume that the pairs of vectors \(\mathbf{x}_{1}^{(i)}\) and \(\mathbf{x}_{2}^{(i)}\) are all in the unit square \([0,1)^{2}\), with \(i \rightarrow \infty\).
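To make the counting in (4.82) concrete, here is a minimal Python sketch that counts the lattice points in a translated copy of the needle (4.81) by direct enumeration. The function names and the sample parameters are ours, and membership is tested in floating point, so points lying almost exactly on the boundary of a translated needle may be misclassified.

```python
import math

SQRT2 = math.sqrt(2.0)


def in_needle(x, y, N, gamma):
    """Membership test for the tilted needle H_N(sqrt(2); gamma) of (4.81)."""
    q = x * x - 2 * y * y
    u = x + y * SQRT2
    return -gamma <= q <= gamma and 1 <= u <= 2 * SQRT2 * N


def count_lattice_points(shift, N, gamma):
    """Number of points of Z^2 in shift + H_N(sqrt(2); gamma)."""
    s1, s2 = shift
    count = 0
    # With u = x + y*sqrt(2) in [1, 2*sqrt(2)*N] and v = x - y*sqrt(2),
    # |v| <= gamma/u <= gamma, so y = (u - v)/(2*sqrt(2)) lies in a range
    # of length about N, and x lies within gamma of y*sqrt(2).
    y_lo = math.floor(s2 + (1 - gamma) / (2 * SQRT2)) - 1
    y_hi = math.ceil(s2 + N + gamma / (2 * SQRT2)) + 1
    for y in range(y_lo, y_hi + 1):
        center = s1 + (y - s2) * SQRT2
        for x in range(math.floor(center - gamma), math.ceil(center + gamma) + 1):
            if in_needle(x - s1, y - s2, N, gamma):
                count += 1
    return count


# Untranslated needle: only the Pell solutions (1 + sqrt(2))^k, k = 0..6, qualify.
print(count_lattice_points((0.0, 0.0), 100, 1.5))
# A translated copy, for comparison.
print(count_lattice_points((0.3, 0.7), 100, 1.5))
```

For the untranslated needle with N = 100 and γ = 1.5 the points counted are (1, 0), (1, 1), (3, 2), (7, 5), (17, 12), (41, 29) and (99, 70).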

The extra large deviation result (4.82), which is equivalent to Theorem 3, can be generalized in several stages. The first generalization is (4.72), or at least an equivalent form as follows.

Proposition 10.

Let γ > 0 be an arbitrary but fixed real number, and let N ≥ 2 be an integer. Then there exists a positive constant \(\delta ^{{\prime}} =\delta ^{{\prime}}(\gamma ) > 0\) , independent of N, such that for the tilted hyperbolic needle \(H_{N}(\sqrt{2};\gamma )\) of area \((\gamma /\sqrt{2})\log N + O(1)\) , there exist translated copies \(\mathbf{x}_{1} + H_{N}(\sqrt{2};\gamma )\) and \(\mathbf{x}_{2} + H_{N}(\sqrt{2};\gamma )\) such that

$$\displaystyle{\vert \mathbf{Z}^{2} \cap (\mathbf{x}_{ 1} + H_{N}(\sqrt{2};\gamma ))\vert > \frac{\gamma } {\sqrt{2}}\log N +\delta ^{{\prime}}\log N}$$

and

$$\displaystyle{\vert \mathbf{Z}^{2} \cap (\mathbf{x}_{ 2} + H_{N}(\sqrt{2};\gamma ))\vert < \frac{\gamma } {\sqrt{2}}\log N -\delta ^{{\prime}}\log N.}$$

Note that Proposition 10 immediately leads to the existence of a single divergence point \(\beta ^{{\ast}} =\beta ^{{\ast}}(\gamma ) \in [0,1)\) in (4.72). To exhibit continuum many divergence points \(\beta ^{{\ast}} =\beta ^{{\ast}}(\gamma ) \in [0,1)\), we simply have to combine Proposition 10 with the routine Cantor set argument in the proof of Theorem 3.

For the second stage of generalization, we replace the set \(\mathbf{Z}^{2}\) of lattice points in the plane with an arbitrary subset \(\mathcal{A}\subset \mathbf{Z}^{2}\) of positive density. Here is an illustration of such a set \(\mathcal{A}\). We say that a lattice point \(\mathbf{n} = (n_{1},n_{2}) \in \mathbf{Z}^{2}\) is coprime if the coordinates \(n_{1}\) and \(n_{2}\) are relatively prime. Let \(\mathbf{Z}_{\mathrm{coprime}}^{2}\) denote the set of coprime lattice points in the plane. It is well known from number theory that \(\mathbf{Z}_{\mathrm{coprime}}^{2}\) is a subset of \(\mathbf{Z}^{2}\) with positive density \(6/\pi ^{2}\).
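The density \(6/\pi ^{2}\) is easy to observe empirically. The short Python sketch below (the function name is ours) counts coprime pairs in the square \([1,n]^{2}\); by symmetry, restricting to positive coordinates does not change the limiting density.

```python
import math


def coprime_density(n):
    """Fraction of pairs (p, q) in [1, n]^2 with gcd(p, q) = 1."""
    hits = sum(1 for p in range(1, n + 1)
               for q in range(1, n + 1) if math.gcd(p, q) == 1)
    return hits / n ** 2


print(coprime_density(100), 6 / math.pi ** 2)
```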

Now let \(\mathcal{A}\) be an arbitrary subset of \(\mathbf{Z}^{2}\) of positive density \(\delta =\delta (\mathcal{A}) > 0\). There is a natural generalization of Proposition 10 where we replace \(\mathbf{Z}^{2}\) with \(\mathcal{A}\). The price that we have to pay is that, due to the lack of periodicity of a general subset \(\mathcal{A}\), the translations are not necessarily in the unit square anymore.

Proposition 11.

Let \(\mathcal{A}\subset \mathbf{Z}^{2}\) be an arbitrary subset of positive density \(\delta =\delta (\mathcal{A}) > 0\) . Let γ > 0 be an arbitrary but fixed real number, and let N ≥ 2 be an integer. Assume further that M∕N is sufficiently large, depending only on γ and δ. Then there exists a positive constant \(\delta ^{{\prime}} =\delta ^{{\prime}}(\gamma,\delta ) > 0\) , independent of N and M, such that for the tilted hyperbolic needle \(H_{N}(\sqrt{2};\gamma )\) of area \((\gamma /\sqrt{2})\log N + O(1)\) , there exist translated copies \(\mathbf{x}_{1} + H_{N}(\sqrt{2};\gamma ) \subset [0,M]^{2}\) and \(\mathbf{x}_{2} + H_{N}(\sqrt{2};\gamma ) \subset [0,M]^{2}\) such that

$$\displaystyle{\vert \mathcal{A}\cap (\mathbf{x}_{1} + H_{N}(\sqrt{2};\gamma ))\vert > \frac{\delta \gamma } {\sqrt{2}}\log N +\delta ^{{\prime}}\log N}$$

and

$$\displaystyle{\vert \mathcal{A}\cap (\mathbf{x}_{2} + H_{N}(\sqrt{2};\gamma ))\vert < \frac{\delta \gamma } {\sqrt{2}}\log N -\delta ^{{\prime}}\log N.}$$

It turns out that the only relevant property of a lattice point set \(\mathcal{A}\subset \mathbf{Z}^{2}\) that we really use in the proof of Proposition 11 is the rectangle property in Lemma 4, that every tilted rectangle of slope \(1/\sqrt{2}\) and area \(\frac{1} {5}\) contains at most one lattice point. Of course, the concrete value \(\frac{1} {5}\) of the constant is secondary.

The third stage of generalization goes far beyond the family of lattice point sets \(\mathcal{A}\subset \mathbf{Z}^{2}\). The only requirement is that the point set satisfies the rectangle property.

Theorem 12.

Let \(\mathcal{P}\) be a finite set of points in the square \([0,M]^{2}\) with density δ, so that the number of elements of \(\mathcal{P}\) is \(\vert \mathcal{P}\vert =\delta M^{2}\). Assume further that \(\mathcal{P}\) satisfies the following rectangle property, that there is a positive constant \(c_{1} = c_{1}(\mathcal{P}) > 0\) such that every tilted rectangle of slope \(1/\sqrt{2}\) and area \(c_{1}\) contains at most one element of the set \(\mathcal{P}\). Let

$$\displaystyle{ \delta ^{{\prime}} =\delta ^{{\prime}}(c_{ 1},\gamma,\delta ) = 10^{-12}\delta \kappa, }$$
(4.83)

where

$$\displaystyle{ \kappa =\min \left \{ \frac{\gamma } {20},\sqrt{c_{1}\gamma }, \frac{10^{-7}c_{1}} {2}, \frac{10^{-7}c_{1}^{2}} {2\gamma } \right \}. }$$
(4.84)

Furthermore, assume that both N and M∕N are sufficiently large and satisfy

$$\displaystyle{ N \geq 2^{10(\gamma +\gamma ^{-1}) }\quad \mbox{ and}\quad M > \frac{10^{11}(\gamma +\gamma ^{-1})(N + 2\gamma )} {c_{1}\delta \kappa }. }$$
(4.85)

Then for the tilted hyperbolic needle \(H_{N}(\sqrt{2};\gamma )\) of area \((\gamma /\sqrt{2})\log N + O(1)\) , there exist translated copies \(\mathbf{x}_{1} + H_{N}(\sqrt{2};\gamma ) \subset [0,M]^{2}\) and \(\mathbf{x}_{2} + H_{N}(\sqrt{2};\gamma ) \subset [0,M]^{2}\) such that

$$\displaystyle{\vert \mathcal{P}\cap (\mathbf{x}_{1} + H_{N}(\sqrt{2};\gamma ))\vert > \frac{\delta \gamma } {\sqrt{2}}\log N +\delta ^{{\prime}}\log N}$$

and

$$\displaystyle{\vert \mathcal{P}\cap (\mathbf{x}_{2} + H_{N}(\sqrt{2};\gamma ))\vert < \frac{\delta \gamma } {\sqrt{2}}\log N -\delta ^{{\prime}}\log N.}$$

Note that Propositions 10 and 11 are special cases of Theorem 12, with \(\mathcal{P} = \mathbf{Z}^{2}\) and \(\mathcal{P} = \mathcal{A}\) respectively.

Unfortunately, the proof of Theorem 12 is rather difficult and long, and the very complicated details cover the next four sections. But the main idea is quite simple. It is basically a sophisticated application of the Riesz product.

5 Proof of Theorem 12 (I): Proving Extra Large Deviations via Riesz Product

Since the proof is long and complicated, a convenient notation here makes a big difference. It is much simpler for us to work with hyperbolic regions in the usual horizontal-vertical position instead of the tilted position. It means that, instead of working with the set \(\mathbf{Z}^{2}\) of lattice points in the plane and the family of tilted hyperbolic needles of a fixed quadratic irrational slope, as in the setting of Theorem 12, we rotate back. In other words, we rotate \(\mathbf{Z}^{2}\) by a quadratic irrational slope, and consider the family of hyperbolic needles in the usual horizontal-vertical position.

Let γ > 0 be an arbitrary real number, and let N ≥ 2 be a large integer. Consider the hyperbolic region

$$\displaystyle{ H_{\gamma }(N) =\{ (x,y) \in \mathbf{R}^{2}: -\gamma \leq xy \leq \gamma,\ 1 \leq x \leq N\}; }$$
(4.86)

see Fig. 4.2. Again we refer to \(H_{\gamma }(N)\) as a hyperbolic needle.

Fig. 4.2 A hyperbolic needle in usual horizontal-vertical position

Notice that \(H_{\gamma }(N)\) is basically the horizontal-vertical version of the tilted hyperbolic needle \(H_{N}(\sqrt{2};\gamma )\); see (4.56) or (4.81). To emphasize the difference between the tilted and the horizontal-vertical versions, we have made a major change in the notation, and switched the location of the parameters γ and N.

The area of \(H_{\gamma }(N)\) equals the integral

$$\displaystyle{\mathrm{area}(H_{\gamma }(N)) = 2\int _{1}^{N} \frac{\gamma } {x}\,\mathrm{d}x = 2\gamma \log N.}$$

Let \(\mathrm{rot}_{\alpha }\mathbf{Z}^{2}\) denote the copy of \(\mathbf{Z}^{2}\) rotated by the angle θ about the origin, where tan θ = α is the slope. If α ≠ 0 is a quadratic irrational, then the continued fraction of α is eventually periodic. This is a well known number-theoretic fact; for example, if \(\alpha = 1/\sqrt{2}\), then

$$\displaystyle{ \frac{1} {\sqrt{2}} = \frac{1} {1+} \frac{1} {2+} \frac{1} {2+} \frac{1} {2+}\ldots = [1,2,2,2,\ldots ] = [1,\overline{2}].}$$

Periodicity implies that the continued fraction digits, formally known as the partial quotients, form a bounded sequence. It is well known that boundedness yields
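The periodic expansion can be reproduced with exact integer arithmetic. The following Python sketch uses the classical recurrence for the partial quotients of a quadratic surd \((P + \sqrt{D})/Q\); the function name is ours, and the sketch only covers inputs for which Q stays positive throughout. Note that the list it returns starts with the integer part, which is 0 for \(1/\sqrt{2}\) and is omitted in the notation \([1,\overline{2}]\) used above.

```python
import math


def partial_quotients(P, Q, D, count):
    """First `count` partial quotients of (P + sqrt(D))/Q, D not a square.

    Requires Q > 0 and Q dividing D - P*P; this sketch stops with an
    assertion error if Q ever becomes non-positive.
    """
    assert (D - P * P) % Q == 0
    r = math.isqrt(D)
    quotients = []
    for _ in range(count):
        assert Q > 0
        a = (P + r) // Q          # floor of (P + sqrt(D))/Q for Q > 0
        quotients.append(a)
        P = a * Q - P
        Q = (D - P * P) // Q
    return quotients


# 1/sqrt(2) = (0 + sqrt(2))/2
print(partial_quotients(0, 2, 2, 8))
```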

$$\displaystyle{ k\Vert k\alpha \Vert \geq c_{11} = c_{11}(\alpha ) > 0\quad \mbox{ for all integers $k \geq 1$}, }$$
(4.87)

where \(c_{11} = c_{11}(\alpha ) > 0\) is some positive constant depending only on α, and \(\Vert z\Vert\) denotes the distance of a real number z to the nearest integer. If \(\alpha = 1/\sqrt{2}\), then (4.87) follows from the factorization \(x^{2} - 2y^{2} = (x - y\sqrt{2})(x + y\sqrt{2})\). If x and y are integers, then

$$\displaystyle{1 \leq \vert x^{2} - 2y^{2}\vert =\vert (x - y\sqrt{2})(x + y\sqrt{2})\vert =\vert x\alpha - y\vert \sqrt{2}\vert x + y\sqrt{2}\vert,}$$

and we choose x = k and y to be the nearest integer to kα. This explains why, in the special case \(\alpha = 1/\sqrt{2}\), the choice \(c_{11} = \frac{1} {4}\) in (4.87) works.
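A quick numerical check of (4.87) for \(\alpha = 1/\sqrt{2}\) is sketched below in Python. The function name and the range of k are our choices. To avoid cancellation, the distance to the nearest integer is computed from the exact integer \(k^{2} - 2y^{2}\) via the factorization above, rather than by subtracting nearly equal floating-point numbers.

```python
import math


def scaled_distance(k):
    """k * ||k/sqrt(2)||, using |k/sqrt(2) - y| = |k^2 - 2y^2| / (sqrt(2)(k + y*sqrt(2)))."""
    y0 = math.isqrt(k * k // 2)   # floor(k/sqrt(2))
    dist = min(abs(k * k - 2 * y * y) / (math.sqrt(2) * (k + math.sqrt(2) * y))
               for y in (y0, y0 + 1))
    return k * dist


smallest = min(scaled_distance(k) for k in range(1, 20001))
print(smallest >= 0.25)
```

In this range the smallest value occurs at k = 1, where it equals \(1 - 1/\sqrt{2}\approx 0.29\), comfortably above \(\frac{1} {4}\).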

Inequality (4.87) has an important geometric interpretation, namely that there is another constant \(c_{12} = c_{12}(\alpha ) > 0\), depending only on α, such that for every axes-parallel rectangle R,

$$\displaystyle{ \vert \mathrm{rot}_{\alpha }\mathbf{Z}^{2} \cap R\vert \leq 1\quad \mbox{ whenever}\quad \mathrm{area}(R) = c_{ 12}(\alpha ). }$$
(4.88)

If \(\alpha = 1/\sqrt{2}\), then \(c_{12} = \frac{1} {5}\) is a good choice in (4.88), in view of Lemma 4.

The following statement is just a slight generalization of Theorem 12.

Proposition 13.

Let \(\mathcal{P}\) be a finite set of points in the square \([0,M]^{2}\) with density δ, so that the number of elements of \(\mathcal{P}\) is \(\vert \mathcal{P}\vert =\delta M^{2}\). Assume further that \(\mathcal{P}\) satisfies the following rectangle property, that there is a positive constant \(c_{1} = c_{1}(\mathcal{P}) > 0\) such that every axes-parallel rectangle of area \(c_{1}\) contains at most one element of the set \(\mathcal{P}\). Let \(\delta ^{{\prime}} =\delta ^{{\prime}}(c_{1},\gamma,\delta )\) be defined by (4.83) and (4.84), and assume that both N and M∕N are sufficiently large and satisfy (4.85). Then for the hyperbolic needle \(H_{\gamma }(N)\) given by (4.86), there exist translated copies \(\mathbf{x}_{1} + H_{\gamma }(N) \subset [0,M]^{2}\) and \(\mathbf{x}_{2} + H_{\gamma }(N) \subset [0,M]^{2}\) such that

$$\displaystyle{ \vert \mathcal{P}\cap (\mathbf{x}_{1} + H_{\gamma }(N))\vert > 2\delta \gamma \log N +\delta ^{{\prime}}\log N }$$
(4.89)

and

$$\displaystyle{ \vert \mathcal{P}\cap (\mathbf{x}_{2} + H_{\gamma }(N))\vert < 2\delta \gamma \log N -\delta ^{{\prime}}\log N. }$$
(4.90)

Remarks.

  1. (i)

    The term \(2\delta \gamma \log N\) in (4.89) and (4.90) represents the expectation, since the set \(\mathcal{P}\) has density δ and the hyperbolic needle \(H_{\gamma }(N)\) has area \(2\gamma \log N\). The extra terms \(\pm \delta ^{{\prime}}\log N\) show that the deviation from the expectation is proportional to the expectation, justifying the terminology extra large deviation.

  2. (ii)

    The constant factors such as \(10^{-12}\) and \(10^{11}\) are certainly very far from best possible. Since the proof is complicated, our primary goal is to present the basic ideas in the simplest form, and we do not care too much about optimizing these constant factors.

We begin our long proof of Proposition 13.

Consider the point-counting function

$$\displaystyle{ f(\mathbf{x}) =\vert \mathcal{P}\cap (\mathbf{x} + H_{\gamma }(N))\vert. }$$
(4.91)

If \(\mathbf{x} \in [0,M - N] \times [\gamma,M-\gamma ]\), then clearly

$$\displaystyle{ \mathbf{x} + H_{\gamma }(N) \subset [0,M]^{2}. }$$
(4.92)

This explains why we choose the rectangle \([0,M - N] \times [\gamma,M-\gamma ]\) to be our underlying domain in the proof.

Let

$$\displaystyle{ \varDelta (\mathbf{x}) = f(\mathbf{x}) -\delta \cdot \mathrm{area}(H_{\gamma }(N)) = f(\mathbf{x}) - 2\delta \gamma \log N }$$
(4.93)

denote the discrepancy function; Δ(x) deserves its name if (4.92) holds.

In order to show that \(\varDelta (\mathbf{x}) >\delta ^{{\prime}}\log N > 0\) holds for some \(\mathbf{x} = \mathbf{x}_{1}\), we apply the test function method initiated by Roth [26]. The basic idea of this method is to construct a positive test function T(x) > 0 such that

$$\displaystyle{ \frac{1} {(M - N)(M - 2\gamma )}\int _{0}^{M-N}\int _{ \gamma }^{M-\gamma }\varDelta (\mathbf{x})T(\mathbf{x})\,\mathrm{d}\mathbf{x} > c_{ 13}\log N > 0, }$$
(4.94)

and

$$\displaystyle{ \frac{1} {(M - N)(M - 2\gamma )}\int _{0}^{M-N}\int _{ \gamma }^{M-\gamma }T(\mathbf{x})\,\mathrm{d}\mathbf{x} < c_{ 14}. }$$
(4.95)

Combining (4.94) and (4.95) with the general trivial inequality

$$\displaystyle{ \int \varDelta (\mathbf{x})T(\mathbf{x})\,\mathrm{d}\mathbf{x} \leq \max _{\mathbf{x}}\varDelta (\mathbf{x})\int T(\mathbf{x})\,\mathrm{d}\mathbf{x}, }$$
(4.96)

which holds for any positive function T(x) > 0, we conclude that

$$\displaystyle{\max _{\mathbf{x}}\varDelta (\mathbf{x}) > c_{15}\log N}$$

with some positive constant \(c_{15} > 0\).
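Spelled out, the deduction runs as follows. The notation \(\langle g\rangle\) for the normalised average \(\frac{1} {(M-N)(M-2\gamma )}\int _{0}^{M-N}\int _{\gamma }^{M-\gamma }g(\mathbf{x})\,\mathrm{d}\mathbf{x}\) and the explicit value \(c_{13}/c_{14}\) are our own bookkeeping, read off from (4.94)–(4.96), with the maximum taken over the underlying rectangle. Since \(\langle \varDelta T\rangle > 0\) and T > 0, the maximum of Δ is positive, and therefore

$$\displaystyle{c_{13}\log N <\langle \varDelta T\rangle \leq \max _{\mathbf{x}}\varDelta (\mathbf{x}) \cdot \langle T\rangle < c_{14}\max _{\mathbf{x}}\varDelta (\mathbf{x}),\quad \mbox{ so that}\quad \max _{\mathbf{x}}\varDelta (\mathbf{x}) > \frac{c_{13}} {c_{14}}\log N.}$$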

Similarly, to show that \(\varDelta (\mathbf{x}) < -\delta ^{{\prime}}\log N < 0\) for some \(\mathbf{x} = \mathbf{x}_{2}\), we construct a positive test function \(T^{{\ast}}(\mathbf{x}) > 0\) such that

$$\displaystyle{ \frac{1} {(M - N)(M - 2\gamma )}\int _{0}^{M-N}\int _{ \gamma }^{M-\gamma }\varDelta (\mathbf{x})T^{{\ast}}(\mathbf{x})\,\mathrm{d}\mathbf{x} < -c_{ 16}\log N < 0, }$$
(4.97)

and again

$$\displaystyle{ \frac{1} {(M - N)(M - 2\gamma )}\int _{0}^{M-N}\int _{ \gamma }^{M-\gamma }T^{{\ast}}(\mathbf{x})\,\mathrm{d}\mathbf{x} < c_{ 17}. }$$
(4.98)

Clearly (4.97) and (4.98) lead to the inequality

$$\displaystyle{\min _{\mathbf{x}}\varDelta (\mathbf{x}) < -c_{18}\log N < 0}$$

with some positive constant \(c_{18} > 0\).

Let us return to (4.94) and (4.95). We shall express the test function T(x) in terms of modified Rademacher functions, sometimes called Haar wavelets, and this is another idea that we borrow from Roth’s pioneering paper [26]. The benefit of working with modified Rademacher functions is that we have orthogonality and, what is more, we have super-orthogonality; see the key property below.

Note that Roth simply took the sum of certain modified Rademacher functions, and applied the Cauchy–Schwarz inequality instead of (4.96). For his argument, orthogonality was sufficient. It was Halász’s innovation to express T(x) as a Riesz product of modified Rademacher functions; see Halász [19]. The main point is that the Riesz product takes advantage of the super-orthogonality. Here we develop an adaptation of the Roth–Halász method for hyperbolic regions.

Following the Roth–Halász approach, we shall express the test function T(x) as a Riesz product of modified Rademacher functions, in the form

$$\displaystyle{ T(\mathbf{x}) =\prod _{j\in \mathcal{J}}(1 +\rho R_{j}(\mathbf{x})), }$$
(4.99)

where 0 < ρ < 1 is an appropriate constant to be specified later, \(\mathcal{J}\) is some appropriate index-set and \(R_{j}(\mathbf{x})\), \(j \in \mathcal{J}\), are certain modified Rademacher functions to be defined below. We assume that the test function T(x) is zero outside the rectangle \([0,M - N] \times [\gamma,M-\gamma ]\).

Suppose that \(10^{-2} >\eta _{1} > 0\) and \(10^{-2} >\eta _{2} > 0\) are small positive real numbers, to be specified later, such that

$$\displaystyle{ \frac{M - N} {\eta _{1}} = \frac{M - 2\gamma } {\eta _{2}} = 2^{m}, }$$
(4.100)

where m ≥ 1 is an integer. Let j be an arbitrary integer in the interval 0 ≤ j ≤ n where \(2^{n} \approx N\), that is, \(n =\log _{2}N + O(1)\) in binary logarithm. We decompose the rectangle \([0,M - N] \times [\gamma,M-\gamma ]\) into \(2^{m} \times 2^{m} = 4^{m}\) disjoint translated copies of the small rectangle

$$\displaystyle{ [0,2^{j}\eta _{ 1}] \times [0,2^{-j}\eta _{ 2}], }$$
(4.101)

and call these congruent copies of the small rectangle (4.101) j-cells. For each of the \(4^{m}\) j-cells, we independently choose one of the three patterns + −, − + and 0; see Fig. 4.3.

Fig. 4.3 The patterns + −, − + and 0

As Fig. 4.3 shows, the pattern + − actually means a two-dimensional pattern as follows. We divide the j-cell into four congruent subrectangles, and define a step-function on the j-cell, with value + 1 on the upper-right and lower-left subrectangles, and value − 1 on the upper-left and lower-right subrectangles.

Similarly, the pattern − + means the step-function with value − 1 on the upper-right and lower-left subrectangles, and value + 1 on the upper-left and lower-right subrectangles.

Finally, the pattern 0 means that the step-function is zero on the whole j-cell.

In the sequel, we shall simply refer to these two-dimensional patterns as + −, − + and 0, representing the bottom rows in Fig. 4.3.

By making an independent choice of + −, − + and 0 for each j-cell, we obtain a particular modified Rademacher function \(R_{j}(\mathbf{x})\) of order j, defined over the whole rectangle \([0,M - N] \times [\gamma,M-\gamma ]\). We define \(R_{j}(\mathbf{x})\) to be 0 outside the rectangle \([0,M - N] \times [\gamma,M-\gamma ]\).

Since for each of the \(4^{m}\) j-cells there are 3 options, namely + −, − + and 0, the total number of modified Rademacher functions \(R_{j}(\mathbf{x})\) of order j is \(3^{4^{m} }\). Let \(\mathcal{R}(j)\) denote the family of all \(3^{4^{m} }\) modified Rademacher functions of order j. Note that the notation \(R_{j}(\mathbf{x})\) is somewhat ambiguous in the sense that it represents any element of this huge family \(\mathcal{R}(j)\).

Super-Orthogonality: Key Property of the Modified Rademacher Functions. If k ≥ 1 and \(0 \leq j_{1} <\ldots < j_{k} \leq n\), then in every elementary cell of size \(2^{j_{1}}\eta _{1} \times 2^{-j_{k}}\eta _{2}\), the product \(R_{j_{1}}(\mathbf{x})\ldots R_{j_{k}}(\mathbf{x})\) of k modified Rademacher functions satisfies one of the three familiar patterns in Fig. 4.3.

Note that an elementary cell of size \(2^{j_{1}}\eta _{1} \times 2^{-j_{k}}\eta _{2}\) arises as a non-empty intersection of a \(j_{1}\)-cell and a \(j_{k}\)-cell, where \(j_{1} < j_{k}\). The proof of the above key property is almost trivial. It is based on the fact that for any k ≥ 2, the intersection of any k cells of different orders \(j_{1} <\ldots < j_{k}\) is either empty or equal to the intersection of the \(j_{1}\)-cell and the \(j_{k}\)-cell, i.e. the intersection of the first and the last. We emphasize that in each of the 3 patterns the integral of the corresponding step-function is zero.

Since every modified Rademacher function \(R_{j}(\mathbf{x})\) has values \(\pm 1\) or 0, and since 0 < ρ < 1, it is clear that the Riesz product (4.99) defines a positive test function T(x). The index-set \(\mathcal{J}\), a subset of \(\{0,1,2,\ldots,n\}\), will be specified later. Note in advance that \(\mathcal{J}\) is a large subset of \(\{0,1,2,\ldots,n\}\), in the sense that \(\vert \mathcal{J}\vert \geq c_{19}(n + 1)\).

Next we check the second requirement (4.95) of the test function. Multiplying out the Riesz product (4.99), we have

$$\displaystyle\begin{array}{rcl} T(\mathbf{x})& =& \prod _{j\in \mathcal{J}}(1 +\rho R_{j}(\mathbf{x})) \\ & =& 1 +\rho \sum _{j\in \mathcal{J}}R_{j}(\mathbf{x}) +\rho ^{2}\mathop{ \sum _{ j_{1}<j_{2}}} _{j_{i}\in \mathcal{J}}R_{j_{1}}(\mathbf{x})R_{j_{2}}(\mathbf{x}) \\ & & \quad +\rho ^{3}\mathop{ \sum _{ j_{1}<j_{2}<j_{3}}} _{j_{i}\in \mathcal{J}}R_{j_{1}}(\mathbf{x})R_{j_{2}}(\mathbf{x})R_{j_{3}}(\mathbf{x})+\ldots,{}\end{array}$$
(4.102)

in the form 1 plus the linear part plus the quadratic part plus the cubic part and so on. Substituting (4.102) into the left hand side of (4.95), we have

$$\displaystyle\begin{array}{rcl} & & \frac{1} {(M - N)(M - 2\gamma )}\int _{0}^{M-N}\int _{ \gamma }^{M-\gamma }T(\mathbf{x})\,\mathrm{d}\mathbf{x} \\ & & \quad = 1 +\sum _{k\geq 1} \frac{\rho ^{k}} {(M - N)(M - 2\gamma )}\mathop{\sum _{j_{1}<\ldots <j_{k}}} _{j_{i}\in \mathcal{J}}\int _{0}^{M-N}\int _{ \gamma }^{M-\gamma }R_{ j_{1}}(\mathbf{x})\ldots R_{j_{k}}(\mathbf{x})\,\mathrm{d}\mathbf{x} \\ & & \quad = 1. {}\end{array}$$
(4.103)

The vanishing of the integrals in the last step is a consequence of the super-orthogonality of the modified Rademacher functions. For each of the 3 patterns that the integrand takes, the integral is zero. Clearly (4.103) gives (4.95) with \(c_{14} = 1\).

Finally, we turn to requirement (4.94). The verification of this is by far the most difficult part of the proof. This is where we make the critical decision on how we choose an appropriate modified Rademacher function \(R_{j}(\mathbf{x})\) from amongst the huge family \(\mathcal{R}(j)\) of size \(3^{4^{m} }\). We choose the best \(R_{j}(\mathbf{x}) \in \mathcal{R}(j)\) in order to synchronize the trivial errors. The synchronization argument is at the very heart of the proof. Note that if we did not synchronize the trivial errors, then they might cancel out, and we would then not be able to guarantee extra large deviation.
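Super-orthogonality and the identity (4.103) can be tested on a discretised model. The Python sketch below is only an illustration under assumptions of ours: the function names, the pixel grid (columns of width \(\eta _{1}/2\) and rows of height \(2^{-n}\eta _{2}/2\), so that every half-cell is a union of pixels), the restriction n ≤ m, and the random choice of patterns are not taken from the text.

```python
import itertools
import math
import random


def rademacher_table(j, m, n, rng):
    """Pixel values of one random modified Rademacher function of order j.

    Grid assumption: 2**(m+1) columns and 2**(m+n+1) rows, with n <= m.
    """
    cols, rows = 2 ** (m + 1), 2 ** (m + n + 1)
    w, h = 2 ** (j + 1), 2 ** (n - j + 1)   # size of a j-cell in pixels
    # Independent pattern per j-cell: +1 for "+ -", -1 for "- +", 0 for "0".
    sign = {(cx, cy): rng.choice((1, -1, 0))
            for cx in range(cols // w) for cy in range(rows // h)}
    table = {}
    for px in range(cols):
        for py in range(rows):
            right = (px % w) >= w // 2
            upper = (py % h) >= h // 2
            checker = 1 if right == upper else -1   # +1 on upper-right and lower-left
            table[px, py] = sign[px // w, py // h] * checker
    return table


def check_super_orthogonality(m=3, n=3, rho=0.5, seed=0):
    """Return (largest |sum of a product over the grid|, grid average of T)."""
    rng = random.Random(seed)
    R = [rademacher_table(j, m, n, rng) for j in range(n + 1)]
    pixels = list(R[0])
    worst = 0
    for k in range(1, n + 2):
        for orders in itertools.combinations(range(n + 1), k):
            total = sum(math.prod(R[j][p] for j in orders) for p in pixels)
            worst = max(worst, abs(total))
    mean_T = sum(math.prod(1 + rho * R[j][p] for j in range(n + 1))
                 for p in pixels) / len(pixels)
    return worst, mean_T


print(check_super_orthogonality())
```

Every product of functions of distinct orders sums to zero over the grid, so the grid average of the Riesz product is 1, which is the discrete counterpart of (4.103).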

The Trivial Errors and Synchronization. By (4.91) and (4.93), the discrepancy function equals

$$\displaystyle{\varDelta (\mathbf{x}) =\vert \mathcal{P}\cap (\mathbf{x} + H_{\gamma }(N))\vert -\delta \cdot \mathrm{area}(H_{\gamma }(N)),}$$

and so we can write

$$\displaystyle\begin{array}{rcl} & & \int _{0}^{M-N}\int _{ \gamma }^{M-\gamma }\varDelta (\mathbf{x})T(\mathbf{x})\,\mathrm{d}\mathbf{x} \\ & & \quad =\int _{ 0}^{M-N}\int _{ \gamma }^{M-\gamma }\left (\sum _{ P_{i}\in \mathcal{P}\cap (\mathbf{x}+H_{\gamma }(N))}1 -\delta \cdot \mathrm{area}(H_{\gamma }(N))\right )T(\mathbf{x})\,\mathrm{d}\mathbf{x} \\ & & \quad =\int _{ 0}^{M-N}\int _{ \gamma }^{M-\gamma }\left (\sum _{ P_{i}\in \mathcal{P}\cap (\mathbf{x}+H_{\gamma }(N))}1\right )T(\mathbf{x})\,\mathrm{d}\mathbf{x} \\ & & \quad \quad - (M - N)(M - 2\gamma )\delta \cdot \mathrm{ area}(H_{\gamma }(N)), {}\end{array}$$
(4.104)

where in the last step we have used (4.103), and where \(P_{1},P_{2},P_{3},\ldots\) denote the elements of the given point set \(\mathcal{P}\).

Changing the order of summation and integration, we obtain

$$\displaystyle{ \int _{0}^{M-N}\int _{ \gamma }^{M-\gamma }\left (\sum _{ P_{i}\in \mathcal{P}\cap (\mathbf{x}+H_{\gamma }(N))}1\right )T(\mathbf{x})\,\mathrm{d}\mathbf{x} =\sum _{P_{i}\in \mathcal{P}}\int _{P_{i}-H_{\gamma }(N)}T(\mathbf{x})\,\mathrm{d}\mathbf{x}, }$$
(4.105)

where

$$\displaystyle{P_{i} - H_{\gamma }(N) =\{ P_{i} -\mathbf{w}: \mathbf{w} \in H_{\gamma }(N)\}}$$

denotes a reflected and translated copy of the hyperbolic needle \(H_{\gamma }(N)\). Combining (4.104) and (4.105), we have

$$\displaystyle\begin{array}{rcl} & & \frac{1} {(M - N)(M - 2\gamma )}\int _{0}^{M-N}\int _{ \gamma }^{M-\gamma }\varDelta (\mathbf{x})T(\mathbf{x})\,\mathrm{d}\mathbf{x} \\ & & \quad =\sum _{P_{i}\in \mathcal{P}} \frac{1} {(M - N)(M - 2\gamma )}\int _{P_{i}-H_{\gamma }(N)}T(\mathbf{x})\,\mathrm{d}\mathbf{x} -\delta \cdot \mathrm{area}(H_{\gamma }(N)).\qquad \quad {}\end{array}$$
(4.106)

To evaluate (4.106), we return to the Riesz product (4.102). Note that the term 1 in fact denotes the characteristic function \(\chi _{B}\) of the rectangle \(B = [0,M - N] \times [\gamma,M-\gamma ]\), since by definition the modified Rademacher functions are all zero outside B.

We begin with the contribution of \(1 =\chi _{B}\) in (4.102), and note simply that

$$\displaystyle{ \int _{P_{i}-H_{\gamma }(N)}\chi _{B}(\mathbf{x})\,\mathrm{d}\mathbf{x} =\int _{B\cap (P_{i}-H_{\gamma }(N))}\mathrm{d}\mathbf{x} =\mathrm{ area}(B \cap (P_{i} - H_{\gamma }(N))). }$$
(4.107)

Geometric Ideas. Next we study the contribution of the linear part of (4.102) in (4.106). Synchronization means that we want to make the sum

$$\displaystyle{ \sum _{P_{i}\in \mathcal{P}}\int _{P_{i}-H_{\gamma }(N)}R_{j}(\mathbf{x})\,\mathrm{d}\mathbf{x} }$$
(4.108)

large and positive for every \(j \in \mathcal{J}\), where the index-set \(\mathcal{J} \subset \{ 0,1,2,\ldots,n\}\) will be specified later. We decompose the underlying rectangle \(B = [0,M - N] \times [\gamma,M-\gamma ]\) into j-cells. Let \(\mathcal{C}\) be an arbitrary j-cell; it has area \(\eta _{1}\eta _{2}\). Consider a single term in (4.108), and restrict it to the j-cell \(\mathcal{C}\). The geometric meaning of the integral

$$\displaystyle{ \int _{\mathcal{C}\cap (P_{i}-H_{\gamma }(N))}R_{j}(\mathbf{x})\,\mathrm{d}\mathbf{x} }$$
(4.109)

plays a crucial role in the argument below; see Fig. 4.4.

Fig. 4.4 Intersection of a j-cell with a hyperbolic arc \(P_{i} - H_{\gamma }(N)\)

Since the j-cell is very small, the hyperbola arc \(P_{i} - H_{\gamma }(N)\) can be approximated by its tangent line locally. This explains the tilted straight line segment in Fig. 4.4. The arrows indicate the inside of the hyperbolic needle, i.e. the arc in the picture is the upper arc of the needle.

The value of integral (4.109) depends heavily on which of the 3 patterns happens to show up in the restriction of \(R_{j}(\mathbf{x})\) to the j-cell \(\mathcal{C}\). The patterns + − and − + give two integrals whose sum is 0, whereas the pattern 0 clearly gives an integral with value 0.

How do we choose the right pattern + −, − + or 0 in an arbitrary j-cell \(\mathcal{C}\)? Well, for a fixed point the choice is trivial. For every fixed point \(P_{i} \in \mathcal{P}\), exactly one of the two patterns + − and − + will make the integral (4.109) positive, unless both integrals are equal to 0. The problem is that we are dealing with a large sum

$$\displaystyle{ \sum _{P_{i}\in \mathcal{P}}\int _{\mathcal{C}\cap (P_{i}-H_{\gamma }(N))}R_{j}(\mathbf{x})\,\mathrm{d}\mathbf{x} }$$
(4.110)

instead of just a single term (4.109), and we have to make (4.110) positive. The difficulty is that different points may prefer different patterns; say, for \(P_{i_{1}}\) the pattern + − may make the integral (4.109) positive, whereas for another point \(P_{i_{2}}\) the pattern − + may make the integral (4.109) positive.

To overcome this difficulty, we will apply the Single Dominant Term Rule, which means the following. If the sum (4.110) is dominated by a single term (4.109), then by an appropriate choice between the patterns + − and − +, we can always make this dominant term positive. We then show that the contribution from the remaining terms to (4.110) is relatively negligible. If there is no dominant term in (4.110), then we choose the pattern 0.

Of course, we have to define precisely what domination means. The success of the Single Dominant Term Rule is based on the fact that single term domination is quite typical: it happens very often among the 4m j-cells.

What is single term domination in (4.110)? To explain this, we have to talk about slopes. The slope of the diagonal of a j-cell is

$$\displaystyle{4^{-j}\eta _{ 2}/\eta _{1} \approx 4^{-j},}$$

since \(\eta _{1}\) and \(\eta _{2}\) are almost equal (see Footnote 9). Since the hyperbola is a smooth curve, the intersection of a translated and reflected hyperbolic needle \(P_{i} - H_{\gamma }(N)\) with the j-cell \(\mathcal{C}\) is almost like the intersection of \(\mathcal{C}\) with a half-plane, or the intersection of \(\mathcal{C}\) with two nearly parallel half-planes. Since half-planes have well-defined constant slopes, as an intuitive oversimplification, we shall use the terms half-plane and slope for the intersections \(\mathcal{C}\cap (P_{i} - H_{\gamma }(N))\). Single term domination occurs if

  • there is precisely one half-plane \(\mathcal{C}\cap (P_{i} - H_{\gamma }(N))\) with slope close to \(4^{-j}\) that intersects \(\mathcal{C}\); and

  • this intersection is a large triangle in only one of the four subrectangles of \(\mathcal{C}\), namely the lower right subrectangle, where the pattern is constant.

Here the intersection requirement large triangle from the lower right subrectangle guarantees that the integral (4.109) is far from zero, and the integral (4.109) of this dominant term is called the trivial error.

An Important Consequence of the Rectangle Property. As indicated above, single term domination means that there is exactly one half-plane \(\mathcal{C}\cap (P_{i} - H_{\gamma }(N))\) with slope close to \(4^{-j}\). It is important to point out that we cannot have two half-planes with slopes very close to \(4^{-j}\) such that both are upper arcs. As shown in Fig. 4.5, if \(\mathcal{C}\cap (P_{i_{1}} - H_{\gamma }(N))\) and \(\mathcal{C}\cap (P_{i_{2}} - H_{\gamma }(N))\) are both upper arcs with slopes very close to \(4^{-j}\), then the two points \(P_{i_{1}}\) and \(P_{i_{2}}\) have to be in the same axes-parallel rectangle of area \(c_{1}\), namely, in an axes-parallel rectangle where the slope of the diagonal is close to \(4^{-j}\). But having two points in the same axes-parallel rectangle of area \(c_{1}\) is impossible: it contradicts the hypothesis of Proposition 13.

Fig. 4.5 Forbidden configuration

What can happen, however, is that we have two half-planes with slopes very close to \(4^{-j}\) such that one is an upper arc and the other one is a lower arc. For example, it can happen that \(\mathcal{C}\cap (P_{i_{1}} - H_{\gamma }(N))\) is an upper arc and \(\mathcal{C}\cap (P_{i_{2}} - H_{\gamma }(N))\) is a lower arc with both slopes (see Footnote 10) close to \(4^{-j}\). To overcome this difficulty, we switch to a 2 × 2 configuration of j-cells. More precisely, instead of working with a single j-cell \(\mathcal{C}\), we switch to a 2 × 2 configuration of four neighboring j-cells \(\mathcal{C}_{1}\), \(\mathcal{C}_{2}\), \(\mathcal{C}_{3}\) and \(\mathcal{C}_{4}\), where \(\mathcal{C}_{1}\) is the upper left, \(\mathcal{C}_{2}\) is the upper right, \(\mathcal{C}_{3}\) is the lower left and \(\mathcal{C}_{4}\) is the lower right member of the 2 × 2 configuration. The simple geometric idea is the following. Assume that the upper arc of \(P_{i_{1}} - H_{\gamma }(N)\) intersects both \(\mathcal{C}_{2}\) and \(\mathcal{C}_{3}\) satisfying the requirement large triangle from the lower right subrectangle, where the pattern is constant. Then obviously the lower arc of \(P_{i_{2}} - H_{\gamma }(N)\) cannot intersect both of \(\mathcal{C}_{2}\) and \(\mathcal{C}_{3}\), since the slopes are close to \(4^{-j}\). Therefore, either \(\mathcal{C}_{2}\) or \(\mathcal{C}_{3}\) will be a j-cell with single term domination. That is, we can always save at least one of the four neighboring j-cells \(\mathcal{C}_{1}\), \(\mathcal{C}_{2}\), \(\mathcal{C}_{3}\) and \(\mathcal{C}_{4}\). See Fig. 4.6, where \(\mathcal{C}_{3}\) has single term domination.

Fig. 4.6 A 2 × 2 configuration of j-cells

Choosing a Short Vertical Translation. Next we explain how one can satisfy the intersection requirement large triangle from the lower right subrectangle, where the pattern is constant. This is very important, since this requirement guarantees that the dominant integral (4.109) is far from zero. First we pick an arbitrary point \(P_{i} \in \mathcal{P}\). Then of course the hyperbolic needle \(P_{i} - H_{\gamma }(N)\) has a long arc such that the slope is close to \(4^{-j}\); long in fact means length of roughly \(2^{j}\). Therefore, for each point \(P_{i} \in \mathcal{P}\), there is a j-cell \(\mathcal{C}\) such that the intersection \(\mathcal{C}\cap (P_{i} - H_{\gamma }(N))\) has slope close to \(4^{-j}\). Unfortunately, nothing guarantees that \(P_{i} - H_{\gamma }(N)\) intersects only one of the four subrectangles, where the pattern is constant. The solution is very simple. We apply a short vertical translation of the point set \(\mathcal{P}\), but of course the modified Rademacher functions and the test function \(T(\mathbf{x})\) remain fixed in the rectangle \(B = [0,M - N] \times [\gamma,M-\gamma ]\). Here a short vertical translation means that the length of the vertical translation runs from 0 to 1. For a j-cell, a translation of length from 0 to \(2^{-j}\eta _{2}\) already suffices: as the point \(P_{i}\) moves up vertically, the intersection \(\mathcal{C}\cap (P_{i} - H_{\gamma }(N))\) changes, and has good positions where \(P_{i} - H_{\gamma }(N)\) intersects only the lower right subrectangle, where the pattern is constant, and at the same time, this intersection is a large triangle. Since the slope is close to \(4^{-j}\), a positive constant percentage of the translations is good. If we apply translations from 0 to 1, then it will work for all j.

It follows from a standard averaging argument that there is (see Footnote 11) a vertical translation \(0 < t_{0} < 1\) which is good for many pairs \((P_{i},j)\) at the same time, where \(P_{i} \in \mathcal{P}\) is a given point and \(j \in \{ 0,1,2,\ldots,n\}\) is an order of the modified Rademacher function. Here many means a positive constant percentage of all pairs.

Of course, a vertical translation has a bad side effect. It causes some points to leave the underlying square \([0,M]^{2}\). However, luckily for us, it suffices to use short translations of length at most 1, so that we lose relatively few points, and only those that are close to the boundary. Note that the rectangle property in the hypothesis of Proposition 13 guarantees that there are at most O(M) points close to the boundary, which clearly is negligible compared to the number \(\delta M^{2}\) of points in \(\mathcal{P}\).

Summarizing the Vague Geometric Intuition. A typical vertical translation of length \(0 < t_{0} < 1\) has the property that for a positive constant percentage of the pairs \((j,\mathcal{C})\), where \(j \in \{ 0,1,2,\ldots,n\}\) and \(\mathcal{C}\) is a j-cell, we have single term domination, so that (see Footnote 12)

$$\displaystyle{ \sum _{P_{i}\in \mathcal{P}}\int _{\mathcal{C}\cap (P_{i}-H_{\gamma }(N))}R_{j}(\mathbf{x})\,\mathrm{d}\mathbf{x} \geq \frac{1} {2}\int _{\mathcal{C}\cap (P_{i_{ 0}}-H_{\gamma }(N))}R_{j}(\mathbf{x})\,\mathrm{d}\mathbf{x} \geq c_{20} > 0, }$$
(4.111)

where \(P_{i_{0}}\) is the dominating point, i.e. the intersection \(\mathcal{C}\cap (P_{i_{0}} - H_{\gamma }(N))\) has slope close to \(4^{-j}\), and this intersection is a large triangle from the lower right subrectangle of \(\mathcal{C}\), where the pattern is constant. We shall explain the missing details of (4.111) later, and give an explicit value for \(c_{20}\).

The Single Dominant Term Rule and (4.111) give

$$\displaystyle\begin{array}{rcl} & & \sum _{j\in \mathcal{J}}\sum _{P_{i}\in \mathcal{P}} \frac{1} {(M - N)(M - 2\gamma )}\int _{P_{i}-H_{\gamma }(N)}R_{j}(\mathbf{x})\,\mathrm{d}\mathbf{x} \\ & & \quad \geq c_{21}\vert \mathcal{J}\vert \geq c_{22}(n + 1) > 0. {}\end{array}$$
(4.112)

The geometric intuition requires that \(j \in \mathcal{J}\) satisfies an inequality like

$$\displaystyle{ \max \left \{1, \frac{1} {\gamma } \right \} \leq 2^{j} \leq \min \left \{N, \frac{N} {\gamma } \right \}. }$$
(4.113)

To guarantee (4.113), we choose \(\mathcal{J}\) to be the interval of integers \(j \in \{ 0,1,2,\ldots,n\}\) satisfying

$$\displaystyle{ \log _{2}\left (\max \left \{1, \frac{1} {\gamma } \right \}\right ) \leq j \leq \log _{2}N -\log _{2}(\max \{1,\gamma \}). }$$
(4.114)
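As a quick sanity check, the following minimal Python sketch lists the integers j allowed by (4.114) and confirms that each of them satisfies (4.113). The function names `index_set` and `satisfies_4_113` are our own illustrative choices, and the sample values of γ, N and n are assumptions made only for the demonstration.

```python
import math

def index_set(gamma, N, n):
    """Integers j in {0, 1, ..., n} satisfying (4.114)."""
    lower = math.log2(max(1.0, 1.0 / gamma))
    upper = math.log2(N) - math.log2(max(1.0, gamma))
    return [j for j in range(n + 1) if lower <= j <= upper]

def satisfies_4_113(j, gamma, N):
    """Check max{1, 1/gamma} <= 2^j <= min{N, N/gamma}, i.e. (4.113)."""
    return max(1.0, 1.0 / gamma) <= 2.0 ** j <= min(N, N / gamma)

# Illustrative parameters (assumed, not taken from the text).
J = index_set(gamma=0.25, N=1024.0, n=12)
print(J, all(satisfies_4_113(j, 0.25, 1024.0) for j in J))
```

Since min{N, N/γ} = N/max{1, γ}, taking base-2 logarithms in (4.113) gives exactly the two bounds in (4.114), which is what the sketch exploits.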

We emphasize that this was just an intuitive proof of (4.112). We shall return to (4.111) and (4.112) later, and show how we can make the whole argument perfectly precise and explicit.

We shall complete the proof of Proposition 13 in the next three sections. Note that (4.112) is the most difficult part.

6 Proof of Theorem 12 (II): More on the Riesz Product

Applying Super-Orthogonality. We next turn to the contribution of the quadratic, cubic and higher order terms of the Riesz product (4.102) to (4.106). Let k ≥ 2, and let \(0 \leq j_{1} <\ldots < j_{k} \leq n\). Suppose that \(\mathcal{C}^{{\ast}}\) is the non-empty intersection of k cells of orders \(j_{1} <\ldots < j_{k}\). Then \(\mathcal{C}^{{\ast}}\) is an elementary cell of size \(2^{j_{1}}\eta _{1} \times 2^{-j_{k}}\eta _{2} = 2^{j_{1}-j_{k}}\eta _{1}\eta _{2}\). Super-orthogonality yields that the product \(R_{j_{1}}(\mathbf{x})\ldots R_{j_{k}}(\mathbf{x})\) of k modified Rademacher functions of the given orders, restricted to \(\mathcal{C}^{{\ast}}\), equals one of the 3 patterns + −, − + or 0.

Assume that the translated and reflected hyperbolic needle \(P_{i} - H_{\gamma }(N)\) intersects \(\mathcal{C}^{{\ast}}\), and let \(\mathrm{slope} =\mathrm{ slope}(\mathcal{C}^{{\ast}}\cap (P_{i} - H_{\gamma }(N)))\) denote the slope (see Footnote 13) of the intersection \(\mathcal{C}^{{\ast}}\cap (P_{i} - H_{\gamma }(N))\). Simple geometric consideration shows that, roughly speaking, the integral

$$\displaystyle{ \frac{1} {\mathrm{area}(\mathcal{C}^{{\ast}})}\int _{\mathcal{C}^{{\ast}}\cap (P_{i}-H_{\gamma }(N))}R_{j_{1}}(\mathbf{x})\ldots R_{j_{k}}(\mathbf{x})\,\mathrm{d}\mathbf{x}}$$

is negligible unless the slope of the intersection \(\mathcal{C}^{{\ast}}\cap (P_{i} - H_{\gamma }(N))\) is close to \(2^{-(j_{1}+j_{k})}\), the slope of the diagonal of \(\mathcal{C}^{{\ast}}\). More precisely, we have

$$\displaystyle\begin{array}{rcl} & & \frac{1} {\mathrm{area}(\mathcal{C}^{{\ast}})}\left \vert \int _{\mathcal{C}^{{\ast}}\cap (P_{i}-H_{\gamma }(N))}R_{j_{1}}(\mathbf{x})\ldots R_{j_{k}}(\mathbf{x})\,\mathrm{d}\mathbf{x}\right \vert \\ & & \quad \leq \min \left \{ \frac{1} {\mathrm{slope} \cdot 2^{j_{1}+j_{k}}},\mathrm{slope} \cdot 2^{j_{1}+j_{k} }\right \}. {}\end{array}$$
(4.115)

Note that (4.115) is a straightforward corollary of the geometry of the 3 possible patterns of \(R_{j_{1}}(\mathbf{x})\ldots R_{j_{k}}(\mathbf{x})\) in \(\mathcal{C}^{{\ast}}\).

The hyperbolic needle \(H_{\gamma }(N)\) is bounded by the long curve \(y =\gamma /x\) and its reflection \(y = -\gamma /x\), with 1 ≤ x ≤ N. The slope is the derivative \((-\gamma /x)^{{\prime}} =\gamma x^{-2}\). The number of elementary cells \(\mathcal{C}^{{\ast}}\) of size \(2^{j_{1}-j_{k}}\eta _{1}\eta _{2}\) intersecting a fixed hyperbolic needle \(P_{i} - H_{\gamma }(N)\) is estimated from above by the simple expression

$$\displaystyle{ 2\left ( \frac{2N} {2^{j_{1}}\eta _{1}} + \frac{2\gamma } {2^{-j_{k}}\eta _{2}}\right ). }$$
(4.116)

Here the factor 2 comes from the two long boundary hyperbolic curves, the first term comes from the pointed end of the hyperbolic needle, and the second term comes from the wide part of the hyperbolic needle. A more detailed explanation of (4.116) goes as follows.

Let us start with the pointed end of the hyperbolic needle \(H_{\gamma }(N)\).

Case A.

As x runs through the interval \(N \geq x \geq \sqrt{\gamma }2^{(j_{1}+j_{k})/2}\), the slope of the intersection \(\mathcal{C}^{{\ast}}\cap (P_{i} - H_{\gamma }(N))\) is \(\gamma x^{-2}\), which is less than \(2^{-(j_{1}+j_{k})}\), the slope of the diagonal of \(\mathcal{C}^{{\ast}}\). It follows that in this range, \(P_{i} - H_{\gamma }(N)\) intersects fewer than

$$\displaystyle{2 \cdot \frac{2N} {2^{j_{1}}\eta _{1}}}$$

elementary cells \(\mathcal{C}^{{\ast}}\) of size \(2^{j_{1}-j_{k}}\eta _{1}\eta _{2}\), with total area not exceeding \(4\eta _{2}N2^{-j_{k}}\).

Case B.

As x runs through the interval \(\sqrt{\gamma }2^{(j_{1}+j_{k})/2} \geq x \geq 1\), the slope of the intersection \(\mathcal{C}^{{\ast}}\cap (P_{i} - H_{\gamma }(N))\) is greater than \(2^{-(j_{1}+j_{k})}\), the slope of the diagonal of \(\mathcal{C}^{{\ast}}\). It follows that in this range, \(P_{i} - H_{\gamma }(N)\) intersects fewer than

$$\displaystyle{2 \cdot \frac{2\gamma } {2^{-j_{k}}\eta _{2}}}$$

elementary cells \(\mathcal{C}^{{\ast}}\) of size \(2^{j_{1}-j_{k}}\eta _{1}\eta _{2}\), with total area not exceeding \(4\eta _{1}\gamma 2^{j_{1}}\).

In Case A, we view the hyperbola xy = γ as \(y =\gamma /x\). In Case B, we switch the role of the coordinate axes and view the same hyperbola as \(x =\gamma /y\). Thus by (4.115) and (4.116), we have

$$\displaystyle\begin{array}{rcl} & & \left \vert \int _{P_{i}-H_{\gamma }(N)}R_{j_{1}}(\mathbf{x})\ldots R_{j_{k}}(\mathbf{x})\,\mathrm{d}\mathbf{x}\right \vert \\ & & \ \leq 4\eta _{2}N2^{-j_{k} } \cdot \frac{2} {N}\int _{\sqrt{\gamma }2^{(j_{1}+j_{k})/2}}^{N}\frac{\gamma 2^{j_{1}+j_{k}}} {x^{2}} \,\mathrm{d}x + 4\eta _{1}\gamma 2^{j_{1} } \cdot \frac{2} {\gamma } \int _{\sqrt{\gamma }2^{-(j_{1}+j_{k})/2}}^{\gamma } \frac{\gamma 2^{-(j_{1}+j_{k})}} {y^{2}} \,\mathrm{d}y \\ & & \ = 8\eta _{2}2^{-j_{k} }\left (\sqrt{\gamma }2^{(j_{1}+j_{k})/2} -\frac{\gamma 2^{j_{1}+j_{k}}} {N} \right ) + 8\eta _{1}2^{j_{1} }\left (\sqrt{\gamma }2^{-(j_{1}+j_{k})/2} - 2^{-(j_{1}+j_{k})}\right ) \\ & & \ \leq 8\sqrt{\gamma }(\eta _{1} +\eta _{2})2^{(j_{1}-j_{k})/2}. {}\end{array}$$
(4.117)
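The chain of estimates in (4.117) can be checked numerically. The sketch below evaluates the two integrals directly from the antiderivative of c/t², with the first integral normalized by 2/N and the second by 2/γ, and compares their sum with the final bound. All function names and the sample parameters are hypothetical choices of ours; the midpoint rule serves only as an independent check of the antiderivative.

```python
import math

def inv_square_integral(c, a, b):
    """Exact value of the integral of c / t^2 over [a, b]."""
    return c * (1.0 / a - 1.0 / b)

def midpoint_integral(f, a, b, steps=100000):
    """Plain midpoint rule, used only as an independent numerical check."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

def needle_estimate(gamma, N, eta1, eta2, j1, jk):
    """Return (Case A term, Case B term, final bound) in the spirit of (4.117)."""
    s = j1 + jk
    x0 = math.sqrt(gamma) * 2.0 ** (s / 2)    # slope equals the diagonal slope here
    y0 = math.sqrt(gamma) * 2.0 ** (-s / 2)   # the same point, seen as x = gamma / y
    case_a = (4 * eta2 * N * 2.0 ** (-jk) * (2.0 / N)
              * inv_square_integral(gamma * 2.0 ** s, x0, N))
    case_b = (4 * eta1 * gamma * 2.0 ** j1 * (2.0 / gamma)
              * inv_square_integral(gamma * 2.0 ** (-s), y0, gamma))
    bound = 8 * math.sqrt(gamma) * (eta1 + eta2) * 2.0 ** ((j1 - jk) / 2)
    return case_a, case_b, bound

# Illustrative parameters (assumed): gamma = 1, N = 64, eta1 = eta2 = 0.1.
a, b, bound = needle_estimate(1.0, 64.0, 0.1, 0.1, 1, 3)
print(a, b, a + b <= bound)
```

The two subtracted terms are positive, so dropping them gives the last line of (4.117); the sketch assumes 1 ≤ √γ 2^{(j_1+j_k)/2} ≤ N, so that both ranges of integration are non-empty.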

Recall that the contribution of \(1 =\chi _{B}\) in (4.102), where \(B = [0,M - N] \times [\gamma,M-\gamma ]\), is given by (4.107). Combining (4.102), (4.106) and (4.107), we have

$$\displaystyle\begin{array}{rcl} & & \frac{1} {(M - N)(M - 2\gamma )}\int _{0}^{M-N}\int _{ \gamma }^{M-\gamma }\varDelta (\mathbf{x})T(\mathbf{x})\,\mathrm{d}\mathbf{x} \\ & & \quad =\sum _{P_{i}\in \mathcal{P}}\frac{\mathrm{area}(B \cap (P_{i} - H_{\gamma }(N)))} {(M - N)(M - 2\gamma )} -\delta \cdot \mathrm{area}(H_{\gamma }(N)) \\ & & \quad \quad +\rho \sum _{j\in \mathcal{J}}\sum _{P_{i}\in \mathcal{P}} \frac{1} {(M - N)(M - 2\gamma )}\int _{P_{i}-H_{\gamma }(N)}R_{j}(\mathbf{x})\,\mathrm{d}\mathbf{x} \\ & & \quad \quad +\sum _{k\geq 2}\rho ^{k}\mathop{ \sum _{ j_{1}<\ldots <j_{k}}} _{j_{i}\in \mathcal{J}}\sum _{P_{i}\in \mathcal{P}} \frac{1} {(M - N)(M-2\gamma )}\int _{P_{i}-H_{\gamma }(N)}R_{j_{1}}(\mathbf{x})\ldots R_{j_{k}}(\mathbf{x})\,\mathrm{d}\mathbf{x}.{}\end{array}$$
(4.118)

Using (4.117), it is easy to estimate the last term in (4.118). We have

$$\displaystyle\begin{array}{rcl} & & \sum _{k\geq 2}\rho ^{k}\mathop{ \sum _{ j_{1}<\ldots <j_{k}}} _{j_{i}\in \mathcal{J}}\sum _{P_{i}\in \mathcal{P}} \frac{1} {(M - N)(M - 2\gamma )}\left \vert \int _{P_{i}-H_{\gamma }(N)}R_{j_{1}}(\mathbf{x})\ldots R_{j_{k}}(\mathbf{x})\,\mathrm{d}\mathbf{x}\right \vert \\ & & \quad \leq \sum _{k\geq 2}\rho ^{k}\sum _{ 0\leq j_{1}<\ldots <j_{k}\leq n}\sum _{P_{i}\in \mathcal{P}}\frac{8\sqrt{\gamma }(\eta _{1} +\eta _{2})2^{(j_{1}-j_{k})/2}} {(M - N)(M - 2\gamma )}. {}\end{array}$$
(4.119)

For convenience, let us write \(q = j_{k} - j_{1}\). We estimate the sum

$$\displaystyle{ \sum _{k\geq 2}\rho ^{k}\sum _{ j_{1}=0}^{n-k+1}\sum _{ q=k-1}^{n-j_{1} }\sum _{j_{1}<j_{2}<\ldots <j_{k-1}<j_{1}+q}2^{-q/2}. }$$
(4.120)

In the innermost sum in (4.120), the indices \(j_{2},\ldots,j_{k-1}\) can be chosen from among the q − 1 numbers lying between \(j_{1}\) and \(j_{1} + q\) in \({q - 1\choose k - 2}\) ways. To simplify (4.120), we can let the indices \(j_{1}\) and q run up to n. Then we change the order of summation. Thus we have

$$\displaystyle\begin{array}{rcl} & & \sum _{k\geq 2}\rho ^{k}\sum _{ j_{1}=0}^{n-k+1}\sum _{ q=k-1}^{n-j_{1} }\sum _{j_{1}<j_{2}<\ldots <j_{k-1}<j_{1}+q}2^{-q/2} \\ & & \quad \leq \sum _{k\geq 2}\rho ^{k}\sum _{ j_{1}=0}^{n}\sum _{ q=k-1}^{n}{q - 1\choose k - 2}2^{-q/2} =\sum _{ j_{1}=0}^{n}\sum _{ q=1}^{n}2^{-q/2}\sum _{ k=2}^{q+1}\rho ^{k}{q - 1\choose k - 2}.{}\end{array}$$
(4.121)

Note that the innermost sum

$$\displaystyle{\sum _{k=2}^{q+1}\rho ^{k}{q - 1\choose k - 2} =\rho ^{2}\sum _{ k=2}^{q+1}\rho ^{k-2}{q - 1\choose k - 2} =\rho ^{2}(1+\rho )^{q-1}.}$$

It follows that if \(0 <\rho < \sqrt{2} - 1\), then

$$\displaystyle\begin{array}{rcl} & & \sum _{j_{1}=0}^{n}\sum _{ q=1}^{n}2^{-q/2}\sum _{ k=2}^{q+1}\rho ^{k}{q - 1\choose k - 2} =\sum _{ j_{1}=0}^{n}\sum _{ q=1}^{n}2^{-q/2}\rho ^{2}(1+\rho )^{q-1} \\ & & \quad = \frac{(n + 1)\rho ^{2}} {\sqrt{2}} \sum _{q=1}^{n}\left (\frac{1+\rho } {\sqrt{2}}\right )^{q-1} \leq \frac{(n + 1)\rho ^{2}} {\sqrt{2}} \sum _{q=1}^{\infty }\left (\frac{1+\rho } {\sqrt{2}}\right )^{q-1} \\ & & \quad = \frac{(n + 1)\rho ^{2}} {\sqrt{2}} \left (1 -\frac{1+\rho } {\sqrt{2}}\right )^{-1} = \frac{(n + 1)\rho ^{2}} {\sqrt{2} - 1-\rho }. {}\end{array}$$
(4.122)
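Both the binomial identity for the innermost sum and the geometric series bound in (4.122) are easy to confirm numerically. The following sketch does so for sample values of ρ and n; the names `inner_sum`, `double_sum` and `closed_bound` are our own, and the sample values are assumptions made for illustration.

```python
import math

def inner_sum(rho, q):
    """Sum over k = 2, ..., q + 1 of rho^k * C(q - 1, k - 2)."""
    return sum(rho ** k * math.comb(q - 1, k - 2) for k in range(2, q + 2))

def double_sum(rho, n):
    """Left-hand side of (4.122); the summand does not depend on j_1."""
    return (n + 1) * sum(2.0 ** (-q / 2) * inner_sum(rho, q)
                         for q in range(1, n + 1))

def closed_bound(rho, n):
    """Right-hand side of (4.122), valid for 0 < rho < sqrt(2) - 1."""
    return (n + 1) * rho ** 2 / (math.sqrt(2) - 1 - rho)

rho, n = 0.3, 20  # illustrative values with rho < sqrt(2) - 1
print(inner_sum(rho, 7), rho ** 2 * (1 + rho) ** 6)
print(double_sum(rho, n), closed_bound(rho, n))
```

The condition ρ < √2 − 1 is exactly what makes the ratio (1 + ρ)/√2 of the geometric series smaller than 1.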

Combining (4.119)–(4.122), we obtain

Lemma 14.

If \(0 <\rho < \sqrt{2} - 1\) , then

$$\displaystyle\begin{array}{rcl} & & \sum _{k\geq 2}\rho ^{k}\mathop{ \sum _{ j_{1}<\ldots <j_{k}}} _{j_{i}\in \mathcal{J}}\sum _{P_{i}\in \mathcal{P}} \frac{1} {(M - N)(M - 2\gamma )}\left \vert \int _{P_{i}-H_{\gamma }(N)}R_{j_{1}}(\mathbf{x})\ldots R_{j_{k}}(\mathbf{x})\,\mathrm{d}\mathbf{x}\right \vert \\ & & \quad \leq \frac{\vert \mathcal{P}\vert } {(M - N)(M - 2\gamma )} \cdot 8\sqrt{\gamma }(\eta _{1} +\eta _{2}) \cdot \frac{(n + 1)\rho ^{2}} {\sqrt{2} - 1-\rho }. {}\end{array}$$
(4.123)

We return to (4.118). The contribution from the first term on the right hand side is o(1), so that it is negligible. To see this, we recall that \(\vert \mathcal{P}\vert =\delta M^{2}\), and also that \(P_{i} - H_{\gamma }(N) \subset B = [0,M - N] \times [\gamma,M-\gamma ]\) for all but O(M) points \(P_{i} \in \mathcal{P}\). Thus

$$\displaystyle\begin{array}{rcl} & & \sum _{P_{i}\in \mathcal{P}}\frac{\mathrm{area}(B \cap (P_{i} - H_{\gamma }(N)))} {(M - N)(M - 2\gamma )} -\delta \cdot \mathrm{area}(H_{\gamma }(N)) \\ & & \quad = \frac{\delta M^{2} + O(M)} {(M - N)(M - 2\gamma )} \cdot \mathrm{ area}(H_{\gamma }(N)) -\delta \cdot \mathrm{area}(H_{\gamma }(N)) \\ & & \quad = O\left (\frac{N\log N} {M} \right ) = o(1). {}\end{array}$$
(4.124)

For the second term on the right hand side of (4.118), we have the estimate (4.112). Thus combining (4.112), (4.118), (4.123) and (4.124), we obtain

$$\displaystyle\begin{array}{rcl} & & \frac{1} {(M - N)(M - 2\gamma )}\int _{0}^{M-N}\int _{ \gamma }^{M-\gamma }\varDelta (\mathbf{x})T(\mathbf{x})\,\mathrm{d}\mathbf{x} {}\\ & & \quad \geq c_{23}\rho (n + 1) - c_{24} \frac{(n + 1)\rho ^{2}} {\sqrt{2} - 1-\rho }- o(1), {}\\ \end{array}$$

where the constants, the first one yet unspecified, are positive and \(0 <\rho < \sqrt{2} - 1\). By choosing a sufficiently small ρ in the range \(0 <\rho < \sqrt{2} - 1\), we clearly have

$$\displaystyle\begin{array}{rcl} & & \frac{1} {(M - N)(M - 2\gamma )}\int _{0}^{M-N}\int _{ \gamma }^{M-\gamma }\varDelta (\mathbf{x})T(\mathbf{x})\,\mathrm{d}\mathbf{x} {}\\ & & \quad \geq c_{25}\rho (n + 1) > c_{26}\log N > 0, {}\\ \end{array}$$

proving (4.94), and thus proving Proposition 13 in the positive direction; see (4.89). It remains to clarify the missing details in (4.111) and (4.112); see also the paragraph Summarizing the Vague Geometric Intuition at the end of Sect. 4.5.
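To see how small ρ has to be, note that \(c_{23}\rho - c_{24}\rho ^{2}/(\sqrt{2} - 1-\rho ) > 0\) holds precisely when \(\rho < c_{23}(\sqrt{2} - 1)/(c_{23} + c_{24})\). The sketch below illustrates this threshold. The text does not give values for the constants, so the numbers used for `c23` and `c24` are purely hypothetical, as are the function names.

```python
import math

def rho_threshold(c23, c24):
    """Supremum of the rho in (0, sqrt(2) - 1) for which
    c23*rho - c24*rho^2/(sqrt(2) - 1 - rho) is positive."""
    return c23 * (math.sqrt(2) - 1) / (c23 + c24)

def main_term(rho, c23, c24):
    """Coefficient of (n + 1) in the lower bound, ignoring the o(1) term."""
    return c23 * rho - c24 * rho ** 2 / (math.sqrt(2) - 1 - rho)

c23, c24 = 1.0, 50.0           # hypothetical constants, for illustration only
t = rho_threshold(c23, c24)
print(t, main_term(0.5 * t, c23, c24), main_term(1.5 * t, c23, c24))
```

Any fixed ρ below the threshold leaves a positive multiple of n + 1, which is all that the argument requires.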

Single Term Domination: Clarifying the Technical Details. The geometric ideas introduced in Sect. 4.5 lead to the following conclusion. At least half of the short vertical translations \(\mathcal{P} + (0,t_{0})\), where \(0 < t_{0} < 1\), of the given point set \(\mathcal{P}\) have the property that for at least 1% of the pairs \((j,\mathcal{C})\), where \(j \in \{ 0,1,2,\ldots,n\}\) and \(\mathcal{C}\) is a j-cell of the underlying rectangle \(B = [0,M - N] \times [\gamma,M-\gamma ]\), there is single term domination. This property includes, among other requirements to be specified later, that there is a dominating point \(P_{i_{0}} = P_{i_{0}}(j,\mathcal{C}) \in \mathcal{P}\) such that

  • \(\mathcal{C}\cap (P_{i_{0}} - H_{\gamma }(N))\) has slope between \(\frac{5} {6}4^{-j}\) and \(\frac{7} {6}4^{-j}\);

  • \(P_{i_{0}} - H_{\gamma }(N)\) intersects only the lower right subrectangle of \(\mathcal{C}\), and the intersection is a large triangle, meaning that the area is at least \(\frac{1} {32}\) of the area of \(\mathcal{C}\), that is, the area is at least \(\eta _{1}\eta _{2}/32\).

Then, by choosing the pattern + − in the j-cell \(\mathcal{C}\), we have

$$\displaystyle{ \int _{\mathcal{C}\cap (P_{i_{ 0}}-H_{\gamma }(N))}R_{j}(\mathbf{x})\,\mathrm{d}\mathbf{x} \geq \frac{\eta _{1}\eta _{2}} {32}. }$$
(4.125)

To justify the notion single term domination, we shall show that for a typical pair \((j,\mathcal{C})\), the contribution of the remaining points \(P_{i} \in \mathcal{P}\), with \(i\neq i_{0}\), in the j-cell \(\mathcal{C}\) is negligible, in the sense that

$$\displaystyle{ \left \vert \mathop{\sum _{P_{i}\in \mathcal{P}}}_{i\neq i_{0}}\int _{\mathcal{C}\cap (P_{i}-H_{\gamma }(N))}R_{j}(\mathbf{x})\,\mathrm{d}\mathbf{x}\right \vert \leq \frac{\eta _{1}\eta _{2}} {40}. }$$
(4.126)

To prove (4.126), let \(P_{i}\neq P_{i_{0}}\) be another point in \(\mathcal{P}\) such that \(P_{i} - H_{\gamma }(N)\) intersects \(\mathcal{C}\), i.e. the upper or lower arc of the boundary of the hyperbolic needle \(P_{i} - H_{\gamma }(N)\) intersects the j-cell \(\mathcal{C}\). We are going to distinguish four cases, depending on the type of the intersection of \(P_{i} - H_{\gamma }(N)\) with \(\mathcal{C}\), corresponding to upper or lower arc, and close to horizontal or close to vertical, relative to the diagonals of \(\mathcal{C}\).

Case 1.

The upper arc of \(P_{i} - H_{\gamma }(N)\) intersects \(\mathcal{C}\), and the slope is less than the slope of the dominant needle \(P_{i_{0}} - H_{\gamma }(N)\); see Fig. 4.7.

Fig. 4.7 Upper arc of \(P_{i} - H_{\gamma }(N)\) intersects \(\mathcal{C}\), with slope less than slope of \(P_{i_{0}} - H_{\gamma }(N)\)

Let \(P_{i_{0}} = (a_{i_{0}},b_{i_{0}})\) and \(P_{i} = (a_{i},b_{i})\) denote the coordinates of the two points in question. By the hypothesis of Case 1, we have \(a_{i} > a_{i_{0}}\). Write

$$\displaystyle{h = h_{i} = a_{i} - a_{i_{0}} > 0\quad \mbox{ and}\quad v = v_{i} = b_{i} - b_{i_{0}},}$$

where of course h denotes horizontal and v denotes vertical. The rectangle property guarantees that \(h\vert v\vert \geq c_{1} > 0\).

Let \((A_{1},A_{2})\) denote the coordinates of the lower left vertex of the j-cell \(\mathcal{C}\). The line \(x = A_{1}\) intersects the upper arcs of \(P_{i_{0}} - H_{\gamma }(N)\) and \(P_{i} - H_{\gamma }(N)\) in two points, and the hypothesis of Case 1 implies that these intersection points are close to each other. More precisely, with \(x = 1 + a_{i_{0}} - A_{1}\), where \(a_{i_{0}} - A_{1} > 0\) and the additional term 1 comes from the fact that the hyperbolic needle \(H_{\gamma }(N)\) begins at x = 1, we have the upper bound

$$\displaystyle{ \left \vert \left (b_{i_{0}} + \frac{\gamma } {x}\right ) -\left (b_{i} + \frac{\gamma } {x + h}\right )\right \vert < 2 \cdot 2^{-j}\eta _{ 2}. }$$
(4.127)

Since \(b_{i} - b_{i_{0}} = v\), we can rewrite (4.127) in the form

$$\displaystyle{ \left \vert \left ( \frac{\gamma } {x} - \frac{\gamma } {x + h}\right ) - v\right \vert = \left \vert \frac{\gamma h} {x(x + h)} - v\right \vert < 2^{-j+1}\eta _{ 2}. }$$
(4.128)

On the other hand, we know that the slope of the upper arc of \(\mathcal{C}\cap (P_{i_{0}} - H_{\gamma }(N))\) satisfies the inequality

$$\displaystyle{ \frac{5} {6}4^{-j} \leq \frac{\gamma } {x^{2}} \leq \frac{7} {6}4^{-j}. }$$
(4.129)

We claim that if \(\eta _{1}\), and so also \(\eta _{2}\), is a small constant, then the upper arc of \(P_{i_{0}} - H_{\gamma }(N)\) intersects a large number of j-cells different from \(\mathcal{C}\) such that the slope is still almost equal to \(4^{-j}\). Indeed, the horizontal size of \(\mathcal{C}\) is \(2^{j}\eta _{1}\) and, assuming that (4.129) holds, the inequality

$$\displaystyle{ \frac{5} {6}4^{-j} \leq \frac{\gamma } {(x +\ell 2^{j}\eta _{1})^{2}} \leq \frac{7} {6}4^{-j} }$$
(4.130)

has constant times \(1/\eta _{1}\) consecutive integer solutions in \(\ell\). If \(\eta _{1} > 0\) is small, then of course \(1/\eta _{1}\) is large, justifying our claim.

Returning to (4.128) and (4.129), and then substituting x by \(x +\ell 2^{j}\eta _{1}\), we have the respective inequalities

$$\displaystyle{ \left \vert \frac{\gamma h} {(x +\ell 2^{j}\eta _{1})(x +\ell 2^{j}\eta _{1} + h)} - v\right \vert < 2^{-j+1}\eta _{ 2} }$$
(4.131)

and (4.130). If (4.129) holds, then there are at least \(\sqrt{\gamma }/10\eta _{1}\) consecutive integer solutions of (4.130).
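The count of integer solutions of (4.130) can be made concrete. Solving (4.130) for \(x +\ell 2^{j}\eta _{1}\) confines it to the interval between \(\sqrt{6\gamma /7}\,2^{j}\) and \(\sqrt{6\gamma /5}\,2^{j}\), and the sketch below counts the integers ℓ that land there. The function names and the sample parameters are our own assumptions; the sketch also assumes \(x +\ell 2^{j}\eta _{1} > 0\), so that only the positive square root matters.

```python
import math

def solution_range(gamma, j, eta1, x):
    """Smallest and largest integer l satisfying (4.130)."""
    step = 2.0 ** j * eta1                      # horizontal size of a j-cell
    lo = math.sqrt(6 * gamma / 7) * 2.0 ** j    # from slope <= (7/6) 4^{-j}
    hi = math.sqrt(6 * gamma / 5) * 2.0 ** j    # from slope >= (5/6) 4^{-j}
    return math.ceil((lo - x) / step), math.floor((hi - x) / step)

def count_solutions(gamma, j, eta1, x):
    """Number of consecutive integer solutions l of (4.130)."""
    first, last = solution_range(gamma, j, eta1, x)
    return max(0, last - first + 1)

# Illustrative parameters (assumed): gamma = 1, j = 5, eta1 = 0.01,
# and x chosen so that gamma / x^2 = 4^{-j}, which satisfies (4.129).
gamma, j, eta1 = 1.0, 5, 0.01
x = math.sqrt(gamma) * 2.0 ** j
print(count_solutions(gamma, j, eta1, x), math.sqrt(gamma) / (10 * eta1))
```

The admissible interval has length \((\sqrt{6/5} -\sqrt{6/7})\sqrt{\gamma }\,2^{j}\), roughly \(0.17\sqrt{\gamma }\,2^{j}\), which is why the count comfortably exceeds \(\sqrt{\gamma }/(10\eta _{1})\) once \(\eta _{1}\) is small.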

The basic idea is the following. If \(\ell\) runs through these integer solutions of (4.130) while γ, x, h and v remain fixed, then the function

$$\displaystyle{ \frac{\gamma h} {(x +\ell 2^{j}\eta _{1})(x +\ell 2^{j}\eta _{1} + h)}, }$$
(4.132)

as a function of \(\ell\), has substantially different values, and we expect only very few of them to be very close to a fixed v in the quantitative sense of (4.131). Of course, here we assume that \(\eta _{2}\) is small.

Next we work out the details of this intuition. We begin by noting that (4.130) implies

$$\displaystyle{ \sqrt{\frac{6\gamma } {5}}2^{j} \geq x +\ell 2^{j}\eta _{ 1} \geq \sqrt{\frac{6\gamma } {7}}2^{j}. }$$
(4.133)

Using this in (4.132), we have the good approximation

$$\displaystyle{ \frac{\gamma h} {(x +\ell 2^{j}\eta _{1})(x +\ell 2^{j}\eta _{1} + h)} \approx \frac{\gamma h} {\sqrt{\gamma }2^{j}(\sqrt{\gamma }2^{j} + h)} = \frac{h} {2^{j}(2^{j} + h/\sqrt{\gamma })}. }$$
(4.134)

We now distinguish two cases. First assume that \(0 < h \leq \sqrt{c_{1}}2^{j-1}\), where \(c_{1} > 0\) is the positive constant in the rectangle property. Then the rectangle property yields

$$\displaystyle{ \vert v\vert \geq \frac{c_{1}} {h} \geq \frac{c_{1}} {\sqrt{c_{1}}2^{j-1}} = 2\sqrt{c_{1}}2^{-j} }$$
(4.135)

and

$$\displaystyle{ \frac{h} {2^{j}(2^{j} + h/\sqrt{\gamma })} < \frac{h} {2^{j}2^{j}} \leq \frac{\sqrt{c_{1}}} {2} 2^{-j}. }$$
(4.136)

The assumption

$$\displaystyle{ \eta _{2} < \frac{\sqrt{c_{1}}} {2}, }$$
(4.137)

together with (4.134)–(4.136), implies that (4.131) has no solution.
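The first case can be tested numerically without the approximation (4.134). In the sketch below, u stands for \(x +\ell 2^{j}\eta _{1}\) and ranges over the interval (4.133), while h ranges over \((0,\sqrt{c_{1}}2^{j-1}]\); for each pair we compare the smallest possible value of the left-hand side of (4.131) with its right-hand side. The function names, the grid size and the sample parameters are assumptions of ours.

```python
import math

def gap_small_h(gamma, c1, h, u):
    """Lower bound for |f - v| when h*|v| >= c1, with f = gamma*h/(u*(u+h))."""
    f = gamma * h / (u * (u + h))
    v_min = c1 / h           # rectangle property: |v| >= c1 / h
    return v_min - f         # f > 0, so |f - v| >= |v| - f >= v_min - f

def no_solution_for_small_h(gamma, c1, j, eta2, samples=50):
    """True if (4.131) fails on a grid of (h, u) with 0 < h <= sqrt(c1) 2^(j-1)."""
    u_lo = math.sqrt(6 * gamma / 7) * 2.0 ** j
    u_hi = math.sqrt(6 * gamma / 5) * 2.0 ** j
    h_max = math.sqrt(c1) * 2.0 ** (j - 1)
    for a in range(1, samples + 1):
        h = h_max * a / samples
        for b in range(samples + 1):
            u = u_lo + (u_hi - u_lo) * b / samples
            if gap_small_h(gamma, c1, h, u) <= 2.0 ** (-j + 1) * eta2:
                return False
    return True

# Illustrative parameters (assumed); eta2 = 0.49 respects (4.137) for c1 = 1.
print(no_solution_for_small_h(1.0, 1.0, 5, 0.49))
```

A grid check of this kind is of course no proof, but it shows that the hypothesis (4.137) leaves some room to spare.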

We can assume, therefore, that the lower bound

$$\displaystyle{ h > \sqrt{c_{1}}2^{j-1} }$$
(4.138)

holds. Now we go back to the basic idea. We claim that if we switch from \(\ell\) to \(\ell + 1\) in the function (4.132), then its value changes by at least as much as

$$\displaystyle{ \frac{\eta _{1}2^{-j-2}} {1 + \sqrt{\gamma /c_{1}}}. }$$
(4.139)

Indeed, by (4.133), we have

$$\displaystyle{ \frac{\gamma h} {(x +\ell 2^{j}\eta _{1})(x +\ell 2^{j}\eta _{1} + h)} \approx \frac{1} {\sqrt{\gamma }2^{j} + 2^{j}\eta _{1}} \cdot \frac{\gamma h} {\sqrt{\gamma }2^{j} + h}. }$$
(4.140)

We also have the routine estimate

$$\displaystyle\begin{array}{rcl} & & \frac{1} {\sqrt{\gamma }2^{j}} - \frac{1} {\sqrt{\gamma }2^{j} + 2^{j}\eta _{1}} = \frac{1} {\sqrt{\gamma }2^{j}}\left (1 - \frac{1} {1 +\eta _{1}/\sqrt{\gamma }}\right ) \\ & & \quad = \frac{1} {\sqrt{\gamma }2^{j}}\left ( \frac{\eta _{1}} {\sqrt{\gamma }}-\left ( \frac{\eta _{1}} {\sqrt{\gamma }}\right )^{2} + \left ( \frac{\eta _{1}} {\sqrt{\gamma }}\right )^{3}\mp \ldots \right ) \approx \frac{\eta _{1}} {\gamma 2^{j}}.{}\end{array}$$
(4.141)

Furthermore, by (4.138), we have

$$\displaystyle{ \frac{\gamma h} {\sqrt{\gamma }2^{j} + h} > \frac{\gamma } {2\sqrt{\gamma /c_{1}} + 1}. }$$
(4.142)

Then the error estimate (4.139) follows on combining (4.140)–(4.142).
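The claimed step size (4.139) can also be compared with the exact change of the function (4.132). In the following sketch, u stands for \(x +\ell 2^{j}\eta _{1}\), restricted to the range (4.133), and h is assumed to satisfy (4.138). The names `f`, `min_step` and `claimed_step`, as well as the sample parameters, are illustrative assumptions.

```python
import math

def f(gamma, h, u):
    """The function (4.132), written in terms of u = x + l * 2^j * eta1."""
    return gamma * h / (u * (u + h))

def min_step(gamma, j, eta1, h, samples=200):
    """Smallest |f(u) - f(u + 2^j eta1)| over a grid of u in the range (4.133)."""
    delta = 2.0 ** j * eta1
    u_lo = math.sqrt(6 * gamma / 7) * 2.0 ** j
    u_hi = math.sqrt(6 * gamma / 5) * 2.0 ** j
    best = float("inf")
    for b in range(samples + 1):
        u = u_lo + (u_hi - u_lo) * b / samples
        best = min(best, abs(f(gamma, h, u) - f(gamma, h, u + delta)))
    return best

def claimed_step(gamma, c1, j, eta1):
    """The lower bound (4.139)."""
    return eta1 * 2.0 ** (-j - 2) / (1 + math.sqrt(gamma / c1))

# Illustrative parameters (assumed): gamma = c1 = 1, j = 5, eta1 = 0.01,
# and a few values of h above the threshold sqrt(c1) * 2^(j-1) = 16.
for h in (16.01, 20.0, 100.0, 1e4):
    print(h, min_step(1.0, 5, 0.01, h), claimed_step(1.0, 1.0, 5, 0.01))
```

In these samples the exact change exceeds (4.139) by a factor of at least three, consistent with the slack that the approximations (4.140) and (4.141) require.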

Let us return to (4.132) and (4.139), and apply them in (4.131). We deduce that among the constant times \(1/\eta _{1}\) consecutive integer values of \(\ell\) satisfying (4.130), there are only constant times \((1 + \sqrt{\gamma /c_{1}})\) that will satisfy (4.131). More explicitly, it is safe to say that

$$\displaystyle{ \mbox{ at most }10\left (1 + \sqrt{ \frac{\gamma } {c_{1}}}\right )\mbox{ values of $\ell$ will satisfy both (4.130) and (4.131)}. }$$
(4.143)

The next step is

A Combination of the Rectangle Property and the Pigeonhole Principle. We recall (4.138), that \(h > \sqrt{c_{1}}2^{j-1}\). Consider the power-of-two type decomposition

$$\displaystyle{ 2^{r-1}\sqrt{c_{ 1}}2^{j} < h \leq 2^{r}\sqrt{c_{ 1}}2^{j},\quad r = 0,1,2,\ldots. }$$
(4.144)

We claim that for a fixed point \(P_{i_{0}} = (a_{i_{0}},b_{i_{0}}) \in \mathcal{P}\) and for a fixed integer r ≥ 0, there are at most

$$\displaystyle{ 10\sqrt{ \frac{\gamma } {c_{1}}}2^{r} }$$
(4.145)

other points \(P_{i} = (a_{i},b_{i}) \in \mathcal{P}\), with \(P_{i}\neq P_{i_{0}}\), such that \(h = h_{i} = a_{i} - a_{i_{0}} > 0\) and \(v = v_{i} = b_{i} - b_{i_{0}}\) satisfy (4.131), thus implicitly (4.130) also, and (4.144).

To establish the bound (4.145), first note that if \(h = h_{i}\) satisfies (4.144), then by (4.134) and (4.144), we have

$$\displaystyle\begin{array}{rcl} & & \frac{\gamma h} {(x +\ell 2^{j}\eta _{1})(x +\ell 2^{j}\eta _{1} + h)} \approx \frac{h} {2^{j}(2^{j} + h/\sqrt{\gamma })} {}\\ & & \quad \approx \frac{2^{r}\sqrt{c_{1}}2^{j}} {2^{j}(2^{j} + 2^{r}\sqrt{c_{1}}2^{j}/\sqrt{\gamma })} = \frac{2^{-j}} {1/\sqrt{\gamma } + 2^{-r}/\sqrt{c_{1}}}, {}\\ \end{array}$$

so that a solution of (4.131) gives the approximation

$$\displaystyle{ v = v_{i} \approx 2^{-j}\left ( \frac{1} {1/\sqrt{\gamma } + 2^{-r}/\sqrt{c_{1}}} \pm 2\eta _{2}\right ). }$$
(4.146)

Assuming

$$\displaystyle{ \eta _{2} < \frac{1} {8(1/\sqrt{\gamma } + 1/\sqrt{c_{1}})}, }$$
(4.147)

then (4.146) yields the good approximation

$$\displaystyle{ v = v_{i} \approx \frac{2^{-j}} {1/\sqrt{\gamma } + 2^{-r}/\sqrt{c_{1}}}. }$$
(4.148)

Suppose on the contrary that there are more than (4.145) other points \(P_{i} = (a_{i},b_{i}) \in \mathcal{P}\), with \(P_{i}\neq P_{i_{0}}\), such that \(h = h_{i} = a_{i} - a_{i_{0}} > 0\) and \(v = v_{i} = b_{i} - b_{i_{0}}\) satisfy (4.131), thus implicitly (4.130) also, and (4.144). Then by the Pigeonhole Principle and (4.148), there must exist two points \(P_{i_{1}},P_{i_{2}} \in \mathcal{P}\), with \(i_{1}\neq i_{2}\), such that

$$\displaystyle{v_{i_{1}} \approx \frac{2^{-j}} {1/\sqrt{\gamma } + 2^{-r}/\sqrt{c_{1}}} \approx v_{i_{2}}\quad \mbox{ and}\quad \vert h_{i_{1}} - h_{i_{2}}\vert \leq \frac{2^{r}\sqrt{c_{1}}2^{j}} {10\sqrt{\gamma /c_{1}}2^{r}} = \frac{c_{1}2^{j}} {10\sqrt{\gamma }}.}$$

Since the product

$$\displaystyle{ \frac{2^{-j}} {1/\sqrt{\gamma } + 2^{-r}/\sqrt{c_{1}}} \cdot \frac{c_{1}2^{j}} {\sqrt{\gamma }} = \frac{c_{1}} {1 + 2^{-r}\sqrt{\gamma /c_{1}}} < c_{1},}$$

we conclude that there exists an axes-parallel rectangle of area less than \(c_{1}\) and which contains at least two points of \(\mathcal{P}\), namely \(P_{i_{1}}\) and \(P_{i_{2}}\). This contradicts the rectangle property, and establishes the bound (4.145).

If \(h = h_{i}\) falls into the interval (4.144), then

$$\displaystyle{ \mathrm{slope}(\mathcal{C}\cap (P_{i} - H_{\gamma }(N))) = \frac{\gamma } {(x + h)^{2}} \leq \frac{\gamma } {h^{2}} \approx \frac{\gamma } {c_{1}4^{r}} \cdot 4^{-j}, }$$
(4.149)

where \(4^{-j}\) almost equals the slope of the diagonals of the j-cell \(\mathcal{C}\). By (4.149), we have

$$\displaystyle{ \frac{1} {\mathrm{area}(\mathcal{C})}\left \vert \int _{\mathcal{C}\cap (P_{i}-H_{\gamma }(N))}R_{j}(\mathbf{x})\,\mathrm{d}\mathbf{x}\right \vert \leq \frac{10\gamma } {c_{1}4^{r}}. }$$
(4.150)

Furthermore, (4.150) holds for all j-cells \(\mathcal{C}\) satisfying

$$\displaystyle{ \frac{5} {6}4^{-j} \leq \mathrm{ slope}(\mathcal{C}\cap (P_{ i_{0}} - H_{\gamma }(N))) \leq \frac{7} {6}4^{-j}. }$$
(4.151)

Let us return now to (4.126). Combining (4.143)–(4.145) and (4.150), we have

$$\displaystyle\begin{array}{rcl} & & \!\!\mathop{\mathop{\sum _{P_{i}\in \mathcal{P}}}_{i\neq i_{0}}} _{\mathrm{Case\ 1}}\mathop{ \sum _{\mathcal{C}}}_{\text{(4.151)}} \frac{1} {\mathrm{area}(\mathcal{C})}\left \vert \int _{\mathcal{C}\cap (P_{i}-H_{\gamma }(N))}R_{j}(\mathbf{x})\,\mathrm{d}\mathbf{x}\right \vert \leq \sum _{r\geq 0}10\left (1+\sqrt{ \frac{\gamma } {c_{1}}}\right )10\sqrt{ \frac{\gamma } {c_{1}}}2^{r} \frac{10\gamma } {c_{1}4^{r}} \\ & & \quad \! = 1000\left (\left ( \frac{\gamma } {c_{1}}\right )^{3/2}+\left ( \frac{\gamma } {c_{1}}\right )^{2}\right )\sum _{ r\geq 0}2^{-r} = 2000\left (\!\left ( \frac{\gamma } {c_{1}}\right )^{3/2}+\left (\! \frac{\gamma } {c_{1}}\right )^{2}\right ). {}\end{array}$$
(4.152)
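As a quick check of the arithmetic in (4.152), write \(s = \sqrt{\gamma /c_{1}}\); the symbol s is shorthand introduced only for this remark. Each summand then factors as

$$\displaystyle{ 10(1 + s) \cdot 10s2^{r} \cdot \frac{10s^{2}} {4^{r}} = 1000(s^{3} + s^{4})2^{-r},\quad \mbox{ and}\quad \sum _{r\geq 0}2^{-r} = 2, }$$

which accounts for the constant 2000 and for the two powers \(s^{3} = (\gamma /c_{1})^{3/2}\) and \(s^{4} = (\gamma /c_{1})^{2}\).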

Since there are at least \(\sqrt{\gamma }/10\eta _{1}\) consecutive integer solutions of (4.130), assuming that (4.129) holds, we have

$$\displaystyle{ \mathop{\sum _{\mathcal{C}}}_{(4.151)}1 \geq \frac{\sqrt{\gamma }} {10\eta _{1}}. }$$
(4.153)

Recall that in order to prove (4.126), we distinguish four cases. Inequalities (4.152) and (4.153) complete Case 1. The remaining three cases will be discussed in the next section. Note that these cases are quite similar to Case 1, but there are some annoying differences in the minor details. We shall complete the proof of Proposition 13 in Sect. 4.8.

7 Proof of Theorem 12 (III): Completing the Case Study

Let us return to (4.125) and (4.126). Again we assume that there is a dominating point \(P_{i_{0}} = P_{i_{0}}(j,\mathcal{C}) \in \mathcal{P}\) such that

  • \(\mathcal{C}\cap (P_{i_{0}} - H_{\gamma }(N))\) has slope between \(\frac{5} {6}4^{-j}\) and \(\frac{7} {6}4^{-j}\);

  • \(P_{i_{0}} - H_{\gamma }(N)\) intersects only the lower right subrectangle of \(\mathcal{C}\), and the intersection is a large triangle, meaning that the area is at least \(\frac{1} {32}\) of the area of \(\mathcal{C}\), that is, the area is at least \(\eta _{1}\eta _{2}/32\).

Again let \(P_{i}\neq P_{i_{0}}\) be another point in \(\mathcal{P}\) such that \(P_{i} - H_{\gamma }(N)\) intersects \(\mathcal{C}\), i.e. the upper or lower arc of the boundary of the hyperbolic needle \(P_{i} - H_{\gamma }(N)\) intersects the j-cell \(\mathcal{C}\). We now discuss the second case, which is quite similar to the first case. Roughly speaking, we switch the roles of the horizontal and the vertical.

Case 16.

The upper arc of \(P_{i} - H_{\gamma }(N)\) intersects \(\mathcal{C}\), and the slope is greater than the slope of the dominant needle \(P_{i_{0}} - H_{\gamma }(N)\); see Fig. 4.8.

Fig. 4.8 Upper arc of \(P_{i} - H_{\gamma }(N)\) intersects \(\mathcal{C}\), with slope greater than slope of \(P_{i_{0}} - H_{\gamma }(N)\)

Let \(P_{i_{0}} = (a_{i_{0}},b_{i_{0}})\) and \(P_{i} = (a_{i},b_{i})\) denote the coordinates of the two points in question. By the hypothesis of Case 16, we have \(a_{i_{0}} > a_{i}\). Write

$$\displaystyle{h = h_{i} = a_{i_{0}} - a_{i} > 0\quad \mbox{ and}\quad v = v_{i} = b_{i_{0}} - b_{i},}$$

where again h denotes horizontal and v denotes vertical. The rectangle property guarantees that \(h\vert v\vert \geq c_{1} > 0\).

Let \((A_{1},A_{2})\) denote the coordinates of the upper left vertex of the j-cell \(\mathcal{C}\). The intersection of the line \(y = A_{2}\) with the upper arcs of \(P_{i_{0}} - H_{\gamma }(N)\) and \(P_{i} - H_{\gamma }(N)\) gives two points, and the hypothesis of Case 16 implies that these intersection points are close to each other. More precisely, with \(y = A_{2} - b_{i_{0}}\), we have the upper bound

$$\displaystyle{ \left \vert \left (a_{i} - \frac{\gamma } {y + v}\right ) -\left (a_{i_{0}} - \frac{\gamma } {y}\right )\right \vert < 2 \cdot 2^{j}\eta _{ 1}. }$$
(4.154)

Since \(a_{i_{0}} - a_{i} = h > 0\), we can rewrite (4.154) in the form

$$\displaystyle{ \left \vert \left ( \frac{\gamma } {y} - \frac{\gamma } {y + v}\right ) - h\right \vert = \left \vert \frac{\gamma v} {y(y + v)} - h\right \vert < 2^{j+1}\eta _{ 1}. }$$
(4.155)
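For the reader's convenience, the step from (4.154) to (4.155) can be spelled out as follows. Using \(a_{i} - a_{i_{0}} = -h\), the quantity inside the absolute value in (4.154) equals

$$\displaystyle{ -h - \frac{\gamma } {y + v} + \frac{\gamma } {y} = \frac{\gamma (y + v) -\gamma y} {y(y + v)} - h = \frac{\gamma v} {y(y + v)} - h, }$$

and the right-hand side \(2 \cdot 2^{j}\eta _{1}\) is simply \(2^{j+1}\eta _{1}\).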

We emphasize that y + v > 0, otherwise

$$\displaystyle{0 \geq y + v = (A_{2} - b_{i_{0}}) + (b_{i_{0}} - b_{i}) = A_{2} - b_{i},}$$

so that \(b_{i} \geq A_{2}\), which means that the whole upper arc of \(P_{i} - H_{\gamma }(N)\) is above the j-cell \(\mathcal{C}\). But this is impossible, since in Case 16 we assume that the upper arc of \(P_{i} - H_{\gamma }(N)\) intersects \(\mathcal{C}\).

Since we switch the roles of the horizontal and the vertical, we focus on the reciprocal of the slope. We know that the reciprocal of the slope of the upper arc of \(\mathcal{C}\cap (P_{i_{0}} - H_{\gamma }(N))\) satisfies the inequality

$$\displaystyle{ \frac{6} {7}4^{j} \leq \frac{\gamma } {y^{2}} \leq \frac{6} {5}4^{j}. }$$
(4.156)
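Inequality (4.156) is just the slope condition read in reciprocal form. Assuming, as in the first bullet point at the start of this section, that the slope \(\sigma\) of \(\mathcal{C}\cap (P_{i_{0}} - H_{\gamma }(N))\) lies between \(\frac{5} {6}4^{-j}\) and \(\frac{7} {6}4^{-j}\) (the letter \(\sigma\) is introduced here only as shorthand), taking reciprocals reverses the inequalities:

$$\displaystyle{ \frac{5} {6}4^{-j} \leq \sigma \leq \frac{7} {6}4^{-j}\quad \Longleftrightarrow \quad \frac{6} {7}4^{j} \leq \frac{1} {\sigma } \leq \frac{6} {5}4^{j}. }$$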

We claim that if \(\eta _{2}\), and so also \(\eta _{1}\), is a small constant, then the upper arc of \(P_{i_{0}} - H_{\gamma }(N)\) intersects a large number of j-cells different from \(\mathcal{C}\) such that the reciprocal of the slope is still almost equal to \(4^{j}\). Indeed, the vertical size of \(\mathcal{C}\) is \(2^{-j}\eta _{2}\) and, assuming that (4.156) holds, the inequality

$$\displaystyle{ \frac{6} {7}4^{j} \leq \frac{\gamma } {(y +\ell 2^{-j}\eta _{2})^{2}} \leq \frac{6} {5}4^{j} }$$
(4.157)

has constant times \(1/\eta _{2}\) consecutive integer solutions in \(\ell\). If \(\eta _{2} > 0\) is small, then of course \(1/\eta _{2}\) is large, justifying our claim.

Returning to (4.155) and (4.156), and then substituting y by \(y +\ell 2^{-j}\eta _{2}\), we have the respective inequalities

$$\displaystyle{ \left \vert \frac{\gamma v} {(y +\ell 2^{-j}\eta _{2})(y +\ell 2^{-j}\eta _{2} + v)} - h\right \vert < 2^{j+1}\eta _{ 1} }$$
(4.158)

and (4.157). If (4.156) holds, then there are at least \(\sqrt{\gamma }/10\eta _{2}\) consecutive integer solutions of (4.157).

The basic idea is the same as in Case 15. If \(\ell\) runs through these integer solutions of (4.157) while γ, y, h and v remain fixed, then the function

$$\displaystyle{ \frac{\gamma v} {(y +\ell 2^{-j}\eta _{2})(y +\ell 2^{-j}\eta _{2} + v)}, }$$
(4.159)

as a function of \(\ell\), has substantially different values, and we expect only very few of them to be very close to a fixed h in the quantitative sense of (4.158). Of course, here we assume that \(\eta _{1}\) is small.

Next we work out the details of this intuition. We begin by noting that (4.157) implies

$$\displaystyle{ \sqrt{\frac{6\gamma } {7}}2^{-j} \leq y +\ell 2^{-j}\eta _{ 2} \leq \sqrt{\frac{6\gamma } {5}}2^{-j}. }$$
(4.160)

Using this in (4.159), we have the good approximation

$$\displaystyle\begin{array}{rcl} \frac{\gamma v} {(y +\ell 2^{-j}\eta _{2})(y +\ell 2^{-j}\eta _{2} + v)} \approx \frac{\gamma v} {\sqrt{\gamma }2^{-j}(\sqrt{\gamma }2^{-j} + v)} = \frac{v} {2^{-j}(2^{-j} + v/\sqrt{\gamma })}.& &{}\end{array}$$
(4.161)

We now distinguish three cases. First assume that v < 0. Since y + v > 0, we have \(y^{-1} < (y + v)^{-1}\), and so by (4.158), we have

$$\displaystyle{ 2^{j+1}\eta _{ 1} >\vert h\vert = h. }$$
(4.162)

Combining (4.162) with the rectangle property, we deduce that

$$\displaystyle{ \vert v\vert \geq \frac{c_{1}} {h} > \frac{c_{1}} {2\eta _{1}}2^{-j}. }$$
(4.163)

Substituting (4.163) into (4.161), and assuming that

$$\displaystyle{ \eta _{1} < \frac{c_{1}} {2\sqrt{\gamma }}, }$$
(4.164)

we have

$$\displaystyle{ \frac{v} {2^{-j}(2^{-j} + v/\sqrt{\gamma })} = \frac{\vert v\vert } {2^{-j}(\vert v\vert /\sqrt{\gamma }- 2^{-j})} = \frac{2^{j}} {1/\sqrt{\gamma }- 2^{-j}/\vert v\vert } > \sqrt{\gamma }2^{j}. }$$
(4.165)

Combining (4.158), (4.161)–(4.163) and (4.165), we conclude that

$$\displaystyle{2^{j+1}\eta _{ 1} > h > \frac{1} {2}\sqrt{\gamma }2^{j} - 2^{j+1}\eta _{ 1},}$$

which is an obvious contradiction if

$$\displaystyle{ \eta _{1} < \frac{\sqrt{\gamma }} {8}. }$$
(4.166)

This proves that v > 0.
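The contradiction can be made explicit. Dropping the middle term h from the chain of inequalities displayed just before (4.166) leaves

$$\displaystyle{ 2^{j+1}\eta _{1} > \frac{1} {2}\sqrt{\gamma }2^{j} - 2^{j+1}\eta _{1}\quad \Longleftrightarrow \quad 4\eta _{1}2^{j} > \frac{1} {2}\sqrt{\gamma }2^{j}\quad \Longleftrightarrow \quad \eta _{1} > \frac{\sqrt{\gamma }} {8}, }$$

which is incompatible with (4.166).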

Next assume that \(0 < v \leq \sqrt{c_{1}}2^{-j-1}\), where \(c_{1} > 0\) is the positive constant in the rectangle property. Then the rectangle property yields

$$\displaystyle{ h \geq \frac{c_{1}} {v} \geq \frac{c_{1}} {\sqrt{c_{1}}2^{-j-1}} = 2\sqrt{c_{1}}2^{j} }$$
(4.167)

and

$$\displaystyle{ \frac{v} {2^{-j}(2^{-j} + v/\sqrt{\gamma })} < \frac{v} {2^{-j}2^{-j}} \leq \frac{\sqrt{c_{1}}} {2} 2^{j}. }$$
(4.168)

The assumption

$$\displaystyle{ \eta _{1} < \frac{\sqrt{c_{1}}} {2}, }$$
(4.169)

together with (4.161), (4.167) and (4.168), implies that (4.158) has no solution.
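To see why, write F for the left-hand side of (4.168), which by (4.161) approximates the function (4.159); the letter F is shorthand used only here, and we treat the approximation as if it were exact. Then (4.167) and (4.168) give

$$\displaystyle{ h - F > 2\sqrt{c_{1}}2^{j} -\frac{\sqrt{c_{1}}} {2} 2^{j} = \frac{3} {2}\sqrt{c_{1}}2^{j}, }$$

whereas (4.169) gives \(2^{j+1}\eta _{1} < \sqrt{c_{1}}2^{j}\). So the distance between F and h exceeds the tolerance allowed in (4.158).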

We can assume, therefore, that the lower bound

$$\displaystyle{ v > \sqrt{c_{1}}2^{-j-1} }$$
(4.170)

holds. Now we go back to the basic idea. We claim that if we switch \(\ell\) to \(\ell + 1\) in the function (4.159), then its value changes by at least as much as

$$\displaystyle{ \frac{\eta _{2}2^{j-2}} {1 + \sqrt{\gamma /c_{1}}}. }$$
(4.171)

Indeed, by (4.160), we have

$$\displaystyle{ \frac{\gamma v} {(y +\ell 2^{-j}\eta _{2})(y +\ell 2^{-j}\eta _{2} + v)} \approx \frac{1} {\sqrt{\gamma }2^{-j}} \cdot \frac{\gamma v} {\sqrt{\gamma }2^{-j} + v}. }$$
(4.172)

We also have the routine estimate

$$\displaystyle\begin{array}{rcl} & & \frac{1} {\sqrt{\gamma }2^{-j}} - \frac{1} {\sqrt{\gamma }2^{-j} + 2^{-j}\eta _{2}} = \frac{1} {\sqrt{\gamma }2^{-j}}\left (1 - \frac{1} {1 +\eta _{2}/\sqrt{\gamma }}\right ) \\ & & \quad = \frac{1} {\sqrt{\gamma }2^{-j}}\left ( \frac{\eta _{2}} {\sqrt{\gamma }}-\left ( \frac{\eta _{2}} {\sqrt{\gamma }}\right )^{2} + \left ( \frac{\eta _{2}} {\sqrt{\gamma }}\right )^{3}\mp \ldots \right ) \approx \frac{\eta _{2}2^{j}} {\gamma }. {}\end{array}$$
(4.173)

Furthermore, by (4.170), we have

$$\displaystyle{ \frac{\gamma v} {\sqrt{\gamma }2^{-j} + v} > \frac{\gamma } {2\sqrt{\gamma /c_{1}} + 1}. }$$
(4.174)

The error estimate (4.171) follows on combining (4.172)–(4.174).

Let us return to (4.159) and (4.171), and apply them in (4.158). We deduce that among the constant times \(1/\eta _{2}\) consecutive integer values of \(\ell\) satisfying (4.157), there are only constant times \((1 + \sqrt{\gamma /c_{1}})\) that will satisfy (4.158). More explicitly, it is safe to say that

$$\displaystyle{ \mbox{ at most }10\left (1 + \sqrt{ \frac{\gamma } {c_{1}}}\right )\mbox{ values of $\ell$ will satisfy both (4.157) and (4.158)}. }$$
(4.175)

As in Case 15, the next step is

A Combination of the Rectangle Property and the Pigeonhole Principle. We recall (4.170), that \(v > \sqrt{c_{1}}2^{-j-1}\). Consider the power-of-two type decomposition

$$\displaystyle{ 2^{r-1}\sqrt{c_{ 1}}2^{-j} < v \leq 2^{r}\sqrt{c_{ 1}}2^{-j},\quad r = 0,1,2,\ldots. }$$
(4.176)

We claim that for a fixed point \(P_{i_{0}} = (a_{i_{0}},b_{i_{0}}) \in \mathcal{P}\) and for a fixed integer r ≥ 0, there are at most

$$\displaystyle{ 10\sqrt{ \frac{\gamma } {c_{1}}}2^{r} }$$
(4.177)

other points \(P_{i} = (a_{i},b_{i}) \in \mathcal{P}\), with \(P_{i}\neq P_{i_{0}}\), such that \(h = h_{i} = a_{i_{0}} - a_{i} > 0\) and \(v = v_{i} = b_{i_{0}} - b_{i} > 0\) satisfy (4.158), thus implicitly (4.157) also, and (4.176).

To establish the bound (4.177), first note that if \(v = v_{i}\) satisfies (4.176), then by (4.161) and (4.176), we have

$$\displaystyle\begin{array}{rcl} & & \frac{\gamma v} {(y +\ell 2^{-j}\eta _{2})(y +\ell 2^{-j}\eta _{2} + v)} \approx \frac{v} {2^{-j}(2^{-j} + v/\sqrt{\gamma })} {}\\ & & \quad \approx \frac{2^{r}\sqrt{c_{1}}2^{-j}} {2^{-j}(2^{-j} + 2^{r}\sqrt{c_{1}}2^{-j}/\sqrt{\gamma })} = \frac{2^{j}} {1/\sqrt{\gamma } + 2^{-r}/\sqrt{c_{1}}}, {}\\ \end{array}$$

so that a solution of (4.158) gives the approximation

$$\displaystyle{ h = h_{i} \approx 2^{j}\left ( \frac{1} {1/\sqrt{\gamma } + 2^{-r}/\sqrt{c_{1}}} \pm 2\eta _{1}\right ). }$$
(4.178)

Assuming

$$\displaystyle{ \eta _{1} < \frac{1} {8(1/\sqrt{\gamma } + 1/\sqrt{c_{1}})}, }$$
(4.179)

then (4.178) yields the good approximation

$$\displaystyle{ h = h_{i} \approx \frac{2^{j}} {1/\sqrt{\gamma } + 2^{-r}/\sqrt{c_{1}}}. }$$
(4.180)

Suppose on the contrary that there are more than (4.177) other points \(P_{i} = (a_{i},b_{i}) \in \mathcal{P}\), with \(P_{i}\neq P_{i_{0}}\), such that \(h = h_{i} = a_{i_{0}} - a_{i} > 0\) and \(v = v_{i} = b_{i_{0}} - b_{i} > 0\) satisfy (4.158), thus implicitly (4.157) also, and (4.176). Then by the Pigeonhole Principle and (4.180), there must exist two points \(P_{i_{1}},P_{i_{2}} \in \mathcal{P}\), with \(i_{1}\neq i_{2}\), such that

$$\displaystyle{h_{i_{1}} \approx \frac{2^{j}} {1/\sqrt{\gamma } + 2^{-r}/\sqrt{c_{1}}} \approx h_{i_{2}}\quad \mbox{ and}\quad \left \vert v_{i_{1}} - v_{i_{2}}\right \vert \leq \frac{2^{r}\sqrt{c_{1}}2^{-j}} {10\sqrt{\gamma /c_{1}}2^{r}} = \frac{c_{1}2^{-j}} {10\sqrt{\gamma }}.}$$

Since the product

$$\displaystyle{ \frac{2^{j}} {1/\sqrt{\gamma } + 2^{-r}/\sqrt{c_{1}}} \cdot \frac{c_{1}2^{-j}} {\sqrt{\gamma }} = \frac{c_{1}} {1 + 2^{-r}\sqrt{\gamma /c_{1}}} < c_{1},}$$

we conclude that there exists an axes-parallel rectangle of area less than \(c_{1}\) and which contains at least two points of \(\mathcal{P}\), namely \(P_{i_{1}}\) and \(P_{i_{2}}\). This contradicts the rectangle property, and establishes the bound (4.177).

If \(v = v_{i}\) falls into the interval (4.176), then

$$\displaystyle{ \frac{1} {\mathrm{slope}(\mathcal{C}\cap (P_{i} - H_{\gamma }(N)))} = \frac{\gamma } {(y + v)^{2}} \leq \frac{\gamma } {v^{2}} \approx \frac{\gamma } {c_{1}4^{r}} \cdot 4^{j}, }$$
(4.181)

where \(4^{j}\) almost equals the reciprocal of the slope of the diagonals of the j-cell \(\mathcal{C}\). By (4.181), we have

$$\displaystyle{ \frac{1} {\mathrm{area}(\mathcal{C})}\left \vert \int _{\mathcal{C}\cap (P_{i}-H_{\gamma }(N))}R_{j}(\mathbf{x})\,\mathrm{d}\mathbf{x}\right \vert \leq \frac{10\gamma } {c_{1}4^{r}}. }$$
(4.182)

Furthermore, (4.182) holds for all j-cells \(\mathcal{C}\) satisfying (4.151). Let us return now to (4.126). Combining (4.175)–(4.177) and (4.182), we have

$$\displaystyle\begin{array}{rcl} & & \!\!\mathop{\mathop{\sum _{P_{i}\in \mathcal{P}}}_{i\neq i_{0}}} _{\mathrm{Case\ 2}}\mathop{ \sum _{\mathcal{C}}}_{(4.151)} \frac{1} {\mathrm{area}(\mathcal{C})}\left \vert \int _{\mathcal{C}\cap (P_{i}-H_{\gamma }(N))}R_{j}(\mathbf{x})\,\mathrm{d}\mathbf{x}\right \vert \leq \sum _{r\geq 0}10\left (1+\sqrt{ \frac{\gamma } {c_{1}}}\right )10\sqrt{ \frac{\gamma } {c_{1}}}2^{r} \frac{10\gamma } {c_{1}4^{r}} \\ & & \quad = 1000\left (\left ( \frac{\gamma } {c_{1}}\right )^{3/2} + \left ( \frac{\gamma } {c_{1}}\right )^{2}\right )\sum _{ r\geq 0}2^{-r} = 2000\left (\left ( \frac{\gamma } {c_{1}}\right )^{3/2} + \left ( \frac{\gamma } {c_{1}}\right )^{2}\right ), {}\end{array}$$
(4.183)

a perfect analog of (4.152). This completes Case 16.

Case 17.

The lower arc of \(P_{i} - H_{\gamma }(N)\) intersects \(\mathcal{C}\), and the slope is less than the slope of the dominant needle \(P_{i_{0}} - H_{\gamma }(N)\); see Fig. 4.9.

Fig. 4.9 Lower arc of \(P_{i} - H_{\gamma }(N)\) intersects \(\mathcal{C}\), with slope less than slope of \(P_{i_{0}} - H_{\gamma }(N)\)

Let \(P_{i_{0}} = (a_{i_{0}},b_{i_{0}})\) and \(P_{i} = (a_{i},b_{i})\) denote the coordinates of the two points in question. By the hypothesis of Case 17, we have \(a_{i} > a_{i_{0}}\). Write

$$\displaystyle{h = h_{i} = a_{i} - a_{i_{0}} > 0\quad \mbox{ and}\quad v = v_{i} = b_{i} - b_{i_{0}},}$$

where again h denotes horizontal and v denotes vertical. It is obvious from the geometry of Case 17 that v > 0. The rectangle property guarantees that \(hv \geq c_{1} > 0\).

Let \((A_{1},A_{2})\) denote the coordinates of the lower left vertex of the j-cell \(\mathcal{C}\). The intersection of the line \(x = A_{1}\) with the upper arc of \(P_{i_{0}} - H_{\gamma }(N)\) and the lower arc of \(P_{i} - H_{\gamma }(N)\) gives two points, and the hypothesis of Case 17 implies that these intersection points are close to each other. More precisely, similar to Case 15, with \(x = 1 + a_{i_{0}} - A_{1}\), we have the upper bound

$$\displaystyle{ \left \vert \left (b_{i_{0}} + \frac{\gamma } {x}\right ) -\left (b_{i} - \frac{\gamma } {x + h}\right )\right \vert < 2 \cdot 2^{-j}\eta _{ 2}. }$$
(4.184)

Since \(b_{i} - b_{i_{0}} = v\), we can rewrite (4.184) in the form

$$\displaystyle{ \left \vert \left ( \frac{\gamma } {x} + \frac{\gamma } {x + h}\right ) - v\right \vert < 2^{-j+1}\eta _{ 2}. }$$
(4.185)

Note that (4.185) is an analog of (4.128) in Case 15, the only difference being that a minus sign is replaced by a plus sign. This means that we can basically repeat the argument in Case 15. In fact, the plus sign helps and makes Case 17 simpler than Case 15. On the other hand, we know that the slope of the upper arc of \(\mathcal{C}\cap (P_{i_{0}} - H_{\gamma }(N))\) satisfies the inequality (4.129).

Again, if \(\eta _{1}\), and so also \(\eta _{2}\), is a small constant, then the upper arc of \(P_{i_{0}} - H_{\gamma }(N)\) intersects a large number of j-cells different from \(\mathcal{C}\) such that the slope is still almost equal to \(4^{-j}\). Indeed, the horizontal size of \(\mathcal{C}\) is \(2^{j}\eta _{1}\) and, assuming that (4.129) holds, the inequality (4.130) has constant times \(1/\eta _{1}\) consecutive integer solutions in \(\ell\).

Returning to (4.129) and (4.185), and then substituting x by \(x +\ell 2^{j}\eta _{1}\), we have the respective inequalities (4.130) and

$$\displaystyle{ \left \vert \frac{\gamma } {x +\ell 2^{j}\eta _{1}} + \frac{\gamma } {x +\ell 2^{j}\eta _{1} + h} - v\right \vert < 2^{-j+1}\eta _{ 2}. }$$
(4.186)

If (4.129) holds, then there are at least \(\sqrt{\gamma }/10\eta _{1}\) consecutive integer solutions of (4.130).

The basic idea is the same as in Case 15. If \(\ell\) runs through these integer solutions of (4.130) while γ, x, h and v remain fixed, then the function

$$\displaystyle{ \frac{\gamma } {x +\ell 2^{j}\eta _{1}} + \frac{\gamma } {x +\ell 2^{j}\eta _{1} + h}, }$$
(4.187)

as a function of \(\ell\), has substantially different values, and we expect only very few of them to be very close to a fixed v in the quantitative sense of (4.186). Of course, here we assume that \(\eta _{2}\) is small.

Next we work out the details of this intuition. We begin by noting that (4.130) implies (4.133). Using this in (4.187), we have the good approximation

$$\displaystyle{ \frac{\gamma } {x +\ell 2^{j}\eta _{1}} + \frac{\gamma } {x +\ell 2^{j}\eta _{1} + h} \approx \frac{\gamma } {\sqrt{\gamma }2^{j}} + \frac{\gamma } {\sqrt{\gamma }2^{j} + h}. }$$
(4.188)

We now distinguish two cases. First assume that \(0 < h \leq c_{1}2^{j-2}/\sqrt{\gamma }\), where \(c_{1} > 0\) is the positive constant in the rectangle property. Then the rectangle property yields

$$\displaystyle{ \vert v\vert \geq \frac{c_{1}} {h} \geq \frac{c_{1}} {c_{1}2^{j-2}/\sqrt{\gamma }} = 4\sqrt{\gamma }2^{-j}. }$$
(4.189)

On the other hand, assuming that

$$\displaystyle{ \eta _{2} < \frac{\sqrt{\gamma }} {2}, }$$
(4.190)

it then follows from (4.186) and (4.188) that

$$\displaystyle{ v \leq \frac{2\gamma } {\sqrt{\gamma }2^{j}} + 2^{-j+1}\eta _{ 2} < 4\sqrt{\gamma }2^{-j}. }$$
(4.191)
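The second inequality in (4.191) is elementary. Since \(2\gamma /(\sqrt{\gamma }2^{j}) = 2\sqrt{\gamma }2^{-j}\) and, by (4.190), \(2^{-j+1}\eta _{2} < \sqrt{\gamma }2^{-j}\), we obtain

$$\displaystyle{ \frac{2\gamma } {\sqrt{\gamma }2^{j}} + 2^{-j+1}\eta _{2} < 2\sqrt{\gamma }2^{-j} + \sqrt{\gamma }2^{-j} = 3\sqrt{\gamma }2^{-j} < 4\sqrt{\gamma }2^{-j}. }$$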

Since (4.189) and (4.191) contradict each other, we can therefore assume that

$$\displaystyle{ h > \frac{c_{1}2^{j-2}} {\sqrt{\gamma }}, }$$
(4.192)

which is an analog of (4.138) in Case 15. Now we go back to the basic idea. We claim that if we switch \(\ell\) to \(\ell + 1\) in the function (4.187), then its value changes by at least as much as

$$\displaystyle{ \eta _{1}2^{-j-2}, }$$
(4.193)

an analog of (4.139). Indeed, (4.193) follows immediately from the routine estimate

$$\displaystyle{ \frac{1} {\sqrt{\gamma }2^{j}} - \frac{1} {\sqrt{\gamma }2^{j} + 2^{j}\eta _{1}} = \frac{1} {\sqrt{\gamma }2^{j}}\left (1 - \frac{1} {1 +\eta _{1}/\sqrt{\gamma }}\right ) \approx \frac{\eta _{1}} {\gamma 2^{j}}.}$$

Let us return to (4.187) and (4.193), and apply them in (4.186). We deduce that

$$\displaystyle{ \mbox{ at most $10$ values of $\ell$ will satisfy both (4.130) and (4.186)}. }$$
(4.194)

As in Cases 15 and 16, the next step is

A Combination of the Rectangle Property and the Pigeonhole Principle. We recall (4.192), that \(h > c_{1}2^{j-2}/\sqrt{\gamma }\). Consider the power-of-two type decomposition

$$\displaystyle{ 2^{r-1}\frac{c_{1}2^{j-1}} {\sqrt{\gamma }} < h \leq 2^{r}\frac{c_{1}2^{j-1}} {\sqrt{\gamma }},\quad r = 0,1,2,\ldots. }$$
(4.195)

We claim that for a fixed point \(P_{i_{0}} = (a_{i_{0}},b_{i_{0}}) \in \mathcal{P}\) and for a fixed integer r ≥ 0, there are at most

$$\displaystyle{ 10 \cdot 2^{r} }$$
(4.196)

other points \(P_{i} = (a_{i},b_{i}) \in \mathcal{P}\), with \(P_{i}\neq P_{i_{0}}\), such that \(h = h_{i} = a_{i} - a_{i_{0}} > 0\) and \(v = v_{i} = b_{i} - b_{i_{0}}\) satisfy (4.186), thus implicitly (4.130) also, and (4.195).

To establish the bound (4.196), first note that if \(h = h_{i}\) satisfies (4.195), then by (4.188) and (4.195), we have

$$\displaystyle{ \frac{\gamma } {x +\ell 2^{j}\eta _{1}} + \frac{\gamma } {x +\ell 2^{j}\eta _{1} + h} \approx \frac{\gamma } {\sqrt{\gamma }2^{j}} + \frac{\gamma } {\sqrt{\gamma }2^{j} + h} \approx \sqrt{\gamma }2^{-j}\left (1 + \frac{1} {1 + c_{1}2^{r-1}/\gamma }\right ),}$$

so that a solution of (4.186) gives the approximation

$$\displaystyle{ v = v_{i} \approx \sqrt{\gamma }2^{-j}\left (1 + \frac{1} {1 + c_{1}2^{r-1}/\gamma }\right ) \pm 2^{-j+1}\eta _{ 2}. }$$
(4.197)
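The bracketed expression in (4.197) comes from evaluating the two terms of (4.188) at the upper endpoint of (4.195). As a sketch, with \(h = 2^{r}c_{1}2^{j-1}/\sqrt{\gamma }\), the second term becomes

$$\displaystyle{ \frac{\gamma } {\sqrt{\gamma }2^{j} + h} = \frac{\gamma } {\sqrt{\gamma }2^{j}\left (1 + c_{1}2^{r-1}/\gamma \right )} = \frac{\sqrt{\gamma }2^{-j}} {1 + c_{1}2^{r-1}/\gamma }, }$$

while the first term is \(\gamma /(\sqrt{\gamma }2^{j}) = \sqrt{\gamma }2^{-j}\).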

Assuming

$$\displaystyle{ \eta _{2} < \frac{\sqrt{\gamma }} {100}, }$$
(4.198)

then (4.197) yields the good approximation

$$\displaystyle{ v = v_{i} \approx \sqrt{\gamma }2^{-j}\left (1 + \frac{1} {1 + c_{1}2^{r-1}/\gamma }\right ). }$$
(4.199)

Suppose, contrary to the bound (4.196), that there are more than \(10 \cdot 2^{r}\) other points \(P_{i} = (a_{i},b_{i}) \in \mathcal{P}\), with \(P_{i}\neq P_{i_{0}}\), such that \(h = h_{i} = a_{i} - a_{i_{0}} > 0\) and \(v = v_{i} = b_{i} - b_{i_{0}}\) satisfy (4.186), thus implicitly (4.130) also, and (4.195). Then by the Pigeonhole Principle and (4.199), there must exist two points \(P_{i_{1}},P_{i_{2}} \in \mathcal{P}\), with \(i_{1}\neq i_{2}\), such that

$$\displaystyle{v_{i_{1}} \approx \sqrt{\gamma }2^{-j}\left (1 + \frac{1} {1 + c_{1}2^{r-1}/\gamma }\right ) \approx v_{i_{2}}\!\!\quad \mbox{ and}\!\!\quad \vert h_{i_{1}} - h_{i_{2}}\vert \leq \frac{2^{r}c_{1}2^{j-1}/\sqrt{\gamma }} {10 \cdot 2^{r}} = \frac{c_{1}2^{j}} {20\sqrt{\gamma }}.}$$

Since the product

$$\displaystyle{\sqrt{\gamma }2^{-j}\left (1 + \frac{1} {1 + c_{1}2^{r-1}/\gamma }\right ) \cdot \frac{c_{1}2^{j}} {2\sqrt{\gamma }} < c_{1},}$$

we conclude that there exists an axes-parallel rectangle of area less than \(c_{1}\) and which contains at least two points of \(\mathcal{P}\), namely \(P_{i_{1}}\) and \(P_{i_{2}}\). This contradicts the rectangle property, and establishes the bound (4.196).
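The product bound used here rests on the observation that the bracketed factor is less than 2, since \(c_{1}2^{r-1}/\gamma > 0\). Explicitly,

$$\displaystyle{ \sqrt{\gamma }2^{-j}\left (1 + \frac{1} {1 + c_{1}2^{r-1}/\gamma }\right ) \cdot \frac{c_{1}2^{j}} {2\sqrt{\gamma }} = \frac{c_{1}} {2} \left (1 + \frac{1} {1 + c_{1}2^{r-1}/\gamma }\right ) < \frac{c_{1}} {2} \cdot 2 = c_{1}. }$$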

If \(h = h_{i}\) falls into the interval (4.195), then

$$\displaystyle{ \mathrm{slope}(\mathcal{C}\cap (P_{i} - H_{\gamma }(N))) = \frac{\gamma } {(x + h)^{2}} \leq \frac{\gamma } {h^{2}} \leq \frac{(\gamma /c_{1})^{2}} {4^{r-2}} \cdot 4^{-j}, }$$
(4.200)

where \(4^{-j}\) almost equals the slope of the diagonals of the j-cell \(\mathcal{C}\). By (4.200), we have

$$\displaystyle{ \frac{1} {\mathrm{area}(\mathcal{C})}\left \vert \int _{\mathcal{C}\cap (P_{i}-H_{\gamma }(N))}R_{j}(\mathbf{x})\,\mathrm{d}\mathbf{x}\right \vert \leq \frac{10(\gamma /c_{1})^{2}} {4^{r-2}}. }$$
(4.201)
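The last inequality in (4.200) uses only the lower endpoint of (4.195). Indeed, \(h > 2^{r-1}c_{1}2^{j-1}/\sqrt{\gamma } = 2^{r-2}c_{1}2^{j}/\sqrt{\gamma }\), so that

$$\displaystyle{ \frac{\gamma } {h^{2}} < \frac{\gamma \cdot \gamma } {4^{r-2}c_{1}^{2}4^{j}} = \frac{(\gamma /c_{1})^{2}} {4^{r-2}} \cdot 4^{-j}. }$$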

Furthermore, (4.201) holds for all j-cells \(\mathcal{C}\) satisfying (4.151). Let us return now to (4.126). Combining (4.194)–(4.196) and (4.201), we have

$$\displaystyle\begin{array}{rcl} & & \mathop{\mathop{\sum _{P_{i}\in \mathcal{P}}}_{i\neq i_{0}}} _{\mathrm{Case\ 3}}\mathop{ \sum _{\mathcal{C}}}_{(4.151)} \frac{1} {\mathrm{area}(\mathcal{C})}\left \vert \int _{\mathcal{C}\cap (P_{i}-H_{\gamma }(N))}R_{j}(\mathbf{x})\,\mathrm{d}\mathbf{x}\right \vert \leq \sum _{r\geq 0}10 \cdot 10 \cdot 2^{r} \cdot \frac{10(\gamma /c_{1})^{2}} {4^{r-2}} \\ & & \quad = 16000\left ( \frac{\gamma } {c_{1}}\right )^{2}\sum _{ r\geq 0}2^{-r} = 32000\left ( \frac{\gamma } {c_{1}}\right )^{2}. {}\end{array}$$
(4.202)
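The constant in (4.202) can be checked directly. Since \(1/4^{r-2} = 16/4^{r}\), each summand equals

$$\displaystyle{ 10 \cdot 10 \cdot 2^{r} \cdot \frac{160(\gamma /c_{1})^{2}} {4^{r}} = 16000\left ( \frac{\gamma } {c_{1}}\right )^{2}2^{-r}, }$$

and \(\sum _{r\geq 0}2^{-r} = 2\) then gives the stated value \(32000(\gamma /c_{1})^{2}\).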

This completes Case 17.

Case 18.

The lower arc of \(P_{i} - H_{\gamma }(N)\) intersects \(\mathcal{C}\), and the slope is greater than the slope of the dominant needle \(P_{i_{0}} - H_{\gamma }(N)\); see Fig. 4.10.

Fig. 4.10 Lower arc of \(P_{i} - H_{\gamma }(N)\) intersects \(\mathcal{C}\), with slope greater than slope of \(P_{i_{0}} - H_{\gamma }(N)\)

Let \(P_{i_{0}} = (a_{i_{0}},b_{i_{0}})\) and \(P_{i} = (a_{i},b_{i})\) denote the coordinates of the two points in question. By the hypothesis of Case 18, we have \(a_{i_{0}} > a_{i}\). We want positive real numbers, and write

$$\displaystyle{h = h_{i} = a_{i_{0}} - a_{i} > 0\quad \mbox{ and}\quad v = v_{i} = b_{i} - b_{i_{0}} > 0,}$$

where again h denotes horizontal and v denotes vertical. The rectangle property guarantees that \(hv \geq c_{1} > 0\).

Let \((A_{1},A_{2})\) denote the coordinates of the lower left vertex of the j-cell \(\mathcal{C}\). We have \(b_{i} > A_{2} > b_{i_{0}}\) and \(b_{i} - A_{2} > A_{2} - b_{i_{0}}\). The intersection of the line \(y = A_{2}\) with the upper arc of \(P_{i_{0}} - H_{\gamma }(N)\) and the lower arc of \(P_{i} - H_{\gamma }(N)\) gives two points, and the hypothesis of Case 18 implies that these intersection points are relatively close to each other in the following quantitative sense. Write \(y = A_{2} - b_{i_{0}} > 0\). Then \(b_{i} - A_{2} = (b_{i} - b_{i_{0}}) - y = v - y > y\), and we have the upper bound

$$\displaystyle{ \left \vert \left (a_{i} - \frac{\gamma } {v - y}\right ) -\left (a_{i_{0}} - \frac{\gamma } {y}\right )\right \vert < 2 \cdot 2^{j}\eta _{ 1}. }$$
(4.203)

Since \(a_{i_{0}} - a_{i} = h > 0\), we can rewrite (4.203) in the form

$$\displaystyle{ \left \vert \left ( \frac{\gamma } {y} - \frac{\gamma } {v - y}\right ) - h\right \vert < 2^{j+1}\eta _{ 1}. }$$
(4.204)

Now we basically repeat the argument of Case 16. But, just like Case 17 is a simpler version of Case 15, Case 18 is a simpler version of Case 16. Case 18 is similar to Case 17 in the technical sense that the two critical functions

$$\displaystyle{ f_{3}(y) = \frac{\gamma } {y} + \frac{\gamma } {y + h}\quad \mbox{ and}\quad f_{4}(y) = \frac{\gamma } {y} - \frac{\gamma } {v - y} }$$
(4.205)

are in synchrony, in the sense that each is a sum of two parts that increase or decrease together as y varies.

As in Case 16, we switch the roles of the horizontal and the vertical, and focus on the reciprocal of the slope. We know that the reciprocal of the slope of the upper arc of \(\mathcal{C}\cap (P_{i_{0}} - H_{\gamma }(N))\) satisfies the inequality (4.156). We know also that if \(\eta _{2}\), and so also \(\eta _{1}\), is a small constant, then the upper arc of \(P_{i_{0}} - H_{\gamma }(N)\) intersects a large number of j-cells different from \(\mathcal{C}\) such that the reciprocal of the slope is still almost equal to \(4^{j}\).

Returning to (4.156) and (4.204), and then substituting y by \(y +\ell 2^{-j}\eta _{2}\), we have the respective inequalities (4.157) and

$$\displaystyle{ \left \vert \frac{\gamma } {y +\ell 2^{-j}\eta _{2}} - \frac{\gamma } {v - (y +\ell 2^{-j}\eta _{2})} - h\right \vert < 2^{j+1}\eta _{ 1}. }$$
(4.206)

If (4.156) holds, then there are at least \(\sqrt{\gamma }/10\eta _{2}\) consecutive integer solutions of (4.157).

The basic idea is the same as in Case 16. If \(\ell\) runs through these integer solutions of (4.157) while γ, y, h and v remain fixed, then the function

$$\displaystyle{ \frac{\gamma } {y +\ell 2^{-j}\eta _{2}} - \frac{\gamma } {v - (y +\ell 2^{-j}\eta _{2})}, }$$
(4.207)

as a function of \(\ell\), has substantially different values, and we expect only very few of them to be very close to a fixed h in the quantitative sense of (4.206). Of course, here we assume that \(\eta _{1}\) is small.

Next we work out the details of this intuition. We begin by noting that (4.157) implies (4.160). Since the functions \(f_{3}(y)\) and \(f_{4}(y)\) given by (4.205) are in synchrony, we can basically repeat the argument of (4.187), (4.193) and (4.194) in Case 17, and conclude that if we switch \(\ell\) to \(\ell + 1\) in the function (4.207), then its value changes by at least as much as

$$\displaystyle{\eta _{2}2^{j-2},}$$

an analog of (4.171) and (4.193). Thus we deduce that

$$\displaystyle{ \mbox{ at most $10$ values of $\ell$ will satisfy both (4.157) and (4.206)}. }$$
(4.208)

As in Cases 15–17, the next step is

A Combination of the Rectangle Property and the Pigeonhole Principle. In this case, since \(v - (y +\ell 2^{-j}\eta _{2}) > y +\ell 2^{-j}\eta _{2}\), we have

$$\displaystyle{ v > 2(y +\ell 2^{-j}\eta _{ 2}). }$$
(4.209)

In view of (4.160), we can assume that

$$\displaystyle{v > \sqrt{\frac{6\gamma } {7}}2^{-j+1}.}$$

Consider the power-of-two type decomposition

$$\displaystyle{ 2^{r-1}\sqrt{\frac{6\gamma } {7}}2^{-j+2} < v \leq 2^{r}\sqrt{\frac{6\gamma } {7}}2^{-j+2},\quad r = 0,1,2,\ldots. }$$
(4.210)

We claim that for a fixed point \(P_{i_{0}} = (a_{i_{0}},b_{i_{0}}) \in \mathcal{P}\) and for a fixed integer r ≥ 0, there are at most

$$\displaystyle{ \frac{100\gamma 2^{r}} {c_{1}} }$$
(4.211)

other points \(P_{i} = (a_{i},b_{i}) \in \mathcal{P}\), with \(P_{i}\neq P_{i_{0}}\), such that \(h = h_{i} = a_{i_{0}} - a_{i} > 0\) and \(v = v_{i} = b_{i} - b_{i_{0}} > 0\) satisfy (4.206), thus implicitly (4.157) also, and (4.210).

To establish the bound (4.211), first note that if \(v = v_{i}\) satisfies (4.210), then by (4.160), (4.206) and (4.209), and assuming that

$$\displaystyle{ \eta _{1} < \frac{\sqrt{\gamma }} {4}, }$$
(4.212)

we have

$$\displaystyle{ h = h_{i} < \frac{\gamma } {y +\ell 2^{-j}\eta _{2}} + 2^{j+1}\eta _{ 1} \leq \frac{\gamma } {2^{-j}\sqrt{6\gamma /7}} + 2^{j+1}\eta _{ 1} \leq 2\sqrt{\gamma }2^{j}. }$$
(4.213)
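The final inequality in (4.213) can be verified as follows. We have \(\gamma /(2^{-j}\sqrt{6\gamma /7}) = \sqrt{7/6}\,\sqrt{\gamma }2^{j}\), and (4.212) gives \(2^{j+1}\eta _{1} < \frac{1} {2}\sqrt{\gamma }2^{j}\). Hence

$$\displaystyle{ \frac{\gamma } {2^{-j}\sqrt{6\gamma /7}} + 2^{j+1}\eta _{1} < \left (\sqrt{\frac{7} {6}} + \frac{1} {2}\right )\sqrt{\gamma }2^{j} < 2\sqrt{\gamma }2^{j}, }$$

since \(\sqrt{7/6} < 1.1\).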

Suppose, contrary to the bound (4.211), that there are more than \(100\gamma 2^{r}/c_{1}\) other points \(P_{i} = (a_{i},b_{i}) \in \mathcal{P}\), with \(P_{i}\neq P_{i_{0}}\), such that \(h = h_{i} = a_{i_{0}} - a_{i} > 0\) and \(v = v_{i} = b_{i} - b_{i_{0}} > 0\) satisfy (4.206), thus implicitly (4.157) also, and (4.210). Then by the Pigeonhole Principle and (4.213), there must exist two points \(P_{i_{1}},P_{i_{2}} \in \mathcal{P}\), with \(i_{1}\neq i_{2}\), such that

$$\displaystyle{\max \{h_{i_{1}},h_{i_{2}}\} \leq 2\sqrt{\gamma }2^{j}\quad \mbox{ and}\quad \vert v_{ i_{1}} - v_{i_{2}}\vert \leq \frac{2^{r}\sqrt{6\gamma /7}2^{-j+2}} {100\gamma 2^{r}/c_{1}} = \frac{c_{1}\sqrt{6/7}} {25\sqrt{\gamma }} 2^{-j}.}$$

Since the product

$$\displaystyle{\sqrt{\gamma }2^{j} \cdot \frac{c_{1}\sqrt{6/7}} {\sqrt{\gamma }} 2^{-j} = \sqrt{\frac{6} {7}}c_{1} < c_{1},}$$

we conclude that there exists an axes-parallel rectangle of area less than \(c_{1}\) and which contains at least two points of \(\mathcal{P}\), namely \(P_{i_{1}}\) and \(P_{i_{2}}\). This contradicts the rectangle property, and establishes the bound (4.211).

If \(v = v_{i}\) falls into the interval (4.210), then

$$\displaystyle{ \frac{1} {\mathrm{slope}(\mathcal{C}\cap (P_{i} - H_{\gamma }(N)))} = \frac{\gamma } {(y + v)^{2}} \leq \frac{\gamma } {v^{2}} \approx \frac{1} {4^{r}} \cdot 4^{j}, }$$
(4.214)

where \(4^{j}\) almost equals the reciprocal of the slope of the diagonals of the j-cell \(\mathcal{C}\). By (4.214), we have

$$\displaystyle{ \frac{1} {\mathrm{area}(\mathcal{C})}\left \vert \int _{\mathcal{C}\cap (P_{i}-H_{\gamma }(N))}R_{j}(\mathbf{x})\,\mathrm{d}\mathbf{x}\right \vert \leq \frac{10} {4^{r}}. }$$
(4.215)

Furthermore, (4.215) holds for all j-cells \(\mathcal{C}\) satisfying (4.151). Let us return now to (4.126). Combining (4.208), (4.210), (4.211) and (4.215) we have

$$\displaystyle\begin{array}{rcl} & & \mathop{\mathop{\sum _{P_{i}\in \mathcal{P}}}_{i\neq i_{0}}} _{\mathrm{Case\ 18}}\mathop{ \sum _{\mathcal{C}}}_{(4.151)} \frac{1} {\mathrm{area}(\mathcal{C})}\left \vert \int _{\mathcal{C}\cap (P_{i}-H_{\gamma }(N))}R_{j}(\mathbf{x})\,\mathrm{d}\mathbf{x}\right \vert \leq \sum _{r\geq 0}10 \cdot 100 \frac{\gamma } {c_{1}}2^{r} \cdot \frac{10} {4^{r}} \\ & & \quad = 10000 \frac{\gamma } {c_{1}}\sum _{r\geq 0}2^{-r} = 20000 \frac{\gamma } {c_{1}}. {}\end{array}$$
(4.216)

This completes Case 18.

8 Completing the Proof of Theorem 12

In this section, we shall finally complete the proof of Proposition 13. Let us return to (4.125) and (4.126). We are now ready to clarify the technical details of the single term domination.

Let \(P_{i_{0}} \in \mathcal{P}\) and \(j \in \mathcal{J}\) be arbitrary.

Property 19.

The slope \(\gamma /x^{2}\) of the hyperbolic needle \(P_{i_{0}} - H_{\gamma }(N)\) satisfies

$$\displaystyle{ \frac{5} {6}4^{-j} \leq \frac{\gamma } {x^{2}} \leq \frac{7} {6}4^{-j}. }$$
(4.217)

Note that (4.217) holds if and only if

$$\displaystyle{\sqrt{\frac{6\gamma } {7}}2^{j} \leq x \leq \sqrt{\frac{6\gamma } {5}}2^{j},}$$

and this is an interval of length greater than \(\sqrt{\gamma }2^{j}/6\). Since a j-cell \(\mathcal{C}\) has horizontal side \(\eta _{1}2^{j}\), there are more than

$$\displaystyle{\frac{\sqrt{\gamma }2^{j}/6} {\eta _{1}2^{j}} = \frac{\sqrt{\gamma }} {6\eta _{1}}}$$

j-cells \(\mathcal{C}\) with the slope of the intersection \(\mathcal{C}\cap (P_{i_{0}} - H_{\gamma }(N))\) satisfying Property 19.
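The counting argument above can be checked numerically. The following is a minimal sketch, not part of the original proof; the function names `property19_interval` and `min_cell_count` and the sample values of γ, j and η₁ are assumptions made for illustration only.

```python
import math

def property19_interval(gamma: float, j: int) -> tuple[float, float]:
    """Endpoints of the x-range on which gamma/x**2 lies in [5/6, 7/6] * 4**(-j)."""
    lo = math.sqrt(6 * gamma / 7) * 2 ** j
    hi = math.sqrt(6 * gamma / 5) * 2 ** j
    return lo, hi

def min_cell_count(gamma: float, eta1: float) -> float:
    """Lower bound sqrt(gamma) / (6 * eta1) for the number of j-cells with Property 19."""
    return math.sqrt(gamma) / (6 * eta1)

# Hypothetical sample values, chosen only to exercise the formulas.
gamma, j, eta1 = 2.0, 3, 0.01
lo, hi = property19_interval(gamma, j)
print("interval length:", hi - lo)
print("claimed lower bound for the length:", math.sqrt(gamma) * 2 ** j / 6)
print("more than this many j-cells:", min_cell_count(gamma, eta1))
```

The length bound reduces to the numerical fact that \(\sqrt{6/5} -\sqrt{6/7} > 1/6\), which is why it holds for every γ and j.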

It would not be too difficult to prove directly, by using some familiar arguments from uniform distribution, that among these more than \(\sqrt{\gamma }/6\eta _{1}\) j-cells \(\mathcal{C}\), at least 1% have the following additional property.

Property 20.

The hyperbolic needle \(P_{i_{0}} - H_{\gamma }(N)\) intersects only the lower right subrectangle of \(\mathcal{C}\), and the intersection is a large triangle, meaning that the area is at least \(\frac{1} {32}\) the area of \(\mathcal{C}\), i.e. the area is at least \(\eta _{1}\eta _{2}/32\).

It is technically simpler, however, to force Property 20 in an indirect way, by using the trick of short vertical translations; see Fig. 4.11. This geometric trick was already mentioned at the end of Sect. 4.5.

Fig. 4.11 Short vertical translations

More precisely, for every real number \(t_{0}\) satisfying \(0 < t_{0} < 1\), consider all j-cells \(\mathcal{C}\) such that, with \(B = [0,M - N] \times [\gamma,M-\gamma ]\), we have

$$\displaystyle{ \mathcal{C}\cap (P_{i_{0}} + (0,t_{0}) - H_{\gamma }(N)) \subset B }$$
(4.218)

and

$$\displaystyle{ \frac{5} {6}4^{-j} \leq \mathrm{ slope}(\mathcal{C}\cap (P_{ i_{0}} + (0,t_{0}) - H_{\gamma }(N))) \leq \frac{7} {6}4^{-j}. }$$
(4.219)

A simple geometric consideration shows that for, say, at least 5% of the pairs \((t_{0},\mathcal{C})\), where \(\mathcal{C}\) satisfies (4.218) and (4.219), \(\mathcal{C}\cap (P_{i_{0}} + (0,t_{0}) - H_{\gamma }(N))\) also satisfies Property 20, i.e. \(P_{i_{0}} + (0,t_{0}) - H_{\gamma }(N)\) intersects only the lower right subrectangle of \(\mathcal{C}\), and the intersection is a large triangle of area at least \(\eta _{1}\eta _{2}/32\).

For the proof of the positive direction (4.89), we choose the pattern + − in every j-cell \(\mathcal{C}\) satisfying (4.218) and (4.219). Naturally, we choose the opposite pattern − + for the negative direction (4.90). Then

$$\displaystyle{ \int _{\mathcal{C}\cap (P_{i_{ 0}}+(0,t_{0})-H_{\gamma }(N))}R_{j}(\mathbf{x})\,\mathrm{d}\mathbf{x} \geq \frac{\eta _{1}\eta _{2}} {32}. }$$
(4.220)

Finally, if the j-cell \(\mathcal{C}\) fails to satisfy at least one of (4.218) and (4.219), then we choose the pattern 0. Therefore, by (4.220) and summarizing Cases 15–18, we have

$$\displaystyle\begin{array}{rcl} & & \int _{0}^{1}\left (\sum _{ j\in \mathcal{J}}\sum _{P_{i_{0}}\in \mathcal{P}}\int _{P_{i_{0}}+(0,t_{0})-H_{\gamma }(N)}R_{j}(\mathbf{x})\,\mathrm{d}\mathbf{x}\right )\mathrm{d}t_{0} \\ & & \quad \geq \sum _{j\in \mathcal{J}}\mathop{\sum _{P_{i_{ 0}}\in \mathcal{P}}}_{(4.222)}\left ( \frac{1} {20} \cdot \frac{\sqrt{\gamma }} {6\eta _{1}} \cdot \frac{\eta _{1}\eta _{2}} {32} \right. \\ & & \quad \quad \left.-\mathop{\sum _{P_{i}\in \mathcal{P}}}_{i\neq i_{0}}\mathop{ \sum _{\mathcal{C}}}_{(4.219)}\int _{0}^{1}\left \vert \int _{ \mathcal{C}\cap (P_{i}+(0,t_{0})-H_{\gamma }(N))}R_{j}(\mathbf{x})\,\mathrm{d}\mathbf{x}\right \vert \mathrm{d}t_{0}\right ) \\ & & \quad \geq \sum _{j\in \mathcal{J}}\mathop{\sum _{P_{i_{ 0}}\in \mathcal{P}}}_{(4.222)}\Bigg( \frac{\sqrt{\gamma }\eta _{2}} {3840} -\eta _{1}\eta _{2}\Bigg(4000\left (\left ( \frac{\gamma } {c_{1}}\right )^{3/2} + \left ( \frac{\gamma } {c_{1}}\right )^{2}\right ) \\ & & \quad \quad + 32000\left ( \frac{\gamma } {c_{1}}\right )^{2} + 20000 \frac{\gamma } {c_{1}}\Bigg)\Bigg), {}\end{array}$$
(4.221)

where the summation over \(P_{i_{0}} \in \mathcal{P}\) is under the restriction

$$\displaystyle{ P_{i_{0}} + (0,t_{0}) - H_{\gamma }(N) \subset B\quad \mbox{ for all $t_{0}$ satisfying $0 < t_{0} < 1$}, }$$
(4.222)

the summation over \(\mathcal{C}\) is under the restriction (4.219), the summation over \(P_{i} \in \mathcal{P}\) with \(i\neq i_{0}\) encompasses Cases 15–18, and finally the factor \(\frac{1} {20}\) comes from the 5% mentioned earlier. Furthermore, we have used in the last step the inequalities (4.152), (4.183), (4.202) and (4.216) for every \(t_{0}\) satisfying \(0 < t_{0} < 1\).

In our discussion in Sects. 4.6 and 4.7, we have made some assumptions on \(\eta _{1}\) and \(\eta _{2}\). Corresponding to Cases 15–18, we have assumed respectively that

$$\displaystyle\begin{array}{rcl} & & \eta _{2} <\min \left \{\frac{\sqrt{c_{1}}} {2}, \frac{1} {8(1/\sqrt{\gamma } + 1/\sqrt{c_{1}})}\right \}, {}\\ & & \eta _{1} <\min \left \{ \frac{c_{1}} {2\sqrt{\gamma }}, \frac{\sqrt{\gamma }} {8}, \frac{\sqrt{c_{1}}} {2}, \frac{1} {8(1/\sqrt{\gamma } + 1/\sqrt{c_{1}})}\right \}, {}\\ & & \eta _{2} <\min \left \{\frac{\sqrt{\gamma }} {2}, \frac{\sqrt{\gamma }} {100}\right \}, {}\\ & & \eta _{1} < \frac{\sqrt{\gamma }} {4}; {}\\ \end{array}$$

see (4.137), (4.147), (4.164), (4.166), (4.169), (4.179), (4.190), (4.198) and (4.212). Since

$$\displaystyle{ \frac{1} {1/\sqrt{\gamma } + 1/\sqrt{c_{1}}} \geq \frac{\min \{\sqrt{\gamma },\sqrt{c_{1}}\}} {2},}$$

we can guarantee all of the above requirements on \(\eta _{1}\) and \(\eta _{2}\) by imposing the single inequality

$$\displaystyle{ \max \{\eta _{1},\eta _{2}\} <\min \left \{ \frac{\sqrt{\gamma }} {100}, \frac{\sqrt{c_{1}}} {8}, \frac{c_{1}} {2\sqrt{\gamma }}\right \}. }$$
(4.223)

Let us return to (4.221). We have

$$\displaystyle\begin{array}{rcl} & & \frac{\sqrt{\gamma }\eta _{2}} {3840} -\eta _{1}\eta _{2}\Bigg(4000\left (\left ( \frac{\gamma } {c_{1}}\right )^{3/2} + \left ( \frac{\gamma } {c_{1}}\right )^{2}\right ) + 32000\left ( \frac{\gamma } {c_{1}}\right )^{2} + 20000 \frac{\gamma } {c_{1}}\Bigg) \\ & & \quad \geq \frac{\sqrt{\gamma }\eta _{2}} {7680}, {}\end{array}$$
(4.224)

assuming that (4.223) holds and \(\eta _{1}\) satisfies the additional inequality

$$\displaystyle{ \frac{1} {\eta _{1}} \geq \frac{10^{8}} {\sqrt{\gamma }} \left (\left ( \frac{\gamma } {c_{1}}\right ) + \left ( \frac{\gamma } {c_{1}}\right )^{2}\right ). }$$
(4.225)

Since \(\eta _{1}\) and \(\eta _{2}\) are almost equal, in view of (4.100), we can satisfy both (4.223) and (4.225) by the choice

$$\displaystyle{ \eta _{1} \approx \eta _{2} =\min \left \{ \frac{\sqrt{\gamma }} {200}, \frac{\sqrt{c_{1}}} {10}, \frac{10^{-8}c_{1}} {2\sqrt{\gamma }}, \frac{10^{-8}c_{1}^{2}} {2\gamma ^{3/2}} \right \}. }$$
(4.226)
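As a sanity check on the choice (4.226), the sketch below evaluates the four-term minimum and tests the two stated requirements (4.223) and (4.225). It treats \(\eta _{1}\) and \(\eta _{2}\) as exactly equal, which is a simplifying assumption (the text only says they are almost equal); the function names and the sample pairs \((\gamma,c_{1})\) are hypothetical.

```python
import math

def choose_eta(gamma: float, c1: float) -> float:
    """Common value of eta_1 and eta_2 given by the minimum in (4.226)."""
    return min(
        math.sqrt(gamma) / 200,
        math.sqrt(c1) / 10,
        1e-8 * c1 / (2 * math.sqrt(gamma)),
        1e-8 * c1 ** 2 / (2 * gamma ** 1.5),
    )

def satisfies_4_223(eta: float, gamma: float, c1: float) -> bool:
    """max{eta_1, eta_2} < min{sqrt(gamma)/100, sqrt(c1)/8, c1/(2 sqrt(gamma))}."""
    bound = min(math.sqrt(gamma) / 100, math.sqrt(c1) / 8, c1 / (2 * math.sqrt(gamma)))
    return eta < bound

def satisfies_4_225(eta: float, gamma: float, c1: float) -> bool:
    """1/eta_1 >= 10^8 / sqrt(gamma) * (gamma/c1 + (gamma/c1)^2)."""
    x = gamma / c1
    return 1 / eta >= 1e8 / math.sqrt(gamma) * (x + x ** 2)

# Hypothetical sample parameters (gamma, c1).
for gamma, c1 in [(1.0, 0.5), (0.3, 2.0), (5.0, 0.1)]:
    eta = choose_eta(gamma, c1)
    print(gamma, c1, eta, satisfies_4_223(eta, gamma, c1), satisfies_4_225(eta, gamma, c1))
```

The last two terms of the minimum are what enforce (4.225): the larger of the two lower bounds on \(1/\eta _{1}\) is at least their average, and that average is exactly the right-hand side of (4.225).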

Substituting (4.226) in (4.224) and then returning to (4.221), we have

$$\displaystyle{\int _{0}^{1}\left (\sum _{ j\in \mathcal{J}}\sum _{P_{i_{0}}\in \mathcal{P}}\int _{P_{i_{0}}+(0,t_{0})-H_{\gamma }(N)}R_{j}(\mathbf{x})\,\mathrm{d}\mathbf{x}\right )\mathrm{d}t_{0} \geq \sum _{j\in \mathcal{J}}\mathop{\sum _{P_{i_{0}}\in \mathcal{P}}}_{(4.222)} \frac{\sqrt{\gamma }\eta _{2}} {7680},}$$

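where \(i_{0}\) is now a dummy variable. Since an integral over the unit interval cannot exceed the supremum of its integrand, there clearly exists \(t_{0}\), satisfying \(0 < t_{0} < 1\), such that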

$$\displaystyle{ \sum _{j\in \mathcal{J}}\sum _{P_{i}\in \mathcal{P}}\int _{P_{i}+(0,t_{0})-H_{\gamma }(N)}R_{j}(\mathbf{x})\,\mathrm{d}\mathbf{x} \geq \sum _{j\in \mathcal{J}}\mathop{\sum _{P_{i}\in \mathcal{P}}}_{(4.228)} \frac{\sqrt{\gamma }\eta _{2}} {7680}. }$$
(4.227)

Note that in (4.227), we have replaced the dummy variable \(i_{0}\) by i, together with a corresponding summation restriction

$$\displaystyle{ P_{i} + (0,t_{0}) - H_{\gamma }(N) \subset B\quad \mbox{ for all $t_{0}$ satisfying $0 < t_{0} < 1$}. }$$
(4.228)

Next we return to (4.118), and replace the point set \(\mathcal{P}\) by the translated point set \(\mathcal{P} + (0,t_{0})\). Then Lemma 14 gives

$$\displaystyle\begin{array}{rcl} & & \frac{1} {(M - N)(M - 2\gamma )}\int _{0}^{M-N}\int _{ \gamma }^{M-\gamma }\varDelta (\mathbf{x})T(\mathbf{x})\,\mathrm{d}\mathbf{x} \\ & & \ \ =\sum _{P_{i}\in \mathcal{P}}\frac{\mathrm{area}(B \cap (P_{i} + (0,t_{0}) - H_{\gamma }(N)))} {(M - N)(M - 2\gamma )} -\delta \cdot \mathrm{area}(H_{\gamma }(N)) \\ & & \quad +\rho \sum _{j\in \mathcal{J}}\sum _{P_{i}\in \mathcal{P}} \frac{1} {(M - N)(M - 2\gamma )}\int _{P_{i}+(0,t_{0})-H_{\gamma }(N)}R_{j}(\mathbf{x})\,\mathrm{d}\mathbf{x} + E_{1},\quad \quad \quad {}\end{array}$$
(4.229)

where the error \(E_{1}\) satisfies

$$\displaystyle{ \vert E_{1}\vert \leq \frac{\vert \mathcal{P}\vert } {(M - N)(M - 2\gamma )} \cdot 8\sqrt{\gamma }(\eta _{1} +\eta _{2}) \cdot \frac{(n + 1)\rho ^{2}} {\sqrt{2} - 1-\rho }. }$$
(4.230)

Recall that \(\mathcal{P}\) is a finite subset of the square \([0,M]^{2}\) with cardinality \(\vert \mathcal{P}\vert =\delta M^{2}\). Since \(0 < t_{0} < 1\), the rectangle property implies, via elementary calculations, that the condition

$$\displaystyle{ P_{i} + (0,t_{0}) - H_{\gamma }(N) \subset B = [0,M - N] \times [\gamma,M-\gamma ] }$$
(4.231)

holds for all but at most

$$\displaystyle{ \frac{(2N + 4\gamma + 1)M} {c_{1}} }$$
(4.232)

points \(P_{i} \in \mathcal{P}\). Thus

$$\displaystyle\begin{array}{rcl} & & \sum _{P_{i}\in \mathcal{P}}\frac{\mathrm{area}(B \cap (P_{i} + (0,t_{0}) - H_{\gamma }(N)))} {(M - N)(M - 2\gamma )} -\delta \cdot \mathrm{area}(H_{\gamma }(N)) {}\\ & & \quad = \frac{\delta M^{2} +\theta c_{1}^{-1}(2N + 4\gamma + 1)M} {(M - N)(M - 2\gamma )} \cdot \mathrm{ area}(H_{\gamma }(N)) -\delta \cdot \mathrm{area}(H_{\gamma }(N)) {}\\ & & \quad = \left ( \frac{M^{2}} {(M - N)(M - 2\gamma )} - 1\right )\delta \cdot \mathrm{ area}(H_{\gamma }(N)) {}\\ & & \quad \quad +\theta \frac{c_{1}^{-1}(2N + 4\gamma + 1)M} {(M - N)(M - 2\gamma )} \cdot \mathrm{ area}(H_{\gamma }(N)), {}\\ \end{array}$$

with some constant θ satisfying \(-1 \leq \theta \leq 1\). Since \(\mathrm{area}(H_{\gamma }(N)) = 2\gamma \log N\), it then follows that

$$\displaystyle\begin{array}{rcl} & & \left \vert \sum _{P_{i}\in \mathcal{P}}\frac{\mathrm{area}(B \cap (P_{i} + (0,t_{0}) - H_{\gamma }(N)))} {(M - N)(M - 2\gamma )} -\delta \cdot \mathrm{area}(H_{\gamma }(N))\right \vert \\ & & \quad \leq \frac{3N + 6\gamma + 1} {(M - N)(M - 2\gamma )} \cdot 2\gamma \log N. {}\end{array}$$
(4.233)

Combining (4.229), (4.230) and (4.233), we deduce that

$$\displaystyle\begin{array}{rcl} & & \! \frac{1} {(M - N)(M - 2\gamma )}\int _{0}^{M-N}\int _{ \gamma }^{M-\gamma }\varDelta (\mathbf{x})T(\mathbf{x})\,\mathrm{d}\mathbf{x} \\ & & \ =\rho \sum _{j\in \mathcal{J}}\sum _{P_{i}\in \mathcal{P}} \frac{1} {(M - N)(M - 2\gamma )}\int _{P_{i}+(0,t_{0})-H_{\gamma }(N)}R_{j}(\mathbf{x})\,\mathrm{d}\mathbf{x}+E_{2},\quad \quad {}\end{array}$$
(4.234)

where the error \(E_{2}\) satisfies

$$\displaystyle\begin{array}{rcl} \vert E_{2}\vert & \leq & \frac{\vert \mathcal{P}\vert } {(M - N)(M - 2\gamma )} \cdot 8\sqrt{\gamma }(\eta _{1} +\eta _{2}) \cdot \frac{(n + 1)\rho ^{2}} {\sqrt{2} - 1-\rho } \\ & &\quad + \frac{3N + 6\gamma + 1} {(M - N)(M - 2\gamma )} \cdot 2\gamma \log N. {}\end{array}$$
(4.235)

Combining (4.227), (4.234) and (4.235), we then conclude that

$$\displaystyle\begin{array}{rcl} & & \frac{1} {(M - N)(M - 2\gamma )}\int _{0}^{M-N}\int _{ \gamma }^{M-\gamma }\varDelta (\mathbf{x})T(\mathbf{x})\,\mathrm{d}\mathbf{x} \\ & & \quad \geq \rho \sum _{j\in \mathcal{J}} \frac{1} {(M - N)(M - 2\gamma )}\mathop{\sum _{P_{i}\in \mathcal{P}}}_{(4.228)} \frac{\sqrt{\gamma }\eta _{2}} {7680} \\ & & \quad \quad - \frac{\vert \mathcal{P}\vert } {(M - N)(M - 2\gamma )} \cdot 8\sqrt{\gamma }(\eta _{1} +\eta _{2}) \cdot \frac{(n + 1)\rho ^{2}} {\sqrt{2} - 1-\rho } \\ & &\quad \quad - \frac{3N + 6\gamma + 1} {(M - N)(M - 2\gamma )} \cdot 2\gamma \log N. {}\end{array}$$
(4.236)

Recall that \(\mathcal{J}\) is an interval of integers satisfying (4.114), so that

$$\displaystyle{\vert \mathcal{J}\vert \geq (n + 1) -\log _{2}\left (\gamma +\frac{1} {\gamma } \right ).}$$

On the other hand, it follows from (4.231) and (4.232) that

$$\displaystyle{\mathop{\sum _{P_{i}\in \mathcal{P}}}_{(4.228)}1 \geq \delta M^{2} -\frac{(2N + 4\gamma + 1)M} {c_{1}}.}$$

Thus

$$\displaystyle\begin{array}{rcl} & & \sum _{j\in \mathcal{J}} \frac{1} {(M - N)(M - 2\gamma )}\mathop{\sum _{P_{i}\in \mathcal{P}}}_{(4.228)}1 \\ & & \quad \geq \left ((n + 1) -\log _{2}\left (\gamma +\frac{1} {\gamma } \right )\right )\left (\delta -\frac{2N + 4\gamma + 1} {c_{1}M} \right ).{}\end{array}$$
(4.237)

Let us now return to (4.236). If ρ is small, then \(\rho ^{2}n\) is negligible compared to \(\rho n\). Let \(\rho = 10^{-6}\), say. Substituting this and the estimate (4.237) into (4.236), and assuming that N and \(M/N\) are both large, we deduce that

$$\displaystyle\begin{array}{rcl} & & \frac{1} {\mathrm{area}(B)}\int _{B}\varDelta (\mathbf{x})T(\mathbf{x})\,\mathrm{d}\mathbf{x} {}\\ & & \quad \geq \rho \left ((n + 1) -\log _{2}\left (\gamma +\frac{1} {\gamma } \right )\right )\left (\delta -\frac{2N + 4\gamma + 1} {c_{1}M} \right ) \frac{\sqrt{\gamma }\eta _{2}} {10^{4}}. {}\\ \end{array}$$

More precisely, the assumptions on N and M are given by (4.84) and (4.85), and the choice for n is made precise by

$$\displaystyle{\frac{N} {2} < 2^{n} \leq N.}$$

These choices, together with the definition (4.226) for η 2, ensure that

$$\displaystyle{ \frac{1} {\mathrm{area}(B)}\int _{B}\varDelta (\mathbf{x})T(\mathbf{x})\,\mathrm{d}\mathbf{x} \geq \delta ^{{\prime}}\log N, }$$
(4.238)

where \(\delta ^{{\prime}} =\delta ^{{\prime}}(c_{1},\gamma,\delta ) > 0\) is a positive constant independent of N and M, and defined by (4.83) and (4.84).

It now follows from (4.238) that there exists a translated copy \(\mathbf{x}_{1} + H_{\gamma }(N)\) of the hyperbolic needle \(H_{\gamma }(N)\) such that \(\mathbf{x}_{1} + H_{\gamma }(N) \subset [0,M]^{2}\) and

$$\displaystyle{\vert \mathcal{P}\cap (\mathbf{x}_{1} + H_{\gamma }(N))\vert \geq 2\delta \gamma \log N +\delta ^{{\prime}}\log N.}$$

This establishes the inequality (4.89). The proof of the other inequality (4.90) is the same, except that we replace the pattern + − by the opposite pattern − +.

Thus the long proof of Proposition 13 is complete. This also completes the proof of Theorem 12.

9 Yet Another Generalization of Theorem 3

Let α > 0, 0 ≤ β < 1 and γ > 0 be arbitrary but fixed real numbers, and let f(α; β; γ; N) denote the number of integral solutions of the diophantine inequality (see Footnote 14)

$$\displaystyle{\Vert n\alpha -\beta \Vert < \frac{\gamma } {n},\quad 1 \leq n \leq N.}$$

This inequality motivates the hyperbolic region

$$\displaystyle{\vert y-\beta \vert < \frac{\gamma } {x},\quad 1 \leq x \leq N,}$$

which has area \(2\gamma \log N\).

Let us return to the special case \(\alpha = \sqrt{2}\). Combining Lemmas 1 and 2, we have

$$\displaystyle{ \int _{0}^{1}f(\sqrt{2};\beta;\gamma;N)\,\mathrm{d}\beta = 2\gamma \log N + O(1), }$$
(4.239)

and for an arbitrary subinterval [a, b] with 0 ≤ a < b ≤ 1, we have the limit formula

$$\displaystyle{ \lim _{N\rightarrow \infty }\frac{ \frac{1} {b-a}\int _{a}^{b}f(\sqrt{2};\beta;\gamma;N)\,\mathrm{d}\beta } {\log N} = 2\gamma. }$$
(4.240)

There is a straightforward generalization of (4.239) and (4.240) for arbitrary α > 0, and the proof is the same. We have

$$\displaystyle{ \int _{0}^{1}f(\alpha;\beta;\gamma;N)\,\mathrm{d}\beta = 2\gamma \log N + O(1), }$$
(4.241)

and for an arbitrary subinterval [a, b] with 0 ≤ a < b ≤ 1, we have the limit formula

$$\displaystyle{ \lim _{N\rightarrow \infty }\frac{ \frac{1} {b-a}\int _{a}^{b}f(\alpha;\beta;\gamma;N)\,\mathrm{d}\beta } {\log N} = 2\gamma. }$$
(4.242)

The formulas (4.239)–(4.242) express the almost trivial geometric fact that the average number of lattice points contained in all the translated copies of a given region equals the area of the region; see Lemma 5. It is natural, therefore, to study the limit

$$\displaystyle{ \lim _{N\rightarrow \infty }\frac{f(\alpha;\beta;\gamma;N)} {2\gamma \log N}. }$$
(4.243)

The case of rational α in (4.243) is trivial. Indeed, if N → ∞, then the function f(α; β; γ; N) remains bounded for all but a finite number of values of β = β(α) in the unit interval. When f(α; β; γ; N) tends to infinity, it behaves like a linear function \(c_{27}N\), which grows much faster than the logarithmic function \(\log N\).

If α is irrational, then we have the following non-trivial result, which can be considered a far-reaching generalization of Theorem 3.

Theorem 21.

Let α > 0 be an arbitrary irrational, and let γ > 0 be an arbitrary real number. There are continuum many divergence points \(\beta ^{{\ast}} =\beta ^{{\ast}}(\alpha,\gamma ) \in [0,1)\) such that

$$\displaystyle{\limsup _{n\rightarrow \infty }\frac{f(\alpha;\beta ^{{\ast}};\gamma;n)} {\log n} >\liminf _{n\rightarrow \infty }\frac{f(\alpha;\beta ^{{\ast}};\gamma;n)} {\log n}.}$$

To prove Theorem 21, we can clearly assume that 0 < α < 1. We need the continued fractions

$$\displaystyle{\alpha = \frac{1} {a_{1}+} \frac{1} {a_{2}+} \frac{1} {a_{3}+}\ldots = [a_{1},a_{2},a_{3},\ldots ].}$$

For irrational α, the digits \(a_{1},a_{2},a_{3},\ldots\) form an infinite sequence, with \(a_{i} \geq 1\) for all i ≥ 1. For k ≥ 2, the fractions

$$\displaystyle{\frac{p_{k}} {q_{k}} = [a_{1},\ldots,a_{k}]}$$

are known as the convergents to α. It is well known that \(p_{k}\) and \(q_{k}\) are generated by the recurrence relations

$$\displaystyle{ p_{k} = a_{k}p_{k-1} + p_{k-2},\quad q_{k} = a_{k}q_{k-1} + q_{k-2}, }$$
(4.244)

with the convention that \(p_{0} = 0\), \(q_{0} = 1\), \(p_{1} = 1\) and \(q_{1} = a_{1}\).

Another well-known fact about the convergents is the inequality

$$\displaystyle{\left \vert \alpha -\frac{p_{k-1}} {q_{k-1}}\right \vert \leq \frac{1} {q_{k-1}q_{k}},}$$

which clearly implies

$$\displaystyle{ \Vert q_{k-1}\alpha \Vert < \frac{1} {a_{k}q_{k-1}}. }$$
(4.245)
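The recurrences (4.244) and the bound (4.245) are easy to test with exact rational arithmetic. The sketch below is illustrative only: it replaces the irrational α by a finite continued fraction with 30 digits, all equal to 2 (a rational stand-in close to \(\sqrt{2} - 1\)), and the helper names are assumptions, not notation from the text.

```python
from fractions import Fraction

def convergents(digits):
    """Return [(p_0, q_0), (p_1, q_1), ...] for [a_1, a_2, ...] via (4.244)."""
    p_prev, q_prev = 0, 1              # p_0 = 0, q_0 = 1
    p_cur, q_cur = 1, digits[0]        # p_1 = 1, q_1 = a_1
    out = [(p_prev, q_prev), (p_cur, q_cur)]
    for a in digits[1:]:
        p_prev, p_cur = p_cur, a * p_cur + p_prev
        q_prev, q_cur = q_cur, a * q_cur + q_prev
        out.append((p_cur, q_cur))
    return out

def dist_to_nearest_integer(z):
    """Exact value of ||z|| for a Fraction z."""
    frac = z - (z.numerator // z.denominator)
    return min(frac, 1 - frac)

digits = [2] * 30                      # hypothetical choice of digits
conv = convergents(digits)
alpha = Fraction(*conv[-1])            # rational stand-in for the irrational alpha

def check_4_245(k):
    """True if ||q_{k-1} alpha|| < 1 / (a_k q_{k-1})."""
    q = conv[k - 1][1]
    return dist_to_nearest_integer(q * alpha) < Fraction(1, digits[k - 1] * q)

print(conv[:4])
print(all(check_4_245(k) for k in range(2, 21)))
```

Only indices k well below the number of digits are tested, so that the truncation of the continued fraction does not affect the comparison.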

Write \(n =\ell q_{k-1}\). Then by (4.245), we have

$$\displaystyle{\Vert n\alpha \Vert =\Vert \ell q_{k-1}\alpha \Vert < \frac{\ell} {a_{k}q_{k-1}} = \frac{\ell^{2}} {a_{k}\ell q_{k-1}} = \frac{\ell^{2}} {a_{k}n},}$$

and so \(\Vert n\alpha \Vert <\gamma /n\) holds whenever \(\ell^{2}/a_{k} \leq \gamma\), i.e. whenever

$$\displaystyle{ 1 \leq \ell\leq \sqrt{\gamma a_{k}}. }$$
(4.246)

Now let

$$\displaystyle{ N_{k} = \lfloor \sqrt{\gamma a_{k}}\rfloor q_{k-1}, }$$
(4.247)

where ⌊z⌋ denotes the lower integral part of a real number z. It then follows from (4.246) that the homogeneous diophantine inequality \(\Vert n\alpha \Vert <\gamma /n\) has at least

$$\displaystyle{\sum _{i=1}^{k}\lfloor \sqrt{\gamma a_{ i}}\rfloor }$$

integer solutions n satisfying \(1 \leq n \leq N_{k}\). Formally, we therefore have

$$\displaystyle{ f(\alpha;\beta = 0;\gamma;N_{k}) \geq \sum _{i=1}^{k}\lfloor \sqrt{\gamma a_{ i}}\rfloor. }$$
(4.248)
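The lower bound (4.248) can be compared with a brute-force count for a concrete example. This is a sketch under stated assumptions: α is again replaced by a rational stand-in built from 30 digits equal to 2, γ = 1 and k = 6 are arbitrary sample values, and all function names are hypothetical.

```python
from fractions import Fraction
from math import isqrt

def denominators(digits):
    """Return [q_0, q_1, ...] from q_k = a_k q_{k-1} + q_{k-2}, q_0 = 1, q_1 = a_1."""
    qs = [1, digits[0]]
    for a in digits[1:]:
        qs.append(a * qs[-1] + qs[-2])
    return qs

def from_digits(digits):
    """Exact value of the finite continued fraction [a_1, ..., a_m]."""
    value = Fraction(0)
    for a in reversed(digits):
        value = 1 / (a + value)
    return value

def count_solutions(alpha, beta, gamma, N):
    """Brute-force f(alpha; beta; gamma; N): n in [1, N] with ||n alpha - beta|| < gamma/n."""
    count = 0
    for n in range(1, N + 1):
        z = n * alpha - beta
        frac = z - (z.numerator // z.denominator)
        if min(frac, 1 - frac) < Fraction(gamma) / n:
            count += 1
    return count

def floor_sqrt(x):
    """Floor of sqrt(x) for a non-negative Fraction or int x."""
    x = Fraction(x)
    return isqrt(x.numerator // x.denominator)

digits = [2] * 30                      # hypothetical digits, alpha close to sqrt(2) - 1
gamma, k = Fraction(1), 6              # hypothetical sample values
qs = denominators(digits)
alpha = from_digits(digits)
N_k = floor_sqrt(gamma * digits[k - 1]) * qs[k - 1]            # (4.247)
lower_bound = sum(floor_sqrt(gamma * a) for a in digits[:k])   # right side of (4.248)
f_value = count_solutions(alpha, Fraction(0), gamma, N_k)
print(N_k, lower_bound, f_value)
```

In this example the solutions guaranteed by (4.246) are the convergent denominators \(q_{0},\ldots,q_{5}\) themselves; the brute-force count may be larger, since (4.248) is only a lower bound.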

We distinguish two cases, and start with the much harder one.

Case 22.

For all sufficiently large values of k, we have

$$\displaystyle{ \sum _{i=1}^{k}\lfloor \sqrt{\gamma a_{ i}}\rfloor \leq 100 \cdot 2\gamma \log N_{k}. }$$
(4.249)

We proceed in four steps. Step 1. The crucial first step in the argument is to show that the condition (4.249) implies the exponential upper bound

$$\displaystyle{ \prod _{i=1}^{k}(a_{ i} + 1) \leq \mathrm{ e}^{c^{{\prime}}k } }$$
(4.250)

for all sufficiently large values of k, where \(c^{{\prime}} = c^{{\prime}}(\gamma )\) is a finite constant independent of k.

To derive (4.250), we use the well-known principle that the exponential functions grow faster than polynomials, in the form of an elementary inequality as follows.

Lemma 23.

For any fixed positive c > 0, the inequality

$$\displaystyle{(x + 1)^{c} \leq (8c^{2}\mathrm{e}^{-2})^{c}\mathrm{e}^{\sqrt{x}}}$$

holds for every x ≥ 1.

Proof.

We start with the trivial observation that x + 1 ≤ 2x for all x ≥ 1, which leads us to the function \(g(x) = (2x)^{c}\mathrm{e}^{-\sqrt{x}}\), which we wish to maximize. It is easy to compute the derivative of g(x) and show that g(x) is maximized when \(x = 4c^{2}\), with maximum value \(g(4c^{2}) = (8c^{2}\mathrm{e}^{-2})^{c}\). The desired inequality follows from \((x + 1)^{c}\mathrm{e}^{-\sqrt{x}} \leq g(x) \leq g(4c^{2})\). □ 
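Lemma 23 can also be checked numerically after taking logarithms, which avoids overflow for large c. The sketch below is an illustration, not part of the proof; the function names, the tolerance and the sampled values of c and x are assumptions.

```python
import math

def lemma23_holds(x: float, c: float, tol: float = 1e-12) -> bool:
    """Check (x + 1)^c <= (8 c^2 e^-2)^c * e^sqrt(x) in logarithmic form."""
    lhs = c * math.log(x + 1)
    rhs = c * (math.log(8 * c * c) - 2) + math.sqrt(x)
    return lhs <= rhs + tol            # tol absorbs rounding in the equality case

def g_log(x: float, c: float) -> float:
    """Logarithm of g(x) = (2x)^c * exp(-sqrt(x)) from the proof."""
    return c * math.log(2 * x) - math.sqrt(x)

# Hypothetical sample grid of exponents c and integers x >= 1.
print(all(lemma23_holds(x, c) for c in (0.1, 0.5, 1, 3, 40) for x in range(1, 5000)))

# For c = 3 the maximizer of g should be x = 4 c^2 = 36.
print(g_log(36, 3) >= max(g_log(35, 3), g_log(37, 3)))
```

Equality in the lemma occurs for c = 1/2 and x = 1, where both x + 1 = 2x and \(x = 4c^{2}\) hold; this is the reason for the small tolerance.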

By repeated application of (4.244), we have

$$\displaystyle\begin{array}{rcl} q_{k-1}& =& a_{k-1}q_{k-2} + q_{k-3} \leq (a_{k-1} + 1)q_{k-2} \\ & \leq & (a_{k-1} + 1)(a_{k-2} + 1)q_{k-3} \leq \ldots \leq \prod _{i=1}^{k-1}(a_{ i} + 1).{}\end{array}$$
(4.251)

Combining this with (4.247) and (4.249), we have

$$\displaystyle\begin{array}{rcl} \sum _{i=1}^{k}(\sqrt{\gamma a_{ i}} - 1)& \leq & 100 \cdot 2\gamma \left (\log \sqrt{\gamma } +\log \sqrt{a_{k}} +\log \prod _{ i=1}^{k-1}(a_{ i} + 1)\right ) \\ & \leq & 200\gamma \left (\log \sqrt{\gamma } +\log \prod _{ i=1}^{k}(a_{ i} + 1)\right ). {}\end{array}$$
(4.252)

Applying the exponential function, the inequality (4.252) becomes

$$\displaystyle{ \prod _{i=1}^{k}\mathrm{e}^{\sqrt{\gamma a_{i}} -1} \leq \gamma ^{100\gamma }\prod _{ i=1}^{k}(a_{ i} + 1)^{200\gamma }, }$$
(4.253)

and this inequality holds for all sufficiently large k, i.e. for all \(k \geq k_{0}\).

Applying Lemma 23 with \(c = 400\sqrt{\gamma }\) and \(x = a_{i}\) for each \(i = 1,2,\ldots,k\), and then multiplying these inequalities together, we obtain

$$\displaystyle{\prod _{i=1}^{k}(a_{ i} + 1)^{400\sqrt{\gamma }}\leq (800\sqrt{\gamma })^{800\sqrt{\gamma }k}\prod _{ i=1}^{k}\mathrm{e}^{\sqrt{a_{i}} }.}$$

Raising this to the \(\sqrt{\gamma }\)-th power, we have

$$\displaystyle{ \prod _{i=1}^{k}(a_{ i} + 1)^{400\gamma } \leq (800\gamma )^{800\gamma k}\prod _{ i=1}^{k}\mathrm{e}^{\sqrt{\gamma a_{i}} }. }$$
(4.254)

We next combine (4.253) and (4.254) to obtain

$$\displaystyle{\prod _{i=1}^{k}(a_{ i} + 1)^{400\gamma } \leq (800\gamma )^{800\gamma k}\mathrm{e}^{k}\gamma ^{100\gamma }\prod _{ i=1}^{k}(a_{ i} + 1)^{200\gamma },}$$

which, on removing a common factor and then taking 200γ-th root, becomes

$$\displaystyle{ \prod _{i=1}^{k}(a_{ i} + 1) \leq (800\gamma )^{4k}\mathrm{e}^{k/200\gamma }\sqrt{\gamma } = \sqrt{\gamma }((800\gamma )^{4}\mathrm{e}^{1/200\gamma })^{k}. }$$
(4.255)

Since this holds for all \(k \geq k_{0}\), the inequality (4.250) follows.

Step 2. We shall next show that a small digit \(a_{i}\) implies a local rectangle property. It follows from (4.255) that, for all sufficiently large k,

$$\displaystyle{ a_{i} + 1 \leq (1000\gamma )^{8}\mathrm{e}^{ \frac{1} {100\gamma } } }$$
(4.256)

holds for at least \(k/2\) values of i in 1 ≤ i ≤ k. In other words, at least half of the continued fraction digits \(a_{i}\) of α are small, less than a constant depending only on γ, in the precise quantitative sense of (4.256).

Next we show that, for every small digit \(a_{i}\), the rectangle property must hold locally, in some power-of-two range around \(q_{i}\). To prove this, we basically repeat the proof of Lemma 4, and use some facts from the theory of continued fractions; see Lemma 24 below. The details go as follows.

As in the proof of Lemma 4, we consider a rectangle of slope \(1/\alpha\) which contains two lattice points \(P = (k,\ell)\) and Q = (m, n); in fact, assume that P and Q are two vertices of the rectangle. We denote the vector from P to Q by \(\mathbf{v} = (m - k,n-\ell)\), and consider the two perpendicular unit vectors

$$\displaystyle{\mathbf{e}_{1} = \left ( \frac{\alpha } {\sqrt{1 +\alpha ^{2}}}, \frac{1} {\sqrt{1 +\alpha ^{2}}}\right )\quad \mbox{ and}\quad \mathbf{e}_{2} = \left ( \frac{1} {\sqrt{1 +\alpha ^{2}}},- \frac{\alpha } {\sqrt{1 +\alpha ^{2}}}\right ).}$$

Then the two sides a and b of the rectangle can be expressed in terms of the inner products \(\mathbf{e}_{1} \cdot \mathbf{v}\) and \(\mathbf{e}_{2} \cdot \mathbf{v}\). We have

$$\displaystyle{a =\vert \mathbf{e}_{1} \cdot \mathbf{v}\vert = \frac{\vert p\alpha + q\vert } {\sqrt{1 +\alpha ^{2}}}\quad \mbox{ and}\quad b =\vert \mathbf{e}_{2} \cdot \mathbf{v}\vert = \frac{\vert p - q\alpha \vert } {\sqrt{1 +\alpha ^{2}}},}$$

where \(p = m - k\) and \(q = n-\ell\). Thus the area of the rectangle is equal to

$$\displaystyle{ \mathrm{area} = ab = \frac{\vert p\alpha + q\vert \vert p - q\alpha \vert } {1 +\alpha ^{2}}. }$$
(4.257)

Without loss of generality we can assume that p ≥ 0 and q ≥ 0, and that p is the nearest integer to \(q\alpha\). Then \(\vert p - q\alpha \vert =\Vert q\alpha \Vert\). Next we need the following fact from the theory of continued fractions.

Lemma 24.

If \(1 \leq q < q_{i}\), then

$$\displaystyle{\Vert q\alpha \Vert \geq \Vert q_{i-1}\alpha \Vert > \frac{1} {(a_{i} + 2)q_{i-1}}.}$$

We postpone the proof of Lemma 24.

Now assume that

$$\displaystyle{ \frac{q_{i-1}} {4} \leq q < q_{i}. }$$
(4.258)

Applying Lemma 24 and (4.258), we have

$$\displaystyle{\vert p - q\alpha \vert =\Vert q\alpha \Vert \geq \Vert q_{i-1}\alpha \Vert > \frac{1} {(a_{i} + 2)q_{i-1}} \geq \frac{1} {4(a_{i} + 2)q}.}$$

Substituting this in (4.257) and assuming (4.258), we have

$$\displaystyle{ \mathrm{area} = ab = \frac{(p\alpha + q)\vert p - q\alpha \vert } {1 +\alpha ^{2}} \geq \frac{q\vert p - q\alpha \vert } {1 +\alpha ^{2}} \geq \frac{1} {4(a_{i} + 2)(1 +\alpha ^{2})}. }$$
(4.259)

Let us elaborate on the meaning of (4.259). It is about a rectangle of slope \(1/\alpha\) which contains two lattice points \(P = (k,\ell)\) and Q = (m, n); in fact, P and Q are two vertices of the rectangle. We write the vector from P to Q as v = (p, q) and, without loss of generality, we can assume that p ≥ 0 and q ≥ 0, and that p is the nearest integer to \(q\alpha\). If q is large, then \(\sqrt{1 +\alpha ^{2}}q\) is very close to the diameter of this long and narrow rectangle. It means that q is basically a size parameter of the rectangle. Assume that the restriction (4.258) holds. Then the inequality (4.259) tells us that the area of this long and narrow rectangle is at least \(1/(4(a_{i} + 2)(1 +\alpha ^{2}))\), that is, the area is not too small if \(a_{i}\) is not too large.

We can therefore rephrase (4.258) and (4.259) together in a nutshell as follows. A small digit \(a_{i}\) yields the rectangle property locally. This means that we have a good chance to adapt the Riesz product technique.
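The local rectangle property expressed by (4.258) and (4.259) can be tested exactly for a sample continued fraction. The sketch below is illustrative: the digit sequence is an arbitrary hypothetical choice, α is the corresponding finite continued fraction (a rational stand-in for an irrational number), and the function names are assumptions.

```python
from fractions import Fraction

def from_digits(digits):
    """Exact value of the finite continued fraction [a_1, ..., a_m]."""
    value = Fraction(0)
    for a in reversed(digits):
        value = 1 / (a + value)
    return value

def denominators(digits):
    """Return [q_0, q_1, ...] from q_k = a_k q_{k-1} + q_{k-2}, q_0 = 1, q_1 = a_1."""
    qs = [1, digits[0]]
    for a in digits[1:]:
        qs.append(a * qs[-1] + qs[-2])
    return qs

def nearest_integer(z):
    """Nearest integer to a Fraction z (ties rounded up)."""
    return (2 * z.numerator + z.denominator) // (2 * z.denominator)

def rectangle_area(p, q, alpha):
    """Area (4.257) of the rectangle of slope 1/alpha spanned by the vector (p, q)."""
    return abs(p * alpha + q) * abs(p - q * alpha) / (1 + alpha ** 2)

def local_rectangle_property(digits, i):
    """Check (4.259) for every q in the range (4.258) around q_{i-1}."""
    alpha = from_digits(digits)
    qs = denominators(digits)
    bound = 1 / (4 * (digits[i - 1] + 2) * (1 + alpha ** 2))
    q_min = max(1, -(-qs[i - 1] // 4))          # ceil(q_{i-1} / 4), at least 1
    return all(
        rectangle_area(nearest_integer(q * alpha), q, alpha) >= bound
        for q in range(q_min, qs[i])
    )

digits = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]   # hypothetical digits
print([local_rectangle_property(digits, i) for i in range(2, 8)])
```

Only indices i from 2 up to 7 are tested, well below the number of digits, so that the truncated continued fraction behaves like an irrational α in the relevant range.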

For the convenience of the reader, we interrupt the argument, and include a proof of Lemma 24 which is surprisingly tricky.

Proof of Lemma 24.

Recall (4.244), that

$$\displaystyle{p_{k} = a_{k}p_{k-1} + p_{k-2},\quad q_{k} = a_{k}q_{k-1} + q_{k-2}.}$$

These recurrences hold for any \(a_{k}\), including arbitrary real values. Writing

$$\displaystyle{\alpha = [a_{1},\ldots,a_{k-1},\alpha _{k}],}$$

with

$$\displaystyle{\alpha _{k} = a_{k} + \frac{1} {a_{k+1}+} \frac{1} {a_{k+2}+}\ldots = [a_{k};a_{k+1},a_{k+2},\ldots ],}$$

we obtain the useful formula

$$\displaystyle{\alpha = \frac{\alpha _{k}p_{k-1} + p_{k-2}} {\alpha _{k}q_{k-1} + q_{k-2}},}$$

and it follows that

$$\displaystyle{ q_{k-1}\alpha - p_{k-1} = \frac{q_{k-1}p_{k-2} - p_{k-1}q_{k-2}} {\alpha _{k}q_{k-1} + q_{k-2}}. }$$
(4.260)

It is not difficult to show that

$$\displaystyle{ q_{k-1}p_{k-2} - p_{k-1}q_{k-2} = -(q_{k-2}p_{k-3} - p_{k-2}q_{k-3}). }$$
(4.261)

Since \(p_{0} = 0\), \(q_{0} = 1\), \(p_{1} = 1\) and \(q_{1} = a_{1}\), we have \(q_{1}p_{0} - p_{1}q_{0} = -1\). It follows by induction, using (4.261), that

$$\displaystyle{ q_{k-1}p_{k-2} - p_{k-1}q_{k-2} = (-1)^{k-1}. }$$
(4.262)

Combining this with (4.260), we have

$$\displaystyle{ q_{k-1}\alpha - p_{k-1} = \frac{(-1)^{k-1}} {\alpha _{k}q_{k-1} + q_{k-2}}, }$$
(4.263)

which implies

$$\displaystyle{\Vert q_{k-1}\alpha \Vert =\vert q_{k-1}\alpha - p_{k-1}\vert = \frac{1} {\alpha _{k}q_{k-1} + q_{k-2}} > \frac{1} {(a_{k} + 2)q_{k-1}}.}$$

It remains to prove that, if p and q are integers with \(0 < q < q_{k}\), then

$$\displaystyle{ \vert q\alpha - p\vert \geq \vert q_{k-1}\alpha - p_{k-1}\vert. }$$
(4.264)

To prove this, we define integers u and v by the equations

$$\displaystyle{ p = up_{k-1} + vp_{k},\quad q = uq_{k-1} + vq_{k}. }$$
(4.265)

Note that (4.265) is solvable in integers u and v, since the determinant of the system is ± 1, in view of (4.262). Since \(0 < q < q_{k}\), we must have u ≠ 0. Moreover, if v ≠ 0, then u and v must have opposite signs. Since \(q_{k-1}\alpha - p_{k-1}\) and \(q_{k}\alpha - p_{k}\) also have opposite signs, in view of (4.263), we conclude that

$$\displaystyle\begin{array}{rcl} \vert q\alpha - p\vert & =& \vert u(q_{k-1}\alpha - p_{k-1}) + v(q_{k}\alpha - p_{k})\vert {}\\ & =& \vert u(q_{k-1}\alpha - p_{k-1})\vert +\vert v(q_{k}\alpha - p_{k})\vert {}\\ & \geq & \vert u(q_{k-1}\alpha - p_{k-1})\vert \geq \vert q_{k-1}\alpha - p_{k-1}\vert, {}\\ \end{array}$$

proving (4.264). □ 
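Both the determinant identity (4.262) and the statement of Lemma 24 can be verified exactly on an example. This is a hedged sketch rather than a proof: the digit sequence is a hypothetical choice, α is the corresponding finite continued fraction, and the check is restricted to indices 2 ≤ i ≤ 7, where the truncation plays no role and \(\Vert q_{i-1}\alpha \Vert =\vert q_{i-1}\alpha - p_{i-1}\vert\).

```python
from fractions import Fraction

def convergents(digits):
    """Return [(p_0, q_0), (p_1, q_1), ...] for [a_1, a_2, ...] via (4.244)."""
    out = [(0, 1), (1, digits[0])]
    for a in digits[1:]:
        (p2, q2), (p1, q1) = out[-2], out[-1]
        out.append((a * p1 + p2, a * q1 + q2))
    return out

def dist_to_nearest_integer(z):
    """Exact value of ||z|| for a Fraction z."""
    frac = z - (z.numerator // z.denominator)
    return min(frac, 1 - frac)

def determinant(conv, k):
    """q_{k-1} p_{k-2} - p_{k-1} q_{k-2}; by (4.262) this equals (-1)^(k-1)."""
    (p_old, q_old), (p_new, q_new) = conv[k - 2], conv[k - 1]
    return q_new * p_old - p_new * q_old

def lemma24_holds(digits, i):
    """Check ||q alpha|| >= ||q_{i-1} alpha|| > 1/((a_i + 2) q_{i-1}) for 1 <= q < q_i."""
    conv = convergents(digits)
    alpha = Fraction(*conv[-1])                 # rational stand-in for alpha
    a_i, q_prev, q_i = digits[i - 1], conv[i - 1][1], conv[i][1]
    smallest = dist_to_nearest_integer(q_prev * alpha)
    best = all(dist_to_nearest_integer(q * alpha) >= smallest for q in range(1, q_i))
    return best and smallest > Fraction(1, (a_i + 2) * q_prev)

digits = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]   # hypothetical digits
conv = convergents(digits)
print(all(determinant(conv, k) == (-1) ** (k - 1) for k in range(2, len(conv))))
print(all(lemma24_holds(digits, i) for i in range(2, 8)))
```

Exact `Fraction` arithmetic is used so that the comparisons are not affected by floating-point rounding, which matters because the quantities \(\Vert q\alpha \Vert\) become very small.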

Step 3. We next employ the Riesz product technique. Let us return to Theorem 12, and the basically equivalent Proposition 13. A trivial novelty is that in this section, the slope is \(1/\alpha\), whereas in Theorem 12 and Proposition 13, the slopes are respectively \(1/\sqrt{2}\) and 0. The Riesz product (4.99) is defined by using some appropriate modified Rademacher functions \(R_{j}(\mathbf{x}) \in \mathcal{R}(j)\) for j with \(1 \leq 2^{j} \leq N\), i.e. for \(\log _{2}N + O(1)\) values of j. In the hypothesis of Theorem 12 and Proposition 13, we have the unrestricted rectangle property; here we have a restricted rectangle property instead, meaning that the rectangle property holds only for \(O(\log N)\) values of the power-of-two parameter j, where \(0 \leq j \leq \log _{2}N + O(1)\). Indeed, by (4.250) and (4.251), we have

$$\displaystyle{\log N_{k} =\log q_{k-1} + O(1) \leq \log \prod _{i=1}^{k}(a_{ i} + 1) + O(1) = O(\log N),}$$

and by (4.256), the continued fraction digit \(a_{i}\) of α is small for at least \(k/2\) values of i in 1 ≤ i ≤ k, if k is sufficiently large. For these small values of the continued fraction digit \(a_{i}\), the rectangle property holds in the power-of-two range around \(q_{i-1}\), i.e. when \(2^{j} \approx q_{i-1}\); see (4.258) and (4.259). This means that we can easily salvage the Riesz product technique developed earlier in Sects. 4.5–4.8. The minor price that we pay is a constant factor loss, due to the fact that \(\log _{2}N\) is replaced by \(c_{28}\log N\), where \(c_{28} = c_{28}(\gamma )\) is a small positive constant depending only on γ > 0. Thus we obtain the following result.

Lemma 25.

Let I = [a,b], where 0 ≤ a < b < 1, be an arbitrary subinterval of the unit interval. Assume that (4.249) holds. Then there exists a constant \(\delta ^{{\prime}} =\delta ^{{\prime}}(\gamma ) > 0\), depending only on γ > 0, such that the following hold:

  (i) For all sufficiently large integers N, there is a subinterval \(I_{1} = [a_{1},b_{1}]\) of I, possibly depending on N and with \(a < a_{1} < b_{1} < b\), such that for all \(\beta _{1} \in I_{1}\),

    $$\displaystyle{f(\alpha;\beta _{1};\gamma;N) > 2\gamma \log N +\delta ^{{\prime}}\log N.}$$

  (ii) For all sufficiently large integers N, there is a subinterval \(I_{2} = [a_{2},b_{2}]\) of I, possibly depending on N and with \(a < a_{2} < b_{2} < b\), such that for all \(\beta _{2} \in I_{2}\),

    $$\displaystyle{f(\alpha;\beta _{2};\gamma;N) < 2\gamma \log N -\delta ^{{\prime}}\log N.}$$

Step 4. The last step, the construction of a Cantor set, is routine. Combining the method of nested intervals with Lemma 25, we can easily build an infinite binary tree of nested intervals the same way as in the proof of Theorem 3. The divergence points β arise as the intersection of infinitely many decreasing intervals, which correspond to an infinite branch of the binary tree. Since a binary tree of countably infinite height has continuum many infinite branches, we obtain continuum many divergence points, proving Theorem 21 in Case 22.

Case 26.

The inequality

$$\displaystyle{ \sum _{i=1}^{k}\lfloor \sqrt{\gamma a_{ i}}\rfloor > 100 \cdot 2\gamma \log N_{k} }$$
(4.266)

holds for infinitely many integers k ≥ 1, where \(N_{k}\) is defined by (4.247).

The estimate (4.241) tells us that \(2\gamma \log N_{k}\) is the average value of \(f(\alpha;\beta;\gamma;N_{k})\) as β runs through the unit interval. On the other hand, combining (4.248) and (4.266), we deduce that

$$\displaystyle{f(\alpha;\beta = 0;\gamma;N_{k}) > 100 \cdot 2\gamma \log N_{k}}$$

for infinitely many integers k ≥ 1. In other words, for infinitely many values \(N = N_{k}\), the homogeneous case β = 0 gives at least 100 times more integer solutions than the average value \(2\gamma \log N_{k}\). This represents an extreme bias; in fact, an extreme surplus. The proof of Theorem 3 is based on a somewhat similar extreme bias, a violation of the Naive Area Principle, in the sense that the Pell inequality \(-1 < x^{2} - 2y^{2} < 1\) has no integer solution except x = y = 0, while the corresponding hyperbolic region has infinite area. The only difference is that whereas in Theorem 3, we have an extreme shortage of solutions for the homogeneous case β = 0, we have here an extreme surplus. But this difference is irrelevant for the method of nested intervals, as it works in both cases. This means that in Case 26, we can simply repeat the Cantor set construction in the proof of Theorem 3. This completes the proof of Theorem 21.

Theorem 21 is a qualitative result. In contrast, we complete this section with a quantitative result.

Proposition 27.

Let α > 0 and γ > 0 be arbitrary real numbers. Then there is an effectively computable positive constant \(\delta ^{{\prime}} =\delta ^{{\prime}}(\gamma ) > 0\), depending only on γ > 0, such that for every sufficiently large integer N, there exist two real numbers \(\beta _{1}(N)\) and \(\beta _{2}(N)\) in the unit interval, with \(0 \leq \beta _{1}(N) <\beta _{2}(N) < 1\), such that

$$\displaystyle{\vert f(\alpha;\beta _{1}(N);\gamma;N) - f(\alpha;\beta _{2}(N);\gamma;N)\vert >\delta ^{{\prime}}\log N.}$$

We just outline the proof in a couple of sentences, since it is basically the same as that of Theorem 21, without the Cantor set construction. Indeed, let \(q_{\ell-1} \leq N < q_{\ell}\). Since \(q_{\ell} = a_{\ell}q_{\ell-1} + q_{\ell-2} \leq (a_{\ell} + 1)q_{\ell-1}\), we have

$$\displaystyle{1 \leq \frac{N} {q_{\ell-1}} \leq a_{\ell} + 1.}$$

Again we distinguish two cases.

Case 28.

We have

$$\displaystyle{\sum _{i=1}^{\ell-1}\lfloor \sqrt{\gamma a_{ i}}\rfloor + \left \lfloor \sqrt{ \frac{\gamma N} {q_{\ell-1}}}\right \rfloor \leq 100 \cdot 2\gamma \log N.}$$

Then by repeating the argument of Case 22 in the proof of Theorem 21 above, we obtain Proposition 27; see Lemma 25.

Case 29.

We have

$$\displaystyle{\sum _{i=1}^{\ell-1}\lfloor \sqrt{\gamma a_{ i}}\rfloor + \left \lfloor \sqrt{ \frac{\gamma N} {q_{\ell-1}}}\right \rfloor > 100 \cdot 2\gamma \log N.}$$

Then

$$\displaystyle{f(\alpha;\beta = 0;\gamma;N) > 100 \cdot 2\gamma \log N,}$$

and so we can choose \(\beta _{1}(N) = 0\). Finally, for \(\beta _{2}(N)\), we can choose any below average point; in other words, we can choose \(\beta _{2}(N)\) to be any β that satisfies the inequality \(f(\alpha;\beta;\gamma;N) \leq (2 + o(1))\gamma \log N\); see (4.241).
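The dichotomy used in this outline can be explored numerically. The following is only a minimal sketch under stated assumptions: the function names `denominators` and `classify` are hypothetical, the convergent denominators are generated from a finite list of digits \(a_{1},a_{2},\ldots\) with \(q_{-1} = 0\) and \(q_{0} = 1\), and log is taken to be the natural logarithm. It locates ℓ with \(q_{\ell-1} \leq N < q_{\ell}\), checks \(1 \leq N/q_{\ell-1} \leq a_{\ell} + 1\), and compares the floor sum with the threshold \(100 \cdot 2\gamma \log N\).

```python
import math

def denominators(digits):
    """Return [q_0, q_1, ..., q_k] for continued fraction digits a_1..a_k."""
    q_prev, q_cur = 0, 1                      # q_{-1} = 0, q_0 = 1
    qs = [q_cur]
    for a in digits:
        q_prev, q_cur = q_cur, a * q_cur + q_prev   # q_i = a_i q_{i-1} + q_{i-2}
        qs.append(q_cur)
    return qs

def classify(digits, gamma, N):
    """Decide which of the two cases applies for a given integer N >= 1.

    The digit list must be long enough that some q_ell exceeds N.
    Returns (ell, floor_sum, threshold, case_label).
    """
    qs = denominators(digits)
    # smallest ell with q_ell > N, so that q_{ell-1} <= N < q_ell
    ell = next(i for i, q in enumerate(qs) if q > N)
    ratio = N / qs[ell - 1]
    assert 1 <= ratio <= digits[ell - 1] + 1  # digits[ell - 1] is a_ell
    floor_sum = sum(math.floor(math.sqrt(gamma * a)) for a in digits[:ell - 1])
    floor_sum += math.floor(math.sqrt(gamma * ratio))
    threshold = 100 * 2 * gamma * math.log(N)  # natural logarithm assumed
    case = "Case 28" if floor_sum <= threshold else "Case 29"
    return ell, floor_sum, threshold, case

# All digits equal to 1 (golden-ratio type): the floor sum stays small.
print(classify([1] * 20, 1.0, 100))
# One enormous digit: the last term alone pushes the sum past the threshold.
print(classify([1, 10**12], 1.0, 10**8))
```

In this toy setting, bounded digits land in Case 28, while a single very large digit that covers N lands in Case 29.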

10 General Point Sets: Theorem 30

What will happen if we drop the rectangle property in Theorem 12 or Proposition 13? Can we still exhibit extra large deviations for hyperbolic needles? This is the subject of this last section.

Suppose that \(\mathcal{P}\) is a finite point set of density δ > 0 in a large square \([0,M]^{2}\), so that \(\vert \mathcal{P}\vert =\delta M^{2}\). We shall make a very mild technical assumption, that \(\mathcal{P}\) is not clustered. More precisely, we introduce a new concept called the separation constant and denoted by \(\sigma =\sigma (\mathcal{P})\), and say that \(\mathcal{P}\) is σ-separated if the usual Euclidean distance between any two points of \(\mathcal{P}\) is at least σ. For example, the set of integer lattice points in the plane is clearly 1-separated, so that \(\sigma (\mathbf{Z}^{2}) = 1\).
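For a finite point set, the separation condition is easy to test directly. The sketch below is only an illustration: the names `separation_constant` and `is_separated` are hypothetical, and the separation constant is interpreted as the smallest pairwise distance, which is consistent with the example \(\sigma (\mathbf{Z}^{2}) = 1\).

```python
import math
from itertools import combinations

def separation_constant(points):
    """Smallest Euclidean distance between two distinct points of the set."""
    return min(math.dist(p, q) for p, q in combinations(points, 2))

def is_separated(points, sigma):
    """True if every pair of points is at distance at least sigma."""
    return separation_constant(points) >= sigma

# A 5 x 5 block of the integer lattice is 1-separated.
grid = [(x, y) for x in range(5) for y in range(5)]
print(separation_constant(grid))

# Adding one point very close to an existing one destroys the separation.
clustered = grid + [(0.0, 1e-6)]
print(is_separated(clustered, 0.5))
```

The quadratic pair loop is adequate for small examples; for large point sets a spatial grid would be the natural replacement.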

Our basic idea is the following. We show that if \(\mathcal{P}\) is σ-separated with some not too small constant σ > 0, then the rectangle property holds, at least in a weak statistical sense, for the majority of the directions which we shall call the good directions. For example, in Theorem 12, the slope \(1/\sqrt{2}\) is a concrete good direction. This is how we will be able to save the Riesz product argument in the proof of Theorem 12 or Proposition 13, and still prove extra large deviations, proportional to the area, for hyperbolic needles, at least for the majority of the directions.

In the rest of the section, we work out the details of the vague intuition, and this will give us Theorem 30. The obvious handicap of this majority approach is that for an arbitrary point set \(\mathcal{P}\) which is not clustered, we cannot predict whether a given concrete direction is good or not.

Another, and purely technical, shortcoming is that in Theorem 30, we cannot get rid of the assumption that \(\mathcal{P}\) is not clustered. This technical difficulty is rather counterintuitive, since at least at first sight, clusters actually seem to help us create extra large deviations. However, some technical difficulties prevent us from adapting the Riesz product technique for clustered point sets \(\mathcal{P}\). It remains an interesting open problem to decide whether or not the separation constant \(\sigma =\sigma (\mathcal{P})\) in Theorem 30 plays any role.

In Theorem 30, we change the underlying set (see Footnote 15), and switch from the large square \([0,M]^{2}\) to the large disk

$$\displaystyle{\mathrm{disk}(\mathbf{0};M) =\{ \mathbf{x} \in \mathbf{R}^{2}:\vert \mathbf{x}\vert \leq M\}}$$

of radius M and centered at the origin.

Let \(\mathcal{P}\) be a finite point set of density δ > 0 in the large disk \(\mathrm{disk}(\mathbf{0};M)\), so that \(\vert \mathcal{P}\vert =\delta \pi M^{2}\); here we assume that the radius M is large. We also assume that \(\mathcal{P}\) is not clustered. More precisely, we assume that \(\mathcal{P}\) is σ-separated for some positive constant \(\sigma =\sigma (\mathcal{P}) > 0\). The goal is to count the number of elements of \(\mathcal{P}\) in rotated and translated copies of our usual hyperbolic needle \(H_{\gamma }(N)\).

Let η be a small positive real number with \(0 <\eta < 10^{-2}\), to be specified later. Let j be an arbitrary integer in the interval 0 ≤ j ≤ n, where \(2^{n} \approx N\), that is, \(n =\log _{2}N + O(1)\). We decompose the large disk \(\mathrm{disk}(\mathbf{0};M)\) into disjoint translated copies of the small rectangle

$$\displaystyle{ [0,2^{j}\eta ] \times [0,2^{-j}\eta ]; }$$
(4.267)

in other words, we form a rectangle lattice starting from the origin. We shall focus on the copies of (4.267) which are inside the large disk \(\mathrm{disk}(\mathbf{0};M)\), and ignore the copies of (4.267) that intersect the boundary circle or are outside the disk. Note that there are \(O(2^{j}\eta M)\) copies of (4.267) that intersect the boundary circle of the large disk. If \(2^{j}\eta = o(M)\), then there are \((1 + o(1))\pi M^{2}\eta ^{-2}\) copies of (4.267) that are inside the large disk \(\mathrm{disk}(\mathbf{0};M)\). We call these translated copies of the small rectangle (4.267) j-cells. More precisely, we call them j-cells of angle 0.

In general, let θ be an arbitrary angle, with 0 ≤ θ < π. Let \(\mathrm{Rot}_{\theta }\) denote the rotation of the plane by the angle θ, assuming that the fixed point of the rotation \(\mathrm{Rot}_{\theta }\) is the origin. We decompose the large disk \(\mathrm{disk}(\mathbf{0};M)\) into disjoint translates of the rotated copy

$$\displaystyle{ \mathrm{Rot}_{\theta }([0,2^{j}\eta ] \times [0,2^{-j}\eta ]) }$$
(4.268)

of the small rectangle (4.267). We shall focus on the translated copies of (4.268) which are inside the large disk \(\mathrm{disk}(\mathbf{0};M)\). Again, if \(2^{j}\eta = o(M)\), then there are \((1 + o(1))\pi M^{2}\eta ^{-2}\) translated copies of (4.268) that are inside the large disk \(\mathrm{disk}(\mathbf{0};M)\). We call these translated copies of the small rectangle (4.268) j-cells of angle θ.
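The cell count \((1 + o(1))\pi M^{2}\eta ^{-2}\) can be checked by brute force for moderate parameters. The following sketch is an illustration only: the function name `count_cells_inside` and the parameter values are assumptions, and the angle is taken to be 0, which loses nothing here because the disk is invariant under rotation about the origin. A cell lies inside the disk exactly when its four corners do, since the disk is convex.

```python
import math

def count_cells_inside(M, eta, j):
    """Count lattice cells [a*w,(a+1)*w] x [b*h,(b+1)*h] lying inside disk(0; M).

    Here w = 2^j * eta is the long side and h = 2^(-j) * eta the short side.
    """
    w = (2 ** j) * eta
    h = eta / (2 ** j)
    nx, ny = math.ceil(M / w), math.ceil(M / h)
    count = 0
    for a in range(-nx, nx):
        for b in range(-ny, ny):
            corners = [(a * w, b * h), ((a + 1) * w, b * h),
                       (a * w, (b + 1) * h), ((a + 1) * w, (b + 1) * h)]
            # By convexity, the cell is inside iff all four corners are.
            if all(x * x + y * y <= M * M for x, y in corners):
                count += 1
    return count

M, eta = 50.0, 0.5
expected = math.pi * M * M / (eta * eta)   # pi * M^2 * eta^(-2)
for j in (0, 2):
    ratio = count_cells_inside(M, eta, j) / expected
    print(j, ratio)
```

The ratio stays below 1 because the cells inside the disk have disjoint interiors and total area at most \(\pi M^{2}\); it approaches 1 as M grows with j and η fixed.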

We want to prove, in a quantitative form, that if \(\mathcal{P}\) is not clustered, then for a typical angle θ ∈ [0, π), the overwhelming majority of the j-cells of angle θ that contain at least one point of \(\mathcal{P}\) actually contain exactly one point of \(\mathcal{P}\). A quantitative result like this, a statistical version of the rectangle property, will serve as a substitute for the rectangle property, and it will suffice to save the Riesz product technique developed in Sects. 4.5–4.8.

Statistical Version of the Rectangle Property: An Average Argument. Suppose that \(P_{i_{1}},P_{i_{2}} \in \mathcal{P}\), where \(i_{1}\neq i_{2}\), are two arbitrary points. We define the angle-set by

$$\displaystyle{\mathrm{angle}(P_{i_{1}},P_{i_{2}};j) =\{\theta \in [0,\pi ): \mbox{ there is a $j$-cell of angle $\theta $ containing $P_{i_{1}}$ and $P_{i_{2}}$}\}.}$$

The angle-set \(\mathrm{angle}(P_{i_{1}},P_{i_{2}};j)\) is clearly measurable. Let \(\vert \mathrm{angle}(P_{i_{1}},P_{i_{2}};j)\vert\) denote the usual one-dimensional Lebesgue measure, i.e. length.

The basic idea is to estimate the double sum

$$\displaystyle{\mathop{\sum _{P_{i_{ 1}},P_{i_{2}}\in \mathcal{P}}}_{i_{1}\neq i_{2}}\vert \mathrm{angle}(P_{i_{1}},P_{i_{2}};j)\vert.}$$

Simple geometric consideration shows that

$$\displaystyle{\vert \mathrm{angle}(P_{i_{1}},P_{i_{2}};j)\vert < 2 \cdot \frac{2^{-j}\eta } {\vert P_{i_{1}}P_{i_{2}}\vert },}$$

where \(2^{-j}\eta\) is the length of the short side of a j-cell and \(\vert P_{i_{1}}P_{i_{2}}\vert\) denotes the usual Euclidean distance between \(P_{i_{1}}\) and \(P_{i_{2}}\), and so

$$\displaystyle{ \mathop{\sum _{P_{i_{ 1}},P_{i_{2}}\in \mathcal{P}}}_{i_{1}\neq i_{2}}\vert \mathrm{angle}(P_{i_{1}},P_{i_{2}};j)\vert < 2^{-j}\eta \sum _{ P_{i_{1}}\in \mathcal{P}}\left (\mathop{\sum _{P_{i_{2}}\in \mathcal{P}}}_{i_{1}\neq i_{2}} \frac{1} {\vert P_{i_{1}}P_{i_{2}}\vert } \right ). }$$
(4.269)

Since \(\mathcal{P}\) is σ-separated, it is easy to give an upper bound to the inner sum in (4.269). Using a standard power-of-two decomposition, we have

$$\displaystyle\begin{array}{rcl} & & \mathop{\sum _{P_{i_{ 2}}\in \mathcal{P}}}_{i_{1}\neq i_{2}} \frac{1} {\vert P_{i_{1}}P_{i_{2}}\vert } \leq \sum _{1\leq \ell\leq L}\mathop{ \mathop{\sum _{P_{i_{ 2}}\in \mathcal{P}}}_{i_{1}\neq i_{2}}} _{2^{\ell-1}\sigma <\vert P_{i_{ 1}}P_{i_{2}}\vert \leq 2^{\ell}\sigma } \frac{1} {\vert P_{i_{1}}P_{i_{2}}\vert } \\ & & \quad \leq \sum _{1\leq \ell\leq L} \frac{1} {2^{\ell-1}\sigma } \cdot \pi (2^{\ell+1})^{2} =\sum _{ 1\leq \ell\leq L}\frac{8\pi } {\sigma } \cdot 2^{\ell} < \frac{16\pi } {\sigma } \cdot 2^{L}, {}\end{array}$$
(4.270)

where L denotes the largest integer such that \(2^{L}\sigma < 2^{j+1}\eta\) (see Footnote 16), and where the estimate \(\pi (2^{\ell+1})^{2}\) arises from the fact that a square of side σ∕2 cannot contain two points from \(\mathcal{P}\), since \(\mathcal{P}\) is σ-separated. From (4.270), and using the fact that \(2^{L}\sigma < 2^{j+1}\eta\), we conclude that

$$\displaystyle{ \mathop{\sum _{P_{i_{ 2}}\in \mathcal{P}}}_{i_{1}\neq i_{2}} \frac{1} {\vert P_{i_{1}}P_{i_{2}}\vert } < \frac{16\pi } {\sigma } \cdot 2^{L} < \frac{16\pi } {\sigma } \cdot \frac{2^{j+1}\eta } {\sigma } = \frac{2^{5}\pi \eta 2^{j}} {\sigma ^{2}}. }$$
(4.271)
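The power-of-two decomposition in (4.270) can be illustrated on the simplest σ-separated set, the scaled lattice \(\sigma \mathbf{Z}^{2}\), with the base point at the origin. This is a sketch under stated assumptions: the function name `shell_sum` is hypothetical, and points at distance exactly σ are placed in the first shell, a boundary convention chosen here for convenience. The code groups the lattice points by shells \(2^{\ell-1}\sigma <\vert v\vert \leq 2^{\ell}\sigma\), and returns the shell counts together with the sum of reciprocal distances and the bound \(\frac{16\pi }{\sigma } \cdot 2^{L}\).

```python
import math

def shell_sum(sigma, L):
    """Sum 1/|v| over nonzero v in sigma*Z^2 with |v| <= 2^L * sigma, by shells.

    Returns (total, shell_counts, bound), where bound = 16*pi/sigma * 2^L.
    """
    R = 2 ** L                                  # radius in units of sigma
    shell_counts = {ell: 0 for ell in range(1, L + 1)}
    total = 0.0
    for x in range(-R, R + 1):
        for y in range(-R, R + 1):
            d2 = x * x + y * y                  # squared distance, units of sigma^2
            if d2 == 0 or d2 > R * R:
                continue
            ell = 1
            while d2 > 4 ** ell:                # smallest ell >= 1 with |v| <= 2^ell
                ell += 1
            shell_counts[ell] += 1
            total += 1.0 / (sigma * math.sqrt(d2))
    bound = 16 * math.pi / sigma * 2 ** L
    return total, shell_counts, bound

total, counts, bound = shell_sum(sigma=1.0, L=3)
print(total, bound)
for ell, c in counts.items():
    # each shell count is compared with the estimate pi * (2^(ell+1))^2
    print(ell, c, math.pi * (2 ** (ell + 1)) ** 2)
```

For the lattice, the true sum grows like a constant times \(2^{L}/\sigma\), so the bound in (4.270) has the right order of magnitude and only the constant is generous.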

Combining (4.269) and (4.271), and using the fact that \(\vert \mathcal{P}\vert =\delta \pi M^{2}\), we then obtain

$$\displaystyle{ \mathop{\sum _{P_{i_{ 1}},P_{i_{2}}\in \mathcal{P}}}_{i_{1}\neq i_{2}}\vert \mathrm{angle}(P_{i_{1}},P_{i_{2}};j)\vert < 2^{-j}\eta \vert \mathcal{P}\vert \frac{2^{5}\pi \eta 2^{j}} {\sigma ^{2}} = \frac{2^{5}\pi ^{2}\eta ^{2}\delta M^{2}} {\sigma ^{2}}. }$$
(4.272)

Recall that the disk \(\mathrm{disk}(\mathbf{0};M)\) of radius M contains \((1 + o(1))\pi M^{2}\eta ^{-2}\) j-cells of a given angle θ, and that θ runs through the interval 0 ≤ θ < π. It is natural, therefore, to normalize the sum (4.272) and consider the average

$$\displaystyle{ \frac{1} {\pi ^{2}M^{2}\eta ^{-2}}\mathop{ \sum _{P_{i_{ 1}},P_{i_{2}}\in \mathcal{P}}}_{i_{1}\neq i_{2}}\vert \mathrm{angle}(P_{i_{1}},P_{i_{2}};j)\vert <\eta ^{4} \cdot \frac{2^{5}\delta } {\sigma ^{2}}. }$$
(4.273)

Consequences of Inequality (4.273). Let us return to Sect. 4.8. Recall that the last step in the proof of Proposition 13, and indirectly the proof of Theorem 12, is to choose the parameters \(\eta _{1}\) and \(\eta _{2}\) as sufficiently small positive constants independent of M and N; see (4.226). In fact, in view of (4.100), \(\eta _{1}\) and \(\eta _{2}\) are almost equal.

In similar fashion, we assume here that the parameter γ of the hyperbolic needle, the density δ of \(\mathcal{P}\) and the separation constant σ of \(\mathcal{P}\) are fixed positive constants, and consider η, which of course plays the role of \(\eta _{1}\) and \(\eta _{2}\), as a parameter that we shall eventually choose as a sufficiently small positive constant independent of M and N.

Since the area of a j-cell is \(\eta ^{2}\), we can say roughly that the probability that a j-cell of any angle contains a point of \(\mathcal{P}\) is

$$\displaystyle{ \mathrm{density} \times \mathrm{ area} =\delta \eta ^{2}. }$$
(4.274)

On the other hand, in view of (4.273), the probability that a j-cell of any angle contains exactly two points of \(\mathcal{P}\) does not exceed \(c_{29}\eta ^{4}\), which is negligible compared to \(\delta \eta ^{2}\) in (4.274) if η is small enough.

In general, the probability that a j-cell of any angle contains exactly p points of \(\mathcal{P}\), where \(2^{\ell} < p \leq 2^{\ell+1}\) with \(\ell = 1,2,3,\ldots\), does not exceed \(c_{30}\eta ^{4}4^{-\ell}\), where the constant factor \(c_{30}\) is independent of ℓ. Indeed, p points from \(\mathcal{P}\) means that we can choose \({p\choose 2}\) pairs \(P_{i_{1}},P_{i_{2}}\), implying that those rich j-cells show up with multiplicity

$$\displaystyle{{p\choose 2} > 2^{\ell}2^{\ell-1} = \frac{1} {2}4^{\ell}}$$

in (4.273), explaining the factor \(4^{-\ell}\) in \(c_{30}\eta ^{4}4^{-\ell}\). The point here is that even the sum of the products

$$\displaystyle{\sum _{\ell\geq 1}2^{\ell+1}\eta ^{4}4^{-\ell}}$$

is negligible compared to the \(\delta \eta ^{2}\) in (4.274) if η is small enough.
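The size of this sum is easy to make explicit: since \(2^{\ell+1}4^{-\ell} = 2 \cdot 2^{-\ell}\), the geometric series sums to \(2\eta ^{4}\), and its ratio to \(\delta \eta ^{2}\) is \(2\eta ^{2}/\delta\). The short sketch below, with the hypothetical function name `rich_cell_series` and illustrative parameter values, confirms this numerically.

```python
import math

def rich_cell_series(eta, terms=60):
    """Partial sum of 2^(l+1) * eta^4 * 4^(-l) over l = 1, ..., terms."""
    return sum(2 ** (l + 1) * eta ** 4 * 4.0 ** (-l) for l in range(1, terms + 1))

eta, delta = 0.01, 1.0                 # illustrative values only
series = rich_cell_series(eta)
print(series, 2 * eta ** 4)            # partial sum versus the closed form 2 * eta^4
print(series / (delta * eta ** 2))     # ratio to delta * eta^2, close to 2 * eta^2 / delta
```

With η below \(10^{-2}\) and δ of order 1, the ratio is at most about \(2 \cdot 10^{-4}\).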

Summarizing, we can say that (4.273) implies the following general picture about the distribution of the elements of \(\mathcal{P}\) in the j-cells of any angle. Let θ ∈ [0, π) be a typical angle, and consider the j-cells of angle θ. The overwhelming majority of the points \(P \in \mathcal{P}\) turn out to be singles, meaning that if the point P is contained in some j-cell \(\mathcal{C}\) of angle θ, then \(\mathcal{C}\) does not contain any other point of \(\mathcal{P}\). Here the vague term overwhelming majority in fact has the quantitative meaning of \(1 - O(\eta ^{2})\) part of \(\mathcal{P}\). Note that \(1 - O(\eta ^{2})\) is almost 1 if η is small.

Furthermore, rich j-cells turn out to be very rare in the following sense. Let ℓ ≥ 0 be a fixed integer. The proportion of the j-cells \(\mathcal{C}\) of angle θ containing p points of \(\mathcal{P}\), where \(2^{\ell} < p \leq 2^{\ell+1}\), compared to those j-cells which contain at least one point of \(\mathcal{P}\), does not exceed \(c_{31}\eta ^{2}4^{-\ell}\), where the constant factor \(c_{31}\) is independent of ℓ. Since \(2^{\ell}\) is negligible compared to \(4^{\ell}\) if ℓ is large, the term very rare is well justified.

We can say, therefore, that a weaker statistical version of the rectangle property holds for the majority of the angles θ ∈ [0, π), assuming that η > 0 is a sufficiently small constant depending only on the parameter γ of the hyperbolic needle, the density δ of \(\mathcal{P}\) and the separation constant σ of \(\mathcal{P}\).

A simple analysis of the Riesz product argument in Sects. 4.5–4.8 shows that this weaker statistical version of the rectangle property is a good substitute for the strict rectangle property, and thus we can prove the following result.

Theorem 30.

Let \(\mathcal{P}\) be a finite set of points in the disk \(\mathrm{disk}(\mathbf{0};M)\) with density δ, so that the number of elements of \(\mathcal{P}\) is \(\vert \mathcal{P}\vert =\delta \pi M^{2}\) . Assume that \(\mathcal{P}\) is σ-separated for some σ > 0. Assume further that both N and M∕N are sufficiently large, depending only on γ, δ and σ. Then there exist a positive constant \(\delta ^{{\prime}} =\delta ^{{\prime}}(\sigma,\gamma,\delta ) > 0\) , independent of N and M, and a measurable subset \(\mathcal{A}\subset [0,2\pi )\) , of Lebesgue measure greater than \(\frac{99} {100} \cdot 2\pi\) , such that for every angle \(\theta \in \mathcal{A}\) , there exist translated copies \(\mathbf{x}_{1} +\mathrm{ Rot}_{\theta }H_{\gamma }(N) \subset \mathrm{ disk}(\mathbf{0};M)\) and \(\mathbf{x}_{2} +\mathrm{ Rot}_{\theta }H_{\gamma }(N) \subset \mathrm{ disk}(\mathbf{0};M)\) of the rotated hyperbolic needle \(\mathrm{Rot}_{\theta }H_{\gamma }(N)\) such that

$$\displaystyle{\vert \mathcal{P}\cap (\mathbf{x}_{1} +\mathrm{ Rot}_{\theta }H_{\gamma }(N))\vert \geq 2\delta \gamma \log N +\delta ^{{\prime}}\log N}$$

and

$$\displaystyle{\vert \mathcal{P}\cap (\mathbf{x}_{2} +\mathrm{ Rot}_{\theta }H_{\gamma }(N))\vert \leq 2\delta \gamma \log N -\delta ^{{\prime}}\log N.}$$

As indicated at the beginning of this section, it is reasonable to guess that clusters just help to create extra large fluctuations. This intuition motivates the following

Open Problem.

Can one prove a version of Theorem 30 which makes no reference to the separation constant \(\sigma =\sigma (\mathcal{P})\)? In other words, can we simply drop \(\sigma =\sigma (\mathcal{P})\) from the hypotheses of Theorem 30?

The author guesses that the answer is affirmative but, unfortunately, cannot prove it.

Finally, we briefly mention a closely related problem, where we cannot drop the separation constant \(\sigma =\sigma (\mathcal{P})\) from the hypotheses. Note that Theorems 3–21 all concern the extra large fluctuations of the measure-theoretic discrepancy, meaning the difference between the number of points of \(\mathcal{P}\) and its expectation of density times area. What we study last here is the large fluctuations of the ± 1-discrepancy, or 2-coloring discrepancy.

This means that we have an arbitrary 2-coloring \(\varphi: \mathcal{P}\rightarrow \{\pm 1\}\) of the given point set \(\mathcal{P}\), with + 1 representing red and − 1 representing blue, say. Extra large fluctuations of the ± 1-discrepancy means that there are translated, or rotated and translated, copies \(H^{{\prime}}\) and \(H^{{\prime\prime}}\) of the hyperbolic needle \(H_{\gamma }(N)\) such that

$$\displaystyle{\sum _{P\in \mathcal{P}\cap H^{{\prime}}}\varphi (P) > c_{32} \cdot \mathrm{ area}(H^{{\prime}}) = c_{ 33}\log N > 0}$$

with some positive constants \(c_{32}\) and \(c_{33}\), and

$$\displaystyle{\sum _{P\in \mathcal{P}\cap H^{{\prime\prime}}}\varphi (P) < -c_{34} \cdot \mathrm{ area}(H^{{\prime\prime}}) = -c_{ 35}\log N < 0}$$

with some positive constants \(c_{34}\) and \(c_{35}\).

The Riesz product technique can be easily adapted to prove extra large fluctuations of the ± 1-discrepancy. For example, we have the following ± 1-discrepancy analog of Proposition 13.

Proposition 31 (2-Coloring Discrepancy for Translated Copies).

Let \(\mathcal{P}\) be a finite set of points in the square \([0,M]^{2}\) with density δ, so that the number of elements of \(\mathcal{P}\) is \(\vert \mathcal{P}\vert =\delta M^{2}\). Let \(\varphi: \mathcal{P}\rightarrow \{\pm 1\}\) be an arbitrary 2-coloring of \(\mathcal{P}\). Assume that \(\mathcal{P}\) satisfies the following rectangle property, that there is a positive constant \(c_{1} = c_{1}(\mathcal{P}) > 0\) such that every axes-parallel rectangle of area \(c_{1}\) contains at most one element of the set \(\mathcal{P}\). As in Proposition 13, let \(\delta ^{{\prime}} =\delta ^{{\prime}}(c_{1},\gamma,\delta )\) be defined by (4.83) and (4.84), and assume that both N and M∕N are sufficiently large and satisfy (4.85). Then for the hyperbolic needle \(H_{\gamma }(N)\) given by (4.86), there exist translated copies \(\mathbf{x}_{1} + H_{\gamma }(N) \subset [0,M]^{2}\) and \(\mathbf{x}_{2} + H_{\gamma }(N) \subset [0,M]^{2}\) such that

$$\displaystyle{\sum _{P\in \mathcal{P}\cap (\mathbf{x}_{1}+H_{\gamma }(N))}\varphi (P) \geq \delta ^{{\prime}}\log N}$$

and

$$\displaystyle{\sum _{P\in \mathcal{P}\cap (\mathbf{x}_{2}+H_{\gamma }(N))}\varphi (P) \leq -\delta ^{{\prime}}\log N.}$$

Similarly, one can easily prove the following analog of Theorem 30.

Proposition 32 (2-Coloring Discrepancy for Rotated and Translated Copies).

Let \(\mathcal{P}\) be a finite set of points in the disk \(\mathrm{disk}(\mathbf{0};M)\) with density δ, so that the number of elements of \(\mathcal{P}\) is \(\vert \mathcal{P}\vert =\delta \pi M^{2}\) . Let \(\varphi: \mathcal{P}\rightarrow \{\pm 1\}\) be an arbitrary 2-coloring of \(\mathcal{P}\) . Assume that \(\mathcal{P}\) is σ-separated with some σ > 0. Assume further that both N and M∕N are sufficiently large, depending only on γ, δ and σ. Then there exist a positive constant \(\delta ^{{\prime}} =\delta ^{{\prime}}(\sigma,\gamma,\delta ) > 0\) , independent of N and M, and a measurable subset \(\mathcal{A}\subset [0,2\pi )\) , of Lebesgue measure greater than \(\frac{99} {100} \cdot 2\pi\) , such that for every angle \(\theta \in \mathcal{A}\) , there exist translated copies \(\mathbf{x}_{1} +\mathrm{ Rot}_{\theta }H_{\gamma }(N) \subset \mathrm{ disk}(\mathbf{0};M)\) and \(\mathbf{x}_{2} +\mathrm{ Rot}_{\theta }H_{\gamma }(N) \subset \mathrm{ disk}(\mathbf{0};M)\) of the rotated hyperbolic needle \(\mathrm{Rot}_{\theta }H_{\gamma }(N)\) such that

$$\displaystyle{\sum _{P\in \mathcal{P}\cap (\mathbf{x}_{1}+\mathrm{Rot}_{\theta }H_{\gamma }(N))}\varphi (P) \geq \delta ^{{\prime}}\log N}$$

and

$$\displaystyle{\sum _{P\in \mathcal{P}\cap (\mathbf{x}_{2}+\mathrm{Rot}_{\theta }H_{\gamma }(N))}\varphi (P) \leq -\delta ^{{\prime}}\log N.}$$

We want to point out that in Proposition 32 on the ± 1-discrepancy of hyperbolic needles, we definitely need some extra condition implying that \(\mathcal{P}\) is not too clustered. Indeed, it is easy to construct an extremely clustered point set \(\mathcal{P}\) for which the ± 1-discrepancy of the hyperbolic needles is negligible. For example, we can start with a typical point set in general position, and split up every point into a pair of points extremely close to each other. The two points in each extremely close pair are joined by a straight line segment, and we refer to these line segments as the very short line segments. Consider the particular 2-coloring of the point set where the two points in each extremely close pair have different colors, with one + 1 and the other − 1. We can easily guarantee that this particular 2-coloring has negligible ± 1-discrepancy for the family of all hyperbolic needles congruent to \(H_{\gamma }(N)\). If the original point set is in general position and the point pairs are close enough, then each arc of any congruent copy of \(H_{\gamma }(N)\) intersects at most two very short line segments. Since the boundary of \(H_{\gamma }(N)\) consists of 4 arcs, the ± 1-discrepancy is at most 4 ⋅ 2 = 8, which is indeed negligible.
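The mechanism behind this construction can be seen in a small experiment. The sketch below makes several assumptions that are not taken from the text: the helper names are hypothetical, the points are drawn at random in a square, and the needle is modeled as the region \(1 \leq x \leq N\), \(\vert xy\vert \leq \gamma\), which has area \(2\gamma \log N\) but may differ from the definition (4.86). The key observation is that a pair with both points inside, or both outside, contributes 0 to the signed sum, so the ± 1-discrepancy is bounded by the number of pairs split by the boundary.

```python
import random

def paired_coloring(n_pairs, M, eps, seed=0):
    """Random points in [-M, M]^2, each split into a +1 / -1 pair at distance eps."""
    rng = random.Random(seed)
    pts = []
    for _ in range(n_pairs):
        x, y = rng.uniform(-M, M), rng.uniform(-M, M)
        pts.append(((x, y), +1))
        pts.append(((x + eps, y), -1))     # twin point with the opposite color
    return pts

def in_needle(p, gamma, N):
    """Assumed model of the hyperbolic needle: 1 <= x <= N and |x*y| <= gamma."""
    x, y = p
    return 1 <= x <= N and abs(x * y) <= gamma

def discrepancy_and_split(pts, region):
    """Signed color sum inside the region, and the number of pairs it splits."""
    disc = sum(color for p, color in pts if region(p))
    split = sum(1 for k in range(0, len(pts), 2)
                if region(pts[k][0]) != region(pts[k + 1][0]))
    return disc, split

pts = paired_coloring(n_pairs=20000, M=100.0, eps=1e-9)
disc, split = discrepancy_and_split(pts, lambda p: in_needle(p, 1.0, 100.0))
print(disc, split)
```

Only pairs straddling the boundary can contribute, which is why the discrepancy of this coloring stays bounded regardless of the area of the needle.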