1 Introduction

In the early 60s Paul Erdős asked the following question about point-line incidences in the plane: Is it possible that a planar point set contains many collinear four-tuples, but it contains no five points on a line? There are constructions for \(n\)-element point sets with \(n^2/6-O(n)\) collinear triples with no four on a line (see [4] or [11]). However, no similar construction is known for larger tuples.

Let us formulate Erdős’ problem more precisely. For a finite set \(P\) of points in the plane and \(k\ge 2\), let \(t_k(P)\) be the number of lines meeting \(P\) in exactly \(k\) points, and let \(T_k(P):=\sum _{k^{\prime }\ge k}t_{k^{\prime }}(P)\) be the number of lines meeting \(P\) in at least \(k\) points. For \(r>k\) and \(n\), we define

$$\begin{aligned} t_k^{(r)}(n):= {\mathop {\mathop {\max }\limits _{|P|=n}}\limits _{T_r(P)=0}} t_k(P). \end{aligned}$$

In plain words, \(t_k^{(r)}(n)\) is the number of lines containing exactly \(k\) points from \(P\), maximized over all \(n\) point sets \(P\) that do not contain \(r\) collinear points. Erdős conjectured that \(t_k^{(r)}(n)=o(n^2)\) for any fixed \(r>k>3\) and offered $100 for a proof or disproof [9] (the conjecture is listed as Conjecture 12 in the problem collection of Brass et al. [3]). In this paper we are concerned about bounding \(t_k^{(k+1)}(n)\) from below for \(k>3\). To simplify notation, from now on we will use \(t_k(n)\) to denote \(t_k^{(k+1)}(n)\) (Fig. 1).

Fig. 1
figure 1

A construction of a point set showing that \(t_3(7)\ge 6\)

1.1 Earlier Results and Our Result

This problem was among Erdős’ favourite geometric problems, he frequently talked about it and listed it among the open problems in geometry, see [610]. It is not just a simple puzzle which might be hard to solve, it is related to some deep and difficult problems in other fields. It seems that the key to tackle this question would be to understand the group structure behind point sets with many collinear triples. A recent result of Green and Tao—proving the Motzkin–Dirac conjecture [12]—might be an important development in this direction.

In the present paper, our goal is to give a construction showing that Erdős conjecture, if true, is sharp; for \(k>3\), one cannot replace the exponent 2 by \(2-c\), for any \(c>0\).

The first result was due to Kárteszi [15] who proved that \(t_k(n)\ge c_kn\log {n}\) for all \(k>3\). In 1976 Grünbaum [13] showed that \(t_k(n)\ge c_kn^{1+1/(k-2)}\). For some 30 years this was the best bound when Ismailescu [14], Brass et al. [2], and Elkies [5] consecutively improved Grünbaum’s bound for \(k\ge 5\). However, similarly to Grünbaum’s bound, the exponent was going to 1 as \(k\) went to infinity.

In what follows we are going to give a construction that substantially improves the lower bound. Namely, we will show the following.

Theorem 1

For any \(k\ge 4\) integer, there is a positive integer \(n_0\) such that for \(n>n_0\) we have \(t_k(n)>n^{2-({c}/{\sqrt{\log n}})}\), where \(c=2\log (4k+9)\).

We note that each of the collinear \(k\)-tuples that we count in our construction has an additional property that the points form a \(k\)-term arithmetic progression, as the distance between every two consecutive points is the same in every coordinate.

1.2 Preliminaries

For \(r>0\) and a positive integer \(d\) let \(B_d(r)\) denote the closed ball in \(\mathbb R ^d\) of radius \(r\) centred at the origin, and \(S_d(r)\) denotes the sphere in \(\mathbb R ^d\) of radius \(r\) centred at the origin.

For any positive integer \(d\) the \(d\)-dimensional unit hypercube (and its translates) are denoted by \(H_d\). If the center of the cube is a point \(\mathbf{x}\in \mathbb R ^d\) then it has the vertex set \(\mathbf{x}+\{-1/2,1/2\}^d\) and we denote it by \(H_d(\mathbf{x})\).

For a set \(S\subseteq \mathbb R ^d\), let \(N(S)\) denote the number of points from the integer lattice \(\mathbb Z ^d\) that belong to \(S\), i.e., \(N(S):=|\mathbb Z ^d \cap S|\).

Throughout the paper the \(\log \) notation stands for the base 2 logarithm.

2 A Lower Bound on \(t_k(n)\)

We will prove bounds for even and odd value of \(k\) separately, as the odd case needs a bit more attention. Our proof is elementary, we use the fact that the volume of a large sphere approximates well the number of lattice points inside the ball. There are more advanced techniques to count lattice points on the surface of a sphere, however we see no way to improve our bound significantly by applying them.

In our construction, we rely on the fact that a point set in a large dimensional space that satisfies our collinearity conditions can be converted to a planar point set by simply projecting it to a plane along a suitably chosen vector, with all the collinearities preserved. That enables us to perform most of the construction in a space of large dimension, exploiting the properties of such a space.

2.1 Proof for Even \(k\)

Let \(d\) be a positive integer, and let \(r>0\). We will use a quantitative version of the following well known fact

$$\begin{aligned} {{N(B_d(r))}\over {V(B_d(r))}}\rightarrow 1 \quad \text {as} \quad r\rightarrow \infty , \end{aligned}$$

where

$$\begin{aligned} V(B_d(r))=\frac{r^d\pi ^{\frac{d}{2}}}{\Gamma \big (\tfrac{d}{2}+1\big )}, \ \end{aligned}$$
(1)

estimating the number of lattice points on a sphere using Gauss’ volume argument.

Lemma 1

For \(r\ge \sqrt{d}\), we have

$$\begin{aligned} V(B_d(r-\sqrt{d}/2))\le N(B_d(r))\le V(B_d(r+\sqrt{d}/2)). \end{aligned}$$

Proof

For every lattice point \(p\in B_d(r) \cap \mathbb Z ^d\) we look at the unit cube \(H_d(p)\) with center \(p\). These cubes all have disjoint interiors and each of them has diameter \(\sqrt{d}\). Moreover, their union \(\bigcup _{p\in B_d(r) \cap \mathbb Z ^d} H_d(p)\) is included in \(B_d(r+\sqrt{d}/2)\), it contains \(B_d(r-\sqrt{d}/2)\) and its volume is equal to \(N(B_d(r))\), which readily implies the statement of the lemma.\(\square \)

We will also use a bound on the number of points on a sphere.

Lemma 2

There exists a constant \(c_0>0\) such that

$$\begin{aligned} N(S_d(r))\le 2^{\frac{c_0\log {r}}{\log \log {r}}}N(B_{d-2}(r)). \end{aligned}$$

Proof

The number \(N(S_d(r))\) is the number of ways \(r^2\) can be written as an ordered sum of \(d\) perfect squares. Such sum can be broken into two sums, the first containing all summands except the last two, and the second containing the last two summands, so we have

$$\begin{aligned} N(S_d(r))=\sum \limits _{s=1}^{r^2}N(S_{d-2}(\sqrt{s}))N(S_2(\sqrt{r^2-s})). \end{aligned}$$

The number of ways a positive integer \(n\) can be represented as a sum of two squares is known to be at most 4 times \(d(n)\), the number of divisors of \(n\), and there exists a constant \(c^{\prime }>0\) so that \(d(n)\le 2^{{c^{\prime }\log {n}}/{\log \log {n}}}\) (see, e.g., [1, Sect. 13.10]). Hence, we have

$$\begin{aligned} N(S_d(r))&\le 4 d(r^2)\sum \limits _{s=1}^{r^2}N(S_{d-2}(\sqrt{s})) \\&\le 2^{\frac{2c^{\prime }\log {r}}{\log \log {r}}}N(B_{d-2}(r)). \end{aligned}$$

\(\square \)

Proof

(of Theorem 1 for even \(k\)) We will give a construction of a point set \(P\) containing no \(k+1\) collinear points, with a high value of \(t_k(P)\).

For a positive integer \(d\) let us set \(r_0=2^{d}\). For each integer point from \(B_d(r_0)\), the square of its distance to the origin is at most \(r_0^2\). As the square of that distance is an integer, we can apply pigeonhole principle to conclude that there exists \(r\), with \(0< r \le r_0\), such that the sphere \(S_d(r)\) contains at least \(1/r_0^2\) fraction of points from \(B_d(r_0)\), and together with Lemma 1, we have

$$\begin{aligned} N(S_d(r))\ge \frac{1}{r_0^2} N(B_d(r_0)) \ge \frac{1}{r_0^2} V(B_d(r_0-\sqrt{d}/2)). \end{aligned}$$

Let us consider the unordered pairs of different points from \(\mathbb Z ^d \cap S_d(r)\). The total number of such pairs is at least

$$\begin{aligned} \Big (\begin{array}{c}{N(S_d(r))}\\ {2}\end{array}\Big ) \ge \Big (\begin{array}{c}{\frac{V(B_d(r_0-\sqrt{d}/2))}{r_0^2}}\\ {2}\end{array}\Big ). \end{aligned}$$

For every \(p,q\in \mathbb Z ^d \cap S_d(r)\) the Euclidean distance \(d(p,q)\) between \(p\) and \(q\) is at most \(2r\), and the square of that distance is an integer. Hence, there are at most \(4r^2\) different possible values for \(d(p,q)\). Applying pigeonhole principle again, we get that there are at least

$$\begin{aligned} \frac{1}{4r^2} \Big (\begin{array}{c}{\frac{V(B_d(r_0-\sqrt{d}/2))}{r_0^2}}\\ {2}\end{array}\Big )\ge \frac{V(B_d(r_0-\sqrt{d}/2))^2}{8r_0^6} \end{aligned}$$
(2)

pairs of points from \(\mathbb Z ^d \cap S_d(r)\) that all have the same distance. We denote that distance by \(\ell \).

Let \(p_1,q_1\in \mathbb Z ^d \cap S_d(r)\) with \(d(p_1,q_1)=\ell \), and let \(s\) be the line going through \(p_1\) and \(q_1\). We define \(k-2\) points \(p_2,\ldots ,p_{k/2}\), \(q_2, \ldots , q_{k/2}\) on the line \(s\) such that \(d(p_i,p_{i+1})=\ell \) and \(d(q_i,q_{i+1})=\ell \), for all \(1\le i<k/2\), and all \(k\) points \(p_1,\ldots ,p_{k/2}\), \(q_1, \ldots , q_{k/2}\) are different, see Fig. 2.

Fig. 2
figure 2

Line \(s\) with \(k\) points, for \(k\) even

Knowing that \(p_1\) and \(q_1\) are points from \(\mathbb Z ^d\), the way we defined points \(p_2,\ldots ,p_{k/2}, q_2, \ldots , q_{k/2}\) implies that they have to be in \(\mathbb Z ^d\) as well. If we set

$$\begin{aligned} r_i:=\sqrt{r^2 + i(i-1)\ell ^2}\le \sqrt{r^2+ (2ir)^2}\le r(k+1), \end{aligned}$$
(3)

for all \(i=1,\ldots ,k/2\), then the points \(p_i\) and \(q_i\) belong to the sphere \(S_d(r_i)\), and hence, \(p_i, q_i\in \mathbb Z ^d \cap S_d(r_i)\), for all \(i=1,\ldots ,k/2\), see Fig. 3.

Fig. 3
figure 3

The position of the \(k\) points related to the origin, for \(k\) even

We define the point set \(P\) to be the set of all integer points on spheres \(S_d(r_i)\), for all \(i=1,\ldots ,k/2\), that is

$$\begin{aligned} P:=\mathbb Z ^d \cap \big ( \mathop {\bigcup }\limits _{i=1}^{k/2} S_d(r_i) \big ). \end{aligned}$$

As the point set \(P\) is contained in the union of \(k/2\) spheres, there are obviously no \(k+1\) collinear points in \(P\). On the other hand, every pair of points \(p_1,q_1\in \mathbb Z ^d \cap S_d(r)\) with \(d(p_1,q_1)=\ell \) defines one line that contains \(k\) points from \(P\).

If we set \(n:=|P|\), and if \(d\) is large enough, we have

$$\begin{aligned} n&= \sum \limits _{i=1}^{k/2} N(S_d(r_i)) \nonumber \\&\le \sum \limits _{i=1}^{k/2}2^{\frac{c_1\log {r_i}}{\log \log {r_i}}}N(B_{d-2}(r_i))\nonumber \\&\le k 2^{\frac{c_2 d}{\log {d}}} N(B_{d-2}((k+1)2^d)) \\&\le k 2^{\frac{c_2 d}{\log {d}}} V(B_{d-2} (2^d(k+1+\sqrt{d} / 2^{d+1})))\nonumber \\&\le k 2^{\frac{c_2 d}{\log {d}}} \frac{2^{d(d-2)} (k+1+\sqrt{d} / 2^{d+1} )^{d-2} \pi ^{(d-2)/2} }{ \frac{\sqrt{\pi d} (d/2e)^{d/2}}{d/2} } \nonumber \\&\le \frac{2^{d(d-2)} (k+2)^{d} \pi ^{d/2} (2e)^{d/2}}{ d^{d/2} } \nonumber \\&\le 2^{d^2- \frac{d}{2}\log d + \frac{d}{2} \log ( \frac{(k+2)^2\pi e}{8}) } \nonumber , \end{aligned}$$
(4)

where \(c_1,c_2\) are constants depending only on \(k\). Here, we used Lemmas 1, 2, (1), (3), and the standard estimates for the function \(\Gamma \).

On the other hand, we get from (2) and (1) that the number of lines containing exactly \(k\) points from \(P\) is

$$\begin{aligned} t_k(P)&\ge \Big ( \frac{\big ( 2^d -\frac{\sqrt{d}}{2}\big )^d \pi ^{d/2} (2e)^{d/2}}{\sqrt{\frac{4\pi }{d}} \, d^{d/2}} \Big )^2 \frac{1}{2^{6d+3}} \\&\ge 2^{2d^2- d\log d - d \log \big ( \frac{64}{\pi e}\Big )} . \end{aligned}$$

Putting the previous two inequalities together it follows that there exist a constant \(n_0\) depending on \(k\), such that for \(n>n_0\) we have

$$\begin{aligned} t_k(P) \ge \frac{n^2}{2^{c\sqrt{\log {n}}}} = n^{2-\frac{c}{\sqrt{\log {n}}}}, \end{aligned}$$

where \(c=2\log (3k+6)\).

To obtain a point set in two dimensions, we project the \(d\) dimensional point set to a two dimensional plane in \(\mathbb R ^d\). The vector \(v\) along which we project should be chosen generically, so that every two points from our point set are mapped to different points, and every three points that are not collinear are mapped to points that are still not collinear.\(\square \)

2.2 Proof for Odd \(k\)

Proof

(of Theorem 1 for odd \(k\)) We will give a construction of a point set \(P\) containing no \(k+1\) collinear points, with a high value of \(t_k(P)\).

For a positive integer \(d\) let us set \(r_0=2^{d}\). In the same way as in the proof for even \(k\), we can find \(r\) with \(0< r \le r_0\), such that the sphere \(S_d(r)\) contains at least a \(1/r_0^2\) fraction of the integer points from \(B_d(r_0)\), and hence

$$\begin{aligned} N(S_d(r))\ge \frac{1}{r_0^2} N(B_d(r_0)) \ge \frac{1}{r_0^2} V(B_d(r_0-\sqrt{d}/2)). \end{aligned}$$

Now, for every point \(p\in \mathbb Z ^d \cap S_d(r)\) there is a corresponding point \(p^{\prime }\) on the sphere \(S_d(2r)\) that belongs to the half-line from the origin to \(p\). It is not hard to see that all coordinates of \(p^{\prime }\) are even integers, so \(p^{\prime }\in (2\mathbb Z )^d \cap S_d(2r)\). Hence, the number of points in \((2\mathbb Z )^d \cap S_d(2r)\) is at least \(N(S_d(r))\).

We look at unordered pairs of different points from \((2\mathbb Z )^d \cap S_d(2r)\). The total number of such pairs is at least

$$\begin{aligned} \Big (\begin{array}{c}{N(S_d(r))}\\ {2}\end{array}\Big ) \ge \Big (\begin{array}{c}{\frac{V(B_d(r_0-\sqrt{d}/2))}{r_0^2}}\\ {2}\end{array}\Big ). \end{aligned}$$

If we just look at such pairs of points that have different first coordinate, we surely have at least half as many pairs as before. To see that, observe that for every point \(p\in (2\mathbb Z )^d \cap S_d(2r)\), a point obtained from \(p\) by changing the sign of any number of coordinates of \(p\) and/or permuting the coordinates is still in \((2\mathbb Z )^d \cap S_d(2r)\).

For every \(p,q\in (2\mathbb Z )^d \cap S_d(2r)\) we know that the Euclidean distance \(d(p,q)\) between \(p\) and \(q\) is at most \(4r\), and that the square of that distance is an integer. Hence, there are at most \(16r^2\) different possible values for \(d(p,q)\). Applying pigeonhole principle again, we get that there are at least

$$\begin{aligned} \frac{1}{16r^2} \cdot \frac{1}{2} \Big (\begin{array}{c}{\frac{V(B_d(r_0-\sqrt{d}/2))}{r_0^2}}\\ {2}\end{array}\Big )\ge \frac{V(B_d(r_0-\sqrt{d}/2))^2}{64 r_0^6} \end{aligned}$$
(5)

pairs of points from \((2\mathbb Z )^d \cap S_d(2r)\) with different first coordinate that have the same distance. We denote that distance by \(2\ell \). Note that since both \(p\) and \(q\) are contained in \((2\mathbb Z )^d\), we have that the middle point \(m\) of the segment \(pq\) belongs to \(\mathbb Z ^d\), and \(d(p,m)=d(q,m)=\ell \).

Let \(p_1,q_1\in (2\mathbb Z )^d \cap S_d(2r)\) with \(d(p_1,q_1)=2\ell \), let \(m_0\) be the middle point of the segment \(p_1q_1\), and let \(s\) be the line going through \(p_1\) and \(q_1\). We define \(k-3\) points \(p_2,\ldots ,p_{(k-1)/2}\), \(q_2, \ldots , q_{(k-1)/2}\) on the line \(s\) such that \(d(p_i,p_{i+1})=\ell \) and \(d(q_i,q_{i+1})=\ell \), for all \(1\le i<(k-1)/2\), and all \(k\) points \(m_0,p_1,\ldots ,p_{(k-1)/2}\), \(q_1, \ldots , q_{(k-1)/2}\) are different, see Fig. 4.

Fig. 4
figure 4

Line \(s\) with \(k\) points, for \(k\) odd

Knowing that \(p_1\) and \(q_1\) are points from \((2\mathbb Z )^d\), the way we defined points \(m_0, p_2,\ldots ,p_{(k-1)/2}\), \(q_2, \ldots , q_{(k-1)/2}\) implies that they have to be in \(\mathbb Z ^d\). If we set

$$\begin{aligned} r_i:=\sqrt{4r^2 + (i+1)(i-1)\ell ^2} \le r(k+1), \end{aligned}$$
(6)

for all \(i=0,\ldots ,(k-1)/2\), the points \(p_i\) and \(q_i\) belong to the sphere \(S_d(r_i)\), and the point \(m_0\) belongs to \(S_d(r_0)\). Hence, \(p_i, q_i\in \mathbb Z ^d \cap S_d(r_i)\), for all \(i=1,\ldots ,(k-1)/2\), and \(m_0\in \mathbb Z ^d \cap S_d(r_0)\), see Fig. 5.

Fig. 5
figure 5

The position of the \(k\) points related to the origin, for \(k\) odd

By \(\alpha _x\) we denote the hyperplane containing all points in \(\mathbb R ^d\) with first coordinate equal to \(x\). Let \(M\) be the multiset of points \(m\) such that there exist points \(p,q\in (2\mathbb Z )^d \cap S_d(2r)\) having different first coordinate, with \(d(p,q)=2\ell \), and with \(m\) being the middle point of the segment \(pq\). In this multiset, we include the point \(m\) once for every such \(p\) and \(q\). We know that \(M\subseteq \mathbb Z ^d \cap S_d(r_0)\), and each point from \(\mathbb Z ^d \cap S_d(r_0)\) is contained in \(\alpha _x\) for some integer \(-r_0\le x \le r_0\). Hence, by the pigeonhole principle, we get from (5) that there exists \(-r_0\le x_0 \le r_0\) such that \(\alpha _{x_0} \cap M\) contains at least

$$\begin{aligned} \frac{|M|}{2r_0}\ge \frac{V(B_d(r_0-\sqrt{d}/2))^2}{128 r_0^7} \end{aligned}$$
(7)

points.

We define the point set \(P\) to be the set of all integer points on spheres \(S_d(r_i)\), for all \(i=1,\ldots ,(k-3)/2\), all integer points on \(S_d(r_{(k-1)/2})\) that do not belong to \(\alpha _{x_0}\), and all integer points on \(S_d(r_0)\) that belong to \(\alpha _{x_0}\). I.e., we have

$$\begin{aligned} P:=\mathbb Z ^d \cap \big (\big ( \mathop {\bigcup }\limits _{i=1}^{(k-3)/2} S_d(r_i) \big ) \cup \big ( S_d(r_{(k-1)/2}) \setminus \alpha _{x_0} \big ) \cup \big ( S_d(r_0)\cap \alpha _{x_0} \big ) \big ). \end{aligned}$$

Let us first prove that the point set \(P\) does not contain \(k+1\) collinear points. As \(P\) is contained in the union of \((k-1)/2\) spheres and a hyperplane, any line that is not contained in that hyperplane cannot contain more than \(k\) points from \(P\). But the point set \(P\) restricted to the hyperplane \(\alpha _{x_0}\) belongs to the union of \((k-1)/2\) spheres \(S_d(r_i)\), for \(i=0,\ldots ,(k-3)/2\), so we can also conclude that there are no \(k+1\) collinear points in \(P\cap \alpha _{x_0}\).

Let \(n:=|P|\). Obviously, \(P\subseteq \bigcup _{i=0}^{(k-1)/2} S_d(r_i)\), so we can estimate the value of \(n\) similarly as in the even case. We have

$$\begin{aligned} n&\le \sum \limits _{i=0}^{\frac{k-1}{2}} N(S_d(r_i)) \\&\le \sum \limits _{i=0}^{\frac{k-1}{2}} 2^{\frac{c_1\log {r_i}}{\log \log {r_i}}}N (B_{d-2}(r_i)) \\&\le k2^{\frac{c_2 d}{\log {d}}}N(B_{d-2}((k+1)2^d)) \\&\le 2^{d^2- \frac{d}{2}\log d + \frac{d}{2} \log ( \frac{(k+2)^2\pi e}{8}) }, \end{aligned}$$

where \(c_1,c_2\) are constants depending only on \(k\). The last estimate was done the same way as in the case where \(k\) is even, as the third line of the calculation is exactly the same as the one obtained in (4).

On the other hand, every pair of points \(p_1,q_1\in \mathbb Z ^d \cap S_d(r)\) with different first coordinate, with \(d(p_1,q_1)=2\ell \), and with the middle point that belongs to \(\alpha _{x_0} \cap M\), defines one line that contains \(k\) points from \(P\). Note that such line cannot belong to \(\alpha _{x_0}\), as the first coordinates of \(p_1\) and \(q_1\) cannot be \(x_0\) simultaneously.

Hence, we get from (7) and (1) that the number of lines containing exactly \(k\) points from \(P\) is

$$\begin{aligned} t_k(P)&\ge \Big ( \frac{\big ( 2^d -\frac{\sqrt{d}}{2}\big )^d \pi ^{d/2} (2e)^{d/2}}{\sqrt{\frac{4\pi }{d}} \, d^{d/2}} \Big )^2 \frac{1}{2^{7d+7}} \\&\ge 2^{2d^2- d\log d - d \log \big ( \frac{128}{\pi e}\big )} . \end{aligned}$$

Putting the last two inequalities together we get that there exists a constant \(n_0\) depending on \(k\), such that for \(n>n_0\) we have

$$\begin{aligned} t_k(P) \ge \frac{n^2}{2^{c\sqrt{\log {n}}}}= n^{2-\frac{c}{\sqrt{\log {n}}}}, \end{aligned}$$

where \(c=2\log (4k+9)\).

To obtain a point set in two dimensions, we project the \(d\) dimensional point set to a two dimensional plane in \(\mathbb R ^d\), similarly as in the even case.\(\square \)