Szemeredi’s theorem says that relatively dense sets contain arithmetic progressions. The purpose of this chapter is to present a result of Leth from [89] which shows that certain sparse sets contain “near” arithmetic progressions. We then detail the connection between the aforementioned theorem of Leth and the Erdős-Turán conjecture.

1 The Main Theorem

We begin by making precise the intuitive notion of “near” arithmetic progression mentioned in the introduction.

Definition 14.1

Fix \(w\in \mathbb {N}_0\) and \(t,d\in \mathbb {N}\).Footnote 1 A (t, d, w)-progression is a set of the form

$$\displaystyle \begin{aligned}\operatorname{B}(b,t,d,w):=\bigcup_{i=0}^{t-1}[b+id,b+id+w].\end{aligned}$$

By a block progression we mean a (t, d, w)-progression for some t, d, w.

Note that a (t, d, 0)-progression is the same thing as a t-term arithmetic progression with difference d.

Definition 14.2

If \(A\subseteq \mathbb {N}\), we say that A nearly contains a (t, d, w)-progression if there is a (t, d, w)-progression \(\operatorname {B}(b,t,d,w)\) such that A ∩ [b + id, b + id + w] ≠ ∅ for each i = 1, …, t − 1.

Thus, if A nearly contains a (t, d, 0)-progression, then A actually contains a t-term arithmetic progression. Consequently, when A nearly contains a (t, d, w)-progression with “small” w, then this says that A is “close” to containing an arithmetic progression. The main result of this chapter allows us to conclude that even relatively sparse sets with a certain amount of density regularity nearly contain block progressions satisfying a further homogeneity assumption that we now describe.

Definition 14.3

Suppose that \(A\subseteq \mathbb {N}\), I is an interval in \(\mathbb {N}\), and 0 < s < 1. We say that A nearly contains a (t, d, w)-progression in I with homogeneity s if there is some \(\operatorname {B}(b,t,d,w)\) contained in I such that the following two conditions hold for all i, j = 0, 1, …, t − 1:

  1. (i)

    δ(A, [b + id, b + id + w]) ≥ (1 − s)δ(A, I)

  2. (ii)

    δ(A, [b + id, b + id + w]) ≥ (1 − s)δ(A, [b + jd, b + jd + w]).

Thus, for small s, we see that A meets each block in a density that is roughly the same throughout and that is roughly the same as on the entire interval.

The density regularity condition roughly requires that on sufficiently large subintervals of I, the density does not increase too rapidly. Here is the precise formulation:

Definition 14.4

Suppose that \(I\subseteq \mathbb {N}\) is an interval, \(r\in \mathbb {R}\), r > 1, and \(m\in \mathbb {N}\). We say that A ⊆ I has the (m, r)-density property on I if, whenever J ⊆ I is an interval with |J|∕|I|≥ 1∕m, then δ(A, J) ≤ (A, I).

Of course, given any \(m\in \mathbb {N}\) and A ⊆ I, there is \(r\in \mathbb {R}\) such that A has the (m, r)-density property on I. The notion becomes interesting when we think of r as fixed.

Given a hyperfinite interval \(I\subseteq { }^{\ast }\mathbb {N}\), \(r\in { }^{\ast }\mathbb {R}\), r > 1, and \(M\in { }^{\ast }\mathbb {N}\), we say that an internal set A ⊆ I has the internal (M, r)-density property on I if the conclusion of the definition above holds for internal subintervals J of I.

Lemma 14.5

Suppose that A ⊆ [1, N] is an internal set with the internal (M, r)-density property for some \(M>\mathbb {N}\) . Let f : [0, 1] → [0, 1] be the (standard) function given by

$$\displaystyle \begin{aligned}f(x):=\operatorname{st}\left(\frac{|A\cap [1,xN]|}{|A\cap [1,N]|}\right).\end{aligned}$$

Then f is a Lipschitz function with Lipschitz constant r.

Proof

Fix x < y in [0, 1]. Write \(x:=\operatorname {st}(K/N)\) and \(y:=\operatorname {st}(L/N)\). Since y − x ≠ 0, we have that \(\frac {L-K}{N}\) is not infinitesimal; in particular, \(\frac {L-K}{N}>1/M\). Since A has the (M, r)-density property on [1, N], we have that δ(A, [K, L]) ≤ (A, [1, N]). Thus, it follows that

$$\displaystyle \begin{aligned} f(y)-f(x)=\operatorname{st}\left(\frac{|A\cap [K,L]|}{|A\cap [1,N]|}\right)&=\operatorname{st}\left(\delta(A,[K,L]\frac{L-K}{|A\cap [1,N]|}\right) \\ & \leq r\operatorname{st}\left(\frac{L-K}{N}\right)=r(y-x). \end{aligned} $$

Here is the main result of this section:

Theorem 14.6 (Leth)

Fix functions \(g,h:\mathbb {R}^{>0}\to \mathbb {R}^{>0}\) such that h is increasing and g(x) →∞ as x ∞. Fix also s > 0, r > 1, and \(j,t\in \mathbb {N}\) . Then there is \(m=m(g,h,s,r,t,j)\in \mathbb {N}\) such that, for all n > m, whenever I is an interval of length n and A  I is nonempty and has the (m, r)-density property on I, then A contains a (t, d, w)-almost progression with homogeneity s such that wd < h(dn) and 1∕g(m) < dn < 1∕j.

Roughly speaking, if A has sufficient density regularity, then A contains an almost-progression with “small” w (small compared to the distance of the progression).

The proof of the theorem relies on the following standard lemma; see [89, Lemma 1].

Lemma 14.7

Suppose that \(E\subseteq \mathbb {R}\) has positive Lebesgue measure and \(t\in \mathbb {N}\) . Then there is v > 0 such that, for all 0 < u < v, there is an arithmetic progression in E of length t and difference u.

We stress that in the previous lemma, u and v are real numbers.

Proof of Theorem 14.6

Fix g, h, s, r, j, t as in the statement of Theorem 14.6. We show that the conclusion holds for all infinite M, whence by underflow there exists \(m\in \mathbb {N}\) as desired. Thus, we fix \(M>\mathbb {N}\) and consider N > M, an interval \(I\subseteq { }^{\ast }\mathbb {N}\) of length N, and a hyperfinite subset A ⊆ I that has the internal (M, r)-density property on I. Without loss of generality, we may assume that I = [1, N]. Suppose that we can find \(B,D,W\in { }^{\ast }\mathbb {N}\) and standard c > 0 such that [B, B + (t − 1)D + W] ⊆ [1, N] and, for all i = 0, 1, …, t − 1, we have:

$$\displaystyle \begin{aligned}\delta(A,[1,N])(c-\frac{s}{2})\leq \delta(A,[B+iD,B+iD+W])\leq \delta(A,[1,N])(c+\frac{s}{4}). \quad (\dagger)\end{aligned}$$

We claim that A nearly contains the internal (t, D, W)-progression \(\operatorname {B}(B,t,D,W)\) with homogeneity s. Indeed, item (i) of Definition 14.3 is clear. For item (ii), observe that

$$\displaystyle \begin{aligned} \delta(A,[B+iD,B+iD+W])&\geq \delta(A,[1,N])(c-\frac{s}{2}) \\ &\geq \delta(A,[B+jD,B+jD+W])(\frac{c-\frac{s}{2}}{c+\frac{s}{4}}) \end{aligned} $$

and note that \(\frac {c-\frac {s}{2}}{c+\frac {s}{4}}>1-s\). Thus, it suffices to find B, D, W, c satisfying (†) and for which WD < h(DN) and 1∕g(M) < DN < 1∕j.

Let f be defined as in the statement of Lemma 14.5. Set \(b:=\operatorname {st}(B/N)\), \(d:=\operatorname {st}(D/N)\), and \(w:=\operatorname {st}(W/N)\). Assume that w ≠ 0. Then we have that

$$\displaystyle \begin{aligned}\operatorname{st}\left(\frac{\delta(A,[B+iD,B+iD+W])}{\delta(A,[1,N])}\right)=\frac{f(b+id+w)-f(b+id)}{w}.\end{aligned}$$

We thus want to find B, D, W and c satisfying

$$\displaystyle \begin{aligned}c-\frac{s}{2}< \frac{f(b+id+w)-f(b+id)}{w}<c+\frac{s}{4}. \quad (\dagger\dagger)\end{aligned}$$

Now the middle term in (††) looks like a difference quotient and the idea is to show that one can bound f′(b + id) for i = 0, 1, …, t − 1. Indeed, by Lemma 14.5, f is Lipschitz, whence it is absolutely continuous. In particular, by the Fundamental Theorem of Calculus, f is differentiable almost everywhere and \(f(x)=\int _0^x f'(u)du\). Since f(0) = 0 and f(1) = 1, it follows that \(\{x\in [0,1] \ : \ f'(x)\geq (1-\frac {s}{4})\}\) has positive measure. In particular, there is c > 1 such that

$$\displaystyle \begin{aligned}E:=\{x\in [0,1] \ : \ c-\frac{s}{4}\leq f'(x)\leq c\}\end{aligned}$$

has positive measure. By Lemma 14.7, there is b ∈ E and 0 < u < 1∕j such that b, b + u, b + 2u, …, b + (t − 1)u ∈ E. Take B, D ∈ [1, N] such that \(b=\operatorname {st}(B/N)\) and \(u=\operatorname {st}(D/N)\). Note that g(M) is infinite and DN is noninfinitesimal, so 1∕g(M) < DN < 1∕j. It remains to choose W. Since f is differentiable on E, there is w > 0 sufficiently small so that for all i = 0, 1, …, t − 1, we have \(|f'(b+id)-\frac {f(b+id+w)-f(b+id)}{w}|<\frac {s}{4}\). For this w, (††) clearly holds; we now take W such that \(w=\operatorname {st}(W/N)\). Since h(DN) is nonfinitesimal (as DN is noninfinitesimal), if w is chosen sufficiently small, then WD < h(DN).

Theorem 14.6 implies a very weak form of Szemeredi’s theorem.

Corollary 14.8

Suppose that \(\operatorname {BD}(A)>0\) . Suppose that g, h, s, t, j are as in the hypothesis of Theorem 14.6 . Then for n sufficiently large, there is an interval I of length n such that A  I contains a (t, s, d)-almost progression in I with wd < h(dn) and 1∕g(m) < dn < 1∕j.

Proof

Take \(r\in \mathbb {R}\) with r > 1 satisfying \(\operatorname {BD}(A)>1/r\). Let m := m(g, h, s, r, t, j) as in the conclusion of Theorem 14.6. Let n > m and take an interval I of length n such that δ(A, I) > 1∕r. It remains to observe that A ∩ I has the (m, r)-density property on I.

2 Connection to the Erdős-Turán Conjecture

Leth’s original motivation was the following conjecture of Erdős and Turán from [47]:

Conjecture 14.9 (Erdős-Turán)

Suppose that A = (a n) is a subset of \(\mathbb {N}\) such that \(\sum 1/a_n\) diverges. Then A contains arbitrarily long arithmetic progressions.

Leth first observed the following standard fact about the densities of sequences satisfying the hypotheses of the Erdős-Turán conjecture.

Lemma 14.10

Suppose that A = (a n) is enumerated in increasing order and is such that \(\sum 1/a_n\) diverges. Then, for arbitrarily large n, one has \(\delta (A,n)>1/(\log n)^2\).

Proof

We argue by contrapositive. Suppose that \(\delta (A,n)\leq 1/(\log n)^2\) for all n ≥ n 0 ≥ 4. We first show that this implies that \(a_n\geq \frac {1}{2}n(\log n)^2\) for all n > n 0. Suppose otherwise and fix n ≥ n 0. Then \(|A\cap [1,\frac {1}{2}n(\log n)^2]|\geq n\). On the other hand, by our standing assumption, we have that

$$\displaystyle \begin{aligned}|A\cap [1,\frac{1}{2}n(\log n)^2])\leq \frac{1/2n(\log n)^2}{(\log((1/2n(\log n))^2}\leq \frac{1}{2}n,\end{aligned}$$

yielding the desired contradiction.

Since \(a_n\geq \frac {1}{2}n(\log n)^2\) eventually, we have that

$$\displaystyle \begin{aligned}\sum \frac{1}{a_n}\leq \sum \frac{2}{n(\log n)^2},\end{aligned}$$

whence \(\sum \frac {1}{a_n}\), converges.

The truth of the following conjecture, together with the theorem that follows it, would imply that, for sets satisfying the density condition in the previous lemma, the existence of almost arithmetic progressions implies the existence of arithmetic progressions.

Conjecture 14.11 (Leth)

Fix \(t\in \mathbb {N}\) and c > 0. Then there is n 0 := n 0(t, c) such that, for all n ≥ n 0, whenever \(A\subseteq \mathbb {N}\) is such that \(\delta (A,n)>1/(c\log n)^{2\log \log n}\), then A nearly contains a (t, d, w)-progression on [1, n] with wd < dn where d is a power of 2.

We should remark that requiring that d be a power of 2 is not much of an extra requirement. Indeed, our proof of Theorem 14.6 shows that one can take d there to be a power of 2. For any t and c, we let L(t, c) be the statement that the conclusion of the previous conjecture holds for the given t and c. We let L(t) be the statement that L(t, c) holds for all c > 0.

Theorem 14.12

Suppose that L(t) is true for a given \(t\in \mathbb {N}\) . Further suppose that \(A\subseteq \mathbb {N}\) is such that there is c > 0 for which, for arbitrarily large n, one has \(\delta (A,n)>c/(\log n)^2\) . Then A contains an arithmetic progression of length t.

Before we prove this theorem, we state the following standard combinatorial fact, whose proof we leave as an exercise to the reader (alternatively, this is proven in [89, Proposition 1]).

Proposition 14.13

Let \(m,n\in \mathbb {N}\) be such that m < n, let \(A\subseteq \mathbb {N}\) , and let I be an interval of length n. Then there is an interval J  I of length m such that δ(A, J) > δ(A, I)∕2.

Proof of Theorem 14.12

For reasons that will become apparent later in the proof, we will need to work with the set 2A rather than A. Note that 2A satisfies the hypothesis of the theorem for a different constant c′ > 0.

By overflow, we may find \(M>\mathbb {N}\) such that \(\delta ({ }^{\ast }(2A),M)>\frac {c'}{(\log M)^2}\). Take \(L>\mathbb {N}\) such that \(2^{2^L}\leq M<2^{2^{L+1}}\) and set \(N:=2^{2^{L}}\). If we apply Proposition 14.13 to any n ≤ N and I = [1, N], we can find an interval I n ⊆ [1, M] of length n such that

$$\displaystyle \begin{aligned}|{}^{\ast}(2A)\cap I_n|>\frac{c'M}{2(\log M)^2}\geq \frac{c'M}{2(\log 2^{2^{L+1}})^2}=\frac{c'/8}{(\log N)^2}.\end{aligned}$$

For 1 ≤ k ≤ L, write \(I_{2^{2^k}}=[x_k,y_k]\).

We will now construct an internal set B ⊆ [1, N] such that \(\delta (B,N)>\frac {1}{(c''\log N)^{2\log \log N}}\), where \(c'':=\sqrt {8/c'}\). Since we are assuming that L(t) holds, by transfer we will be able to find an internal (t, d, w)-progression nearly inside of B with wd < dN and w and d both powers of 2. The construction of B will allow us to conclude that (2A) contains a t-termed arithmetic progression of difference d, whence so does 2A by transfer, and thus so does A.

Set B 0 := [1, N] and, for the sake of describing the following recursive construction, view B 0 as the union of two subintervals of length \(N/2=2^{2^L-1}=2^{2^L-2^0}\); we refer to these subintervals of B 0 as blocks. Now divide each block in B 0 into \(2=2^{2^0}\) intervals of length \(2^{2^L-{2^0}}/2^{{2^0}}=2^{2^L-2^1}\) and, for each \(0\leq j< 2^{2^0}\), we place the j th subblock of each block in B 0 into B 1 if and only if x 0 + j ∈2A.

Now divide each block in B 1 into \(2^{2^1}\) intervals of length \(2^{2^L-2^1}/2^{2^1}=2^{2^L-2^2}\) and, for each \(0\leq j<2^{2^1}\), we place the jth subblock of each block in B 1 into B 2 if and only if x 1 + j ∈2A.

We continue recursively in this manner. Thus, having constructed the hyperfinite set B k, which is a union of blocks of length \(2^{2^L-2^k}\), we break each block of B k into \(2^{2^k}\) many intervals of length \(2^{2^L-2^k}/2^{2^k}=2^{2^L-2^{k+1}}\) and we place the jth subblock of each block in B k into B k+1 if and only if x k + j ∈2A.

We set B := B L. Since \(|B_{k+1}|/|B_k|>\frac {c'/8}{(\log N)^2}\) for each 0 ≤ k < L, it follows that

$$\displaystyle \begin{aligned}|B|>\frac{(c'/8)^LN}{(\log N)^{2L}}=\frac{N}{(c''\log N)^{2\log\log N}}.\end{aligned}$$

By applying transfer to L(t), we have that B nearly contains an internal (t, d, w)-progression \(\operatorname {B}(b,t,d,w)\) contained in [1, N] such that wd < dN and d is a power of 2. Take k such that \(2^{2^L-2^{k+1}}\leq d<2^{2^L-2^k}\). Note that this implies that \(2^{2^L-2^{k+1}}\mid d\). Also, we have

$$\displaystyle \begin{aligned}w<(d/N)\cdot d<(2^{-2^k})2^{2^L-2^k}=2^{2^L-2^{k+1}}.\end{aligned}$$

We now note that \(\operatorname {B}(b,t,d,w)\) must be contained in a single block C of B k. Indeed, since \(d\mid 2^{2^L-2^k}\) and \(w\mid 2^{2^L-2^{k+1}}\), we have \(d+w<(\frac {1}{2}+\frac {1}{2^{2^k}})(2^{2^L-2^k})\), whence the fact that [b, b + w] and [b + d, b + d + w] both intersect B k would imply that [x k−1, y k−1] contains consecutive elements of 2A, which is clearly a contradiction.

Now write \(d=m\cdot 2^{2^L-2^{k+1}}\). Take \(0\leq j<2^{2^k}\) so that [b, b + w] intersects B k+1 in the jth subblock of C so x k + j ∈2A. Since [b + d, b + d + w] ∩ B k+1 ≠ ∅, we have that at least one of x k + j + (m − 1), x k + j + m, or x k + j + (m + 1) belong to (2A). However, since x k + j and m are both even, it follows that we must have x k + j + m ∈(2A). Continuing in this matter, we see that x k + j + im ∈2A for all i = 0, 1, …, t − 1. It follows by transfer that 2A contains a t-term arithmetic progression, whence so does A.

Putting everything together, we have:

Corollary 14.14

The Erdős-Turán conjecture follows from Leth’s Conjecture .

Leth used Theorem 14.6 to prove the following theorem, which is similar in spirit to Conjecture 14.6, except that it allows sparser sequences but in turn obtains almost progressions with weaker smallness properties relating d and w.

Theorem 14.15

Suppose that s > 0 and \(t\in \mathbb {N}^{>2}\) are given. Further suppose that h is as in Theorem 14.6 . Let \(A\subseteq \mathbb {N}\) be such that, for all 𝜖 > 0, we have δ(A, n) > 1∕n 𝜖 for sufficiently large n. Then for sufficiently large n, A nearly contains an (t, d, w)-progression on [1, n] of homogeneity s with \(w/d<h(\log d/\log n)\) , where d is a power of 2.

Proof

Suppose that the conclusion is false. Then there is N such that A does not nearly contain any internal (t, d, w)-progression on [1, N] of homogeneity s with \(w/d<h(\log d/\log N)\). It suffices to show that there is 𝜖 > 0 such that δ( A, N) < 1∕N 𝜖. Let m be as in the conclusion of Theorem 14.6 with r = 2 and g(x) = x (and h as given in the assumptions of the current theorem).

Claim

If I ⊆ [1, N] is a hyperfinite interval with \(|I|>\sqrt {N}\), then A does not have the (m, 2)-density property on I.

We will return to the proof of the claim in a moment. We first see how the claim allows us to complete the proof of the theorem. Let \(K>\mathbb {N}\) be the maximal \(k\in { }^{\ast }\mathbb {N}\) such that m 2k ≤ N, so m 2K ≤ N < m 2K+2. We construct, by internal induction, for i = 0, 1, …, K, a descending chain of hyperfinite subintervals (I i) of I of length m 2Ki as follows. By Proposition 14.13, we may take I 0 to be any hyperfinite subinterval of I of length m 2K such that δ( A, I 0) ≥ δ( A, N)∕2. Suppose that i < K and I i has been constructed such that |I i| = m 2ki. Since A does not have the (m, 2) density property on I i, there is a subinterval I i+1 of length |I i|∕m 2ki−1 with δ( A, I i+1) ≥ 2δ( A, I i). Notice now that I K is a hyperfinite interval of length \(m^K\leq \sqrt {N}<m^{K+1}\) and δ( A, I K) ≥ 2K δ( A, I 0). It follows that

$$\displaystyle \begin{aligned}\delta({}^{\ast}A,N)\leq 2\delta(A,I_0)\leq 2^{-(K-1)}\delta(A,I_K)\leq 2^{-(K-1)}.\end{aligned} $$

It follows that

$$\displaystyle \begin{aligned}|A\cap [1,N]|\leq 2^{-(K-1)}N\leq 2^{-(K-1)}m^{2K+2}=m^{2K+2-(K-1)\frac{\log 2}{\log m}}=(m^{2K})^{1-z}.\end{aligned}$$

if we set \(z:=\frac {(K-1)\log 2}{2K\log m}-\frac {1}{K}\). If we set \(\epsilon :=\operatorname {st}(z/2)=\frac {\log 2}{4\log m}\), then it follows that |A ∩ [1, N]|≤ N 1−𝜖, whence this 𝜖 is as desired.

We now prove the claim. Suppose, towards a contradiction, that I ⊆ [1, N] is a hyperfinite interval with \(|I|>\sqrt {N}\) and is such that A does have the (m, 2)-density property on I. By the choice of m, A nearly contains an internal (t, d, w)-almost progression of homogeneity s with wd < h(d∕|I|) and \(d>|I|/m>\sqrt {N}/m\). Notice now that \(\operatorname {st}\left (\frac {\log d}{\log N}\right )\geq \operatorname {st}\left (\frac {1/2\log N-\log m}{\log N}\right )=\frac {1}{2}\). Note that we trivially have that d∕|I| < 1∕t, whence \(d/|I|<\log d/\log N\); since h is increasing, we have that \(w/d<h(\log d/\log N)\), contradicting the choice of N. This proves the claim and the theorem.

In [90, Theorem 3], Leth shows that one cannot replace \((\log d)/(\log n)\) with dn in the previous theorem.

Notes and References

There are other generalizations of arithmetic progressions appearing in the literature, e.g. the notion of quasi-progression appearing in [131]. It should be noted that they use the term (t, d, w)-progression in a related, but different, manner than it is used in this chapter. The Erdős-Turan conjecture, first formulated in [48], is one of the most important open problems in combinatorial number theory. A positive solution would immediately generalize both Szemeredi’s Theorem and the Green-Tao theorem on the existence of arbitrarily long arithmetic progressions in the primes [64].