1 Introduction

For any integer \(K \ge 1\), a colouring in K colours of the set \({{\mathfrak {Q}}}\) of squares of integers is a partition of \({{\mathfrak {Q}}}\) into K disjoint subsets. Each subset of \({{\mathfrak {Q}}}\) in such a partition is called a colour of the colouring. Let s(K), for any integer \(K \ge 1\), be the smallest integer such that given any colouring of \({{\mathfrak {Q}}}\) in K colours, every sufficiently large integer is expressible as a sum of at most s(K) squares, all of the same colour. Sárközy remarks on page 29 of [10] that it is easily seen that s(K) is finite for each integer \(K \ge 1\) and, in Problem 40 of the list of problems in [10], Sárközy asks for bounds, in terms of K, for s(K) as well as the corresponding integer in the analogous problem for the set of prime numbers.

Our present contribution towards the solution of Sárközy’s problem for the squares is the following theorem.

Theorem 1.1

For any integer \(K \ge 2\) we have \(s(K) \le K \exp \left( \frac{(3\log 2 + \mathrm{o}(1))\log K}{\log \log K}\right) \).

Here \(o(1) \ll \frac{\log \log \log K}{\log \log K}\) for all large enough K. This improves on the bounds \(s(K) \ll (K\log K)^5\) given by Theorem 1, page 318 of Hegyvári and Hennecart [4] and \(s(K) \ll _{\epsilon } K^{2 +\epsilon }\) given subsequently by Theorem 1.1, page 18 of Akhilesh and Ramana [1]. Moreover, our upper bound for s(K) compares fairly well with the lower bound

$$\begin{aligned} s(K) \ge K \exp \left( \frac{(\log 2 +o(1))\log K}{\log \log K}\right) \end{aligned}$$
(1)

for all \(K \ge 2\) provided by Theorem 2, page 319 of [4].

For the convenience of the reader we summarise here the proof of the lower bound (1) from [4]. For any integer \(m \ge 1\), let \(U_{m}\) be the product of the first m prime numbers. We partition the squares coprime to \(U_m\) by the classes they belong to in \(\mathbf{Z}/U_{m}{} \mathbf{Z}\) and partition the remaining squares by their smallest divisor from the set of primes dividing \(U_m\). This defines a colouring of \({{\mathfrak {Q}}}\). The number of colours in this colouring is \(K_m = m + b_m\), where \(b_m\) is the number of invertible square classes in \(\mathbf{Z}/U_{m}\mathbf{Z}\). It is then verified that at least \(U_m\) summands are required to represent any given squarefree multiple of \(U_m\) as a sum of squares, all of the same colour with respect to this colouring of \({{\mathfrak {Q}}}\). This implies that \(s(K_m) \ge U_m\) for all \(m \ge 1\). The lower bound (1) results on applying this conclusion to m such that \(K_{m} \le K < K_{m+1}\) for a given integer \(K \ge 1\) and using standard estimates on the distribution of prime numbers to express \(U_m\) in terms of K.
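
As an aside, this colouring is easy to set up explicitly for small m. The following Python sketch (illustrative only; the helper names are ours) computes \(U_m\) and \(K_m = m + b_m\) by brute force, giving \(K_1 = 2\), \(K_2 = 3\), \(K_3 = 5\) and \(K_4 = 10\).

```python
from math import gcd

def first_primes(m):
    """Return the first m prime numbers by trial division."""
    primes, n = [], 2
    while len(primes) < m:
        if all(n % p for p in primes):
            primes.append(n)
        n += 1
    return primes

def colour_count(m):
    """U_m and K_m = m + b_m, where b_m is the number of invertible square
    classes modulo U_m, found here by brute force."""
    U = 1
    for p in first_primes(m):
        U *= p
    b = len({pow(a, 2, U) for a in range(U) if gcd(a, U) == 1})
    return U, m + b

for m in range(1, 5):
    print(m, colour_count(m))
# prints: 1 (2, 2) / 2 (6, 3) / 3 (30, 5) / 4 (210, 10)
```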

We now turn to the proof of Theorem 1.1. As with Ramana and Ramaré [7], which treats Sárközy’s problem for the set of primes, and [1], our proof of Theorem 1.1 ultimately relies on the elegant principle underlying the argument in [4] for the upper bound \(s(K) \ll (K\log K)^5\). We paraphrase this principle in Lemma 1.2 below with the aid of the following notation.

For any subset S of the integers and any integer \(m \ge 1\), we write \(E_{m}(S)\) for the number of tuples \((x_1, x_2, \ldots , x_{2m})\) in \(S^{2m}\) satisfying

$$\begin{aligned} x_1 + x_2 + x_3 + \cdots + x_m = x_{m+1} + x_{m+2} + \cdots + x_{2m}. \end{aligned}$$
(2)

Lemma 1.2

Let N, L and m be integers and D a real number satisfying the conditions \(L \ge N \ge 2D(mD+1), D \ge 1\) and \(m \ge 2\). If S is a subset of the integers in the interval \((N, N+L]\) such that

$$\begin{aligned} E_{m}(S)\; \le \;\frac{|S|^{2m} D }{L} \end{aligned}$$
(3)

and if S contains an integer that is not divisible by any prime number \(p \le \lceil mD\rceil \) then every integer \(n \ge (2\lceil mD \rceil +1)m(N+L)\) is a sum of no more than \(\frac{n}{N}\) elements of S.

This lemma is a consequence of a well-known finite addition theorem, also due to Sárközy. We use this theorem in the form provided by Lev [5]. Deferring the detailed proof of Lemma 1.2 to Sect. 4.1, let us describe how this lemma applies to Sárközy’s problem. For an integer \(K \ge 1\), let \({{\mathcal {B}}}\) be the set of squares of integers that are not divisible by any prime \(p \le B\), where B is a fixed but large power of K, say \(B = K^{13}\). For a given integer \(N \ge 1\), let \({{\mathcal {B}}}(N)\) denote \({{\mathcal {B}}} \cap (N,4N]\). It is then readily verified that there is a \(C >0\) such that \(|{{\mathcal {B}}}(N)| \ge \frac{N^{\frac{1}{2}}}{C\log K} \ge K\) when N is large enough. Suppose now that \(\cup _{1 \le i \le K} {{\mathfrak {Q}}}_i\) is a partition of the set \({{\mathfrak {Q}}}\) into K disjoint subsets. Then for some i in [1, K] the set \({{\mathfrak {Q}}}_i \cap {{\mathcal {B}}(N)}\) contains at least \(\frac{|{{\mathcal {B}}(N)}|}{K}\) of the elements of \({{\mathcal {B}}(N)}\). Thus if we set

$$\begin{aligned} S = {{\mathfrak {Q}}}_i \cap (N, 4N], \end{aligned}$$
(4)

then S is a subset of the squares in the interval (N, 4N] satisfying \(|S| \ge N^{\frac{1}{2}}/A\), with \(A =C K \log K \ge 1\). As we verify later (see (57)), it follows from the classical bounds for the number of representations of integers as sums of five squares that for any subset S of the squares in (N, 4N] satisfying \(|S| \ge N^{\frac{1}{2}}/A\) for some \(A \ge 1\) we have

$$\begin{aligned} E_{5}(S)\; \ll |S|^{5}N^{\frac{3}{2}} \; \ll \;\frac{|S|^{10} A^5 }{N}. \end{aligned}$$
(5)

Therefore the bound (3) holds with \(m =5\) and \(L = 3N\) and \(D = C_1 A^5\), for some \(C_1 \ge 1\). Since \(\lceil 5D \rceil \le K^{13}\) when K is large enough and since S contains elements of \({{\mathcal {B}}}\), the set S satisfies the conditions of Lemma 1.2. We then conclude that every integer \(n \ge (200D+60)N\) is a sum of no more than \(\frac{n}{N}\) elements of S. In particular, every integer in the interval \(I(N) = ((200D+60)N, (200D+61)N]\) is a sum of at most \(C_2(K\log K)^5\) squares all belonging to S and hence to \({{\mathfrak {Q}}}_i\), for some \(C_2 >0\). Thus when N is large enough, every integer in I(N) is the sum of no more than \(C_2(K\log K)^5\) squares, all of the same colour. Note, of course, that the colour may vary with N. Nevertheless, since I(N) meets \(I(N+1)\) for all large enough N, we deduce that \(s(K) \ll (K\log K)^5\), as given by [4].

In the remainder of this article we shall show that the argument of the preceding paragraph can be improved to yield Theorem 1.1 essentially by taking S in (4) to be \({{\mathfrak {Q}}}_i \cap {{\mathcal {B}}}(N)\) rather than \({{\mathfrak {Q}}}_i \cap (N, 4N]\). This is on account of the following theorem, suggested by [7] and the recent work of Browning and Prendiville [2].

Theorem 1.3

Let \(A \ge e^{e^2}\) and \(\ell \ge 12\) be real numbers. Then for all sufficiently large integers N, depending only on A and \(\ell \), and any subset S of the squares in the interval (N, 4N] with \(|S| \ge \frac{N^{\frac{1}{2}}}{A}\) and such that no integer in S is divisible by a prime \(p \le A^{\ell }\) we have

$$\begin{aligned} E_{6}(S) \,\le \, \frac{|S|^{11}}{N^{\frac{1}{2}}} \exp \left( \frac{\left( 3\log 2 + o_{\ell }(1)\right) \log A}{\log \log A}\right) , \end{aligned}$$
(6)

where \(o_{\ell }(1) \ll _{\ell } \frac{\log \log \log A}{\log \log A}\).

Let us note that the bound (6) does not necessarily hold if we assume only that S is a subset of the squares in the interval (N, 4N] satisfying \(|S| \ge \frac{N^{\frac{1}{2}}}{A}\) for some \(A \ge 1\). For instance, we may take \(S = \{ A^2n^2 \,|\, M < n \le 2M\}\) where M and A are integers \(\ge 1\). Then with \(N = A^2M^2\) we have \(S \subseteq (N, 4N]\) and \(|S| = M = \frac{N^{\frac{1}{2}}}{A}\). A classical application of the circle method now shows that for some \(C >0\) we have \(E_{6}(S) \sim C M^{10}\) as \(M \rightarrow +\infty \), so that \(E_{6}(S) \gg |S|^{10}\), contradicting (6) when A and M are sufficiently large.
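
This can also be seen numerically on a small scale. The sketch below (illustrative only; the parameters are far smaller than those in the discussion above) computes \(E_{6}(S)/|S|^{10}\) for \(S = \{ A^2n^2 \,|\, M < n \le 2M\}\); the factor \(A^2\) cancels from the equation defining \(E_6(S)\) and is therefore omitted.

```python
from collections import Counter

def E6_over_S10(M):
    """E_6(S) / |S|**10 for S = {n*n : M < n <= 2M}; the scaling by A**2 drops
    out of x_1 + ... + x_6 = x_7 + ... + x_12 and is therefore omitted."""
    S = [n * n for n in range(M + 1, 2 * M + 1)]
    # r6[t] = number of ordered 6-tuples from S summing to t
    r6 = Counter({0: 1})
    for _ in range(6):
        step = Counter()
        for s, c in r6.items():
            for x in S:
                step[s + x] += c
        r6 = step
    E6 = sum(c * c for c in r6.values())   # E_6(S) = sum_t r6[t]^2
    return E6 / len(S) ** 10

for M in (6, 10, 14):
    print(M, E6_over_S10(M))
# the printed ratios stay of comparable size as M grows, consistent with
# E_6(S) >> |S|^10 for this choice of S
```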

We prove Theorem 1.3 in Sect. 3. Our basic strategy for proving this theorem is similar to that in [7] and goes back to the method of Ramaré and Ruzsa [8]. More precisely, we set \(U = \prod _{ p \le A^{\ell }} p\) and first show that

$$\begin{aligned} E_{6}(S) \, \le \,\frac{5\tau (U)}{2{N}^{\frac{1}{2}}} |\{ x \in S^{11}| f(x) \,\hbox { an invertible square mod}\ 4U \} |+ O\left( \frac{|S|^{11}}{A{N}^{\frac{1}{2}}}\right) , \end{aligned}$$
(7)

where f(x) denotes \(x_1 + x_2+ \cdots +x_6 -x_{7} - \cdots -x_{11}\) for any \(x= (x_1,x_2, \ldots , x_{11}) \in S^{11}\) and \(\tau (U)\) is the number of divisors of U. We obtain (7) by an application of the circle method following [2]. We then complete the proof of Theorem 1.3 by estimating

$$\begin{aligned} |\{ x \in S^{11}| f(x) \,\hbox { an invertible square mod}\ 4U \} | \end{aligned}$$
(8)

using Theorem 2.1 of Sect. 2, which treats a more general problem. In Sect. 4, our concluding section, we finally detail the path from Theorem 1.3 to Theorem 1.1.

Throughout this article we use e(z) to denote \(e^{2\pi i z}\), for any complex number z and write \(e_{p}(z)\) for \(e^{\frac{2 \pi i z}{p}}\) when p is a prime number. Further, all constants implied by the symbols \(\ll \) and \(\gg \) are absolute except when dependencies are indicated, either in words or by subscripts to these symbols. The Fourier transform \(\widehat{f}\) of an integrable function f on \(\mathbf{R}\) is defined by \(\widehat{f}(u) = \int _\mathbf{R} f(t)e(-ut) dt\). Finally, the notations \([a, b]\), \((a, b]\) etc. will denote intervals in \(\mathbf{Z}\), rather than in \(\mathbf{R}\), with end points a, b, unless otherwise specified.

2 The local problem

The main result of this section is Theorem 2.1. We shall suppose that \(A \ge e^{e^2}\) and \(l\ge 2\) and let

$$\begin{aligned} U = \prod _{ p \le w} p,\;\; \text {where }\; w = A^l. \end{aligned}$$
(9)

In addition, we let \({{\mathcal {Z}}}\) be a subset of the integers satisfying the conditions

$$\begin{aligned} |{{\mathcal {Z}}}| \ge \frac{M}{A} \quad \text {and}\quad |\{ z \in {{\mathcal {Z}}}| z \equiv a \,\mathrm{mod}\, U\}| \le \frac{BM}{U}, \end{aligned}$$
(10)

for all classes a in \(\mathbf{Z}/U\mathbf{Z}\) and some real numbers \(B >0\) and \(M \ge 1\). As before, \(\tau (U) = 2^{\pi (w)}\) is the number of divisors of U. Also, we denote by \(\mathbf{c} = \{c(i)\}_{i \in I}\) a given finite sequence of integers and finally we let \({R}_{U}({{\mathcal {Z}}}, \mathbf{c})\) denote the set of triples (x, y, i) in \({{\mathcal {Z}}} \times {{\mathcal {Z}}} \times I\) such that \(x ^2 + y^2 + c(i)\) reduces to an invertible square modulo U.

Theorem 2.1

With notation as above and supposing also that \(A^{\ell } \ge 4BA \ge 4e^{e^2}\) we have

$$\begin{aligned} |{R}_{U}({{\mathcal {Z}}}, \mathbf{c})| \le \frac{|{{\mathcal {Z}}}|^2|I|}{\tau (U)} \exp \left( \frac{(3\log 2 + o_{\ell ,B}(1))\log BA}{\log \log BA}\right) , \end{aligned}$$
(11)

where \(o_{\ell ,B}(1) \ll _{\ell } \frac{\log \log \log BA}{\log \log BA}\).

We prove Theorem 2.1 in Sect. 2.4. We do this by using the optimisation principle given by Lemma 2.7 to pass to a problem in \(\mathbf{Z}/U\mathbf{Z}\), dealt with by Theorem 2.6. By means of a pair of applications of Hölder’s inequality and the Chinese Remainder Theorem we reduce the proof of Theorem 2.6 to the solution of a problem in \(\mathbf{Z}/p\mathbf{Z}\) for a given prime p|U. This problem is treated by Proposition 2.2 of the following subsection. Theorem 2.6 is the analogue of Proposition 2.3 of [7] in our context. However, the argument we use for Theorem 2.6 is both conceptually simpler and more efficient than the argument leading to Proposition 2.3 in [7], even if the first few steps in both cases are similar. In fact, and as will be shown in another paper, our proof of Theorem 2.6 can be adapted to improve the conclusion of the cited proposition from [7] and hence also that of the main result of [7].

2.1 A sum over Z/pZ

Throughout this subsection p shall denote a fixed prime number, \(G_p\) the ring \(\mathbf{Z}/p\mathbf{Z}\) and c a given element of \(G_p\). Also, \(\lambda _{p}(x)\) shall denote the Legendre symbol \((\frac{x}{p})\), for any x in \(G_p\). Furthermore, for any (x, y) in \(G_p^2\) we set \(\delta _{p}(x,y) = \lambda _{p}(x^2+y^2 +c)\) and \(\epsilon _{p}(x,y) = 1 +\delta _{p}(x,y)\).

We endow \(G_p\), and likewise \(G_p^t\) for any integer \(t\ge 1\), with their uniform probability measures and write \({{\mathbb {E}}}_{{x}}\) and \({{\mathbb {E}}}_{x_1, x_2, \ldots , x_t}\) in place of \(\frac{1}{p}\sum _{x \in G_p}\) and \(\frac{1}{p^t}\sum _{x_1, x_2, \ldots , x_t \in G_p}\) respectively. When t is fixed, we will use \(\mathbf{x}=(x_1, x_2, \ldots , x_t)\) for elements of \(G_p^t\) and abbreviate \({{\mathbb {E}}}_{x_1, x_2, \ldots , x_t}\) further to \({{\mathbb {E}}}_{\mathbf{x}}\). Also, we will use these notations in the same sense with other letters in place of x. Finally, we define \({{\mathcal {E}}}_{p}(k,t)\) for any integer k with \(1 \le k \le t\) by

$$\begin{aligned} {{\mathcal {E}}}_{p}(k,t) = {{\mathbb {E}}}_{y_1, y_2, \ldots , y_t} {{\mathbb {E}}}_{x_1, x_2, \ldots , x_t} \prod _{\begin{array}{c} 1 \le i \le t,\\ 1\le j \le k. \end{array}} \epsilon _{p}(x_i,y_j) = {{\mathbb {E}}}_\mathbf{{y}} {{\mathbb {E}}}_\mathbf{{x}} \prod _{\begin{array}{c} 1 \le i \le t,\\ 1\le j \le k. \end{array}} \epsilon _{p}(x_i,y_j). \end{aligned}$$
(12)

Proposition 2.2

For any integers t, k satisfying \(t \ge 2\) and \(1 \le k \le \frac{t}{2}\) we have

$$\begin{aligned} {{\mathcal {E}}}_{p}(k,t) \le \exp \left( \frac{8kt^4 2^{t} }{p}\right) . \end{aligned}$$
(13)

We shall prove (13) for a given integer \(t \ge 2\) and all integers k satisfying \(1 \le k \le \frac{t}{2}\) by induction on k starting from \(k =1\), using Proposition 2.3 and Lemma 2.4 below. Let us note that the trivial upper bound \(2^{kt}\) for \({{\mathcal {E}}}_{p}(k,t)\) implies (13) when \(p \le 8t^32^{t}\). This allows us, in particular, to assume that \(p >2\).
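
Before giving the proof, we record a small numerical illustration of (13). In the Python sketch below (the choices \(p \in \{601, 1009\}\), \(t = 2\), \(k = 1\) and \(c = 1\) are ours), \({{\mathcal {E}}}_{p}(k,t)\) is computed by brute force; since the average over \(x_1, \ldots , x_t\) factorises and only \(y_1, \ldots , y_k\) enter the product, \(p^{k+1}\) evaluations suffice.

```python
from itertools import product
from math import log

def legendre_table(p):
    """lambda_p(a) for a = 0, 1, ..., p-1 (p an odd prime)."""
    tab = [-1] * p
    tab[0] = 0
    for a in range(1, p):
        tab[a * a % p] = 1
    return tab

def E_p(p, k, t, c=1):
    """Brute-force E_p(k,t) of (12): the average over (x_1,...,x_t) factorises
    as the t-th power of a single average over x, and only y_1,...,y_k occur
    in the product, so p**(k+1) evaluations suffice."""
    lam = legendre_table(p)
    total = 0.0
    for ys in product(range(p), repeat=k):
        avg = 0.0
        for x in range(p):
            term = 1
            for y in ys:
                term *= 1 + lam[(x * x + y * y + c) % p]
            avg += term
        total += (avg / p) ** t
    return total / p ** k

k, t = 1, 2
for p in (601, 1009):
    val = E_p(p, k, t)
    print(p, log(val), "<=", 8 * k * t**4 * 2**t / p)   # the exponent in (13)
```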

Proposition 2.3

Let t be an integer \(\ge 1\) and J be a non-empty subset of \(\{1, 2, \ldots , t\}\). Further, let \({{\mathcal {B}}}(J)\) be the subset of \(G_p^t\) consisting of \(\mathbf{y} = (y_1, y_2, \ldots , y_t)\) in \(G_p^t\) such that either \(y_{j}^2 = y_{k}^2\) for some distinct j, k in J or \(y_{j}^2 =-c\) for some j in J. Then we have

  1. (i)

    \(|{{\mathbb {E}}}_{x} \prod _{j \in J}\delta _{p}(x, y_j)| < \frac{2|J|}{\sqrt{p}}\) when \(\mathbf{y} = (y_1,y_2, \ldots , y_t) \notin {{\mathcal {B}}}(J)\) and

  2. (ii)

    \(\big |{{\mathbb {E}}}_{\mathbf{y}} {{\mathbb {E}}}_{x} \prod _{j \in J} \delta _{p}(x, y_j)\big | \le \frac{2}{p}\).

Proof

The bound (i) is a consequence of the Weil bounds for character sums. Indeed, it follows from Theorem 2\(\text {C}\) on page 43 of [11] that

$$\begin{aligned} |{{\mathbb {E}}}_{x} \prod _{j \in J}\delta _{p}(x, y_j) | = \frac{1}{p} \left| \sum _{x \in G_p} \lambda _{p}\left( \prod _{j \in J}( x^2 +y_j^2 +c)\right) \right| \le \frac{2|J|-1}{ \sqrt{p}}, \end{aligned}$$
(14)

when the polynomial \(f(X) = \prod _{j \in J} (X^2 +y_j^2 +c)\) is not a square in \({\overline{\mathbf{F}}}_{p}[X]\), where \({\overline{\mathbf{F}}}_{p}\) is an algebraic closure of \(\mathbf{F}_{p}\). Since this condition holds for \(\mathbf{y} = (y_1, y_2, \ldots , y_t) \notin {{\mathcal {B}}}(J)\), we have (i).

To verify (ii), we begin by recalling that for all \(x \in G_p\) we have the classical identity

$$\begin{aligned} \gamma _p \lambda _p(x) = \sum _{a \in G_p} \lambda _{p}(a) e_p(ax) \end{aligned}$$
(15)

where \(\gamma _p\), the Gauss sum to modulus p, is the right hand side of the above relation evaluated at \(x =1\). If for any \(a \in G_p\) we set \(\ell (a) = 0\) when \(a =0\) and \(\ell (a) = \frac{\gamma _p}{p}\) when \(a \ne 0\) then it is easily seen from (15) that

$$\begin{aligned} \lambda _p(a) {{\mathbb {E}}}_y e_p(a y^2) = \ell (a) \quad \hbox {for all}\;\; a\; \hbox {in}\; G_p. \end{aligned}$$
(16)

On combining (15) and (16) we deduce that for any \(b \in G_p\) we have

$$\begin{aligned} {{\mathbb {E}}}_{y} \lambda _{p}(y^2 +b) = \frac{1}{\gamma _p} \sum _{a \in G_p} \lambda _{p}(a) {{\mathbb {E}}}_{y} e_p(a y^2) e_p( ab) = \frac{\mu (b)}{p}, \end{aligned}$$
(17)

where we have set \(\mu (b) = \sum _{a \in G_p^*}e_{p}(ab)\), for any \(b \in G_p\), with \(G_p^*\) denoting the set of non-zero elements of \(G_p\). Thus \(\mu (b)\) is \(p-1\) when \(b =0\) and is \(-\,1\) when \(b \ne 0\). By means of (17) we then have that

$$\begin{aligned} {{\mathbb {E}}}_{x} {{\mathbb {E}}}_{\mathbf{y}}\prod _{j \in J} \delta _{p}(x, y_j) = {{\mathbb {E}}}_{x} \prod _{j \in J} {{\mathbb {E}}}_{y_j}\lambda _{p}(x^2+y_j^2 +c) = \frac{1}{p^{m}} {{\mathbb {E}}}_{x}\, \mu (x^2 +c)^{m}, \end{aligned}$$
(18)

where \(m = |J|\). On using the values of \(\mu (b)\) given above to evaluate the last term in each of the cases \(c = 0\), \(-c\) a non-zero square and \(-c\) not a square, we finally get

$$\begin{aligned} \left| {{\mathbb {E}}}_{\mathbf{y}}{{\mathbb {E}}}_{x} \prod _{j \in J} \delta _{p}(x, y_j)\right| = \left| {{\mathbb {E}}}_{{x}}{{\mathbb {E}}}_{\mathbf{y}} \prod _{j \in J} \delta _{p}(x, y_j)\right| \le \frac{2p^{m}}{p^{m +1}} = \frac{2}{p}. \end{aligned}$$
(19)

\(\square \)
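
The identity (17), on which the bound (ii) rests, is readily confirmed by direct computation; a minimal check (the primes chosen are arbitrary):

```python
def check_17(p):
    """Check (17): (1/p) * sum_y lambda_p(y^2 + b) equals (p-1)/p for b = 0
    and -1/p for b != 0, for an odd prime p."""
    lam = [0] + [1 if pow(a, (p - 1) // 2, p) == 1 else -1 for a in range(1, p)]
    for b in range(p):
        lhs = sum(lam[(y * y + b) % p] for y in range(p)) / p
        rhs = (p - 1) / p if b == 0 else -1 / p
        assert abs(lhs - rhs) < 1e-9
    return True

print(check_17(13), check_17(101), check_17(997))
```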

The following is Hoeffding’s well-known lemma from elementary probability theory.

Lemma 2.4

Let Z be a real valued random variable on a probability space satisfying \(a \le Z \le b\), for real numbers \(a \le b\). Then for any real s we have

$$\begin{aligned} {{\mathbb {E}}} \exp (s Z) \le \exp (s{{\mathbb {E}}} Z) \exp \left( \frac{s^2(b-a)^2}{8} \right) . \end{aligned}$$
(20)

Proof

Replacing Z with \(Z - {{\mathbb {E}}}Z\), we may suppose that \({{\mathbb {E}}}Z = 0\). Then (20) is easily deduced from the convexity of the function \(r \mapsto \exp (sr)\) on the interval [ab]. The details may be found in the proof of Lemma 5.1, page 64 of [3], for example. \(\square \)
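
As a quick numerical sanity check of (20), and not a substitute for the proof, the sketch below tests the bound for random variables uniformly distributed on a few random points of [0, 1]; all parameters are arbitrary.

```python
import random
from math import exp

def hoeffding_check(trials=20, points=8, s=1.7, a=0.0, b=1.0):
    """Test (20) for Z uniform on `points` random values of [a, b]."""
    random.seed(0)
    for _ in range(trials):
        vals = [random.uniform(a, b) for _ in range(points)]
        EZ = sum(vals) / points
        lhs = sum(exp(s * z) for z in vals) / points          # E exp(sZ)
        rhs = exp(s * EZ) * exp(s * s * (b - a) ** 2 / 8)     # bound (20)
        assert lhs <= rhs + 1e-12
    return True

print(hoeffding_check())
```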

Proof of Proposition 2.2

Let t be an integer \(\ge 2\). We begin by noting that

$$\begin{aligned} {{\mathcal {E}}}_{p}(1,t)= & {} {{\mathbb {E}}}_{\mathbf{x}} {{\mathbb {E}}}_{y_1}\prod _{1 \le i \le t} \left( 1 + \delta _p(x_i,y_1)\right) \le 1 \nonumber \\&+\, \sum _{ \begin{array}{c} J\subseteq \{1,2,\ldots , t\},\\ J \ne \emptyset . \end{array}} \left| {{\mathbb {E}}}_{\mathbf{x}}{{\mathbb {E}}}_{y_1} \prod _{i \in J} \delta _p (x_i, y_1)\right| , \end{aligned}$$
(21)

on expanding the product over \(1 \le i \le t\) and using the triangle inequality. From the bound (ii) of Proposition 2.3 applied to each summand in the sum over J in (21) we then obtain

$$\begin{aligned} {{\mathcal {E}}}_{p}(1,t) \le 1 + \frac{2 \cdot 2^t}{p} \le \exp \left( \frac{2^{t+1}}{p}\right) , \end{aligned}$$
(22)

which verifies (13) for \(k =1\). Suppose now that \(t \ge 4\) and that (13) holds for \(k-1\), where k is an integer satisfying \(2 \le k \le \frac{t}{2}\), and let us verify it for k. We recall the definition of \({{\mathcal {B}}}(J)\) from Proposition 2.3 and set \({{\mathcal {B}}} = {{\mathcal {B}}}(J)\) with \(J = \{1, 2, \ldots , k\}\). Then on writing \({{\mathcal {B}}}^{\prime }\) for the complement of \({{\mathcal {B}}}\) in \(G_{p}^t\) we have

$$\begin{aligned} {{\mathcal {E}}}_{p}(k,t) = {{\mathbb {E}}}_{\mathbf{y}} 1_{{\mathcal {B}}}(\mathbf{y}) {{\mathbb {E}}}_{\mathbf{x}} \prod _{\begin{array}{c} 1 \le i \le t,\\ j \in J. \end{array}} \epsilon _{p}(x_i,y_j)\, + {{\mathbb {E}}}_{\mathbf{y}} 1_{{{\mathcal {B}}}^{\prime }}(\mathbf{y}) {{\mathbb {E}}}_{\mathbf{x}} \prod _{\begin{array}{c} 1 \le i \le t,\\ j \in J. \end{array}} \epsilon _{p}(x_i,y_j). \end{aligned}$$
(23)

Let us estimate the first term on the right hand side of (23). To this end, we set \(\alpha _{l}(\mathbf{y}) = 1\) for any \(l \in J\) and any \(\mathbf{y} = (y_1, y_2, \ldots , y_t) \in G_p^t\) if either \(y_l^2 = y_j^2\) for some \(j \in J\) distinct from l or if \(y_l^2 = -c\), and set \(\alpha _{l}(\mathbf{y}) =0\) otherwise. Then for all \(\mathbf{y} \in G_p^t\) we have \(1_{{\mathcal {B}}}(\mathbf{y}) \le \sum _{l \in J} \alpha _{l}(\mathbf{y})\) and consequently

$$\begin{aligned} {{\mathbb {E}}}_{\mathbf{y}} 1_{{\mathcal {B}}}(\mathbf{y}) {{\mathbb {E}}}_{\mathbf{x}} \prod _{\begin{array}{c} 1 \le i \le t,\\ j \in J. \end{array}} \epsilon _{p}(x_i,y_j)\, \le \, \sum _{l \in J}{{\mathbb {E}}}_{\mathbf{y}} {{\mathbb {E}}}_{\mathbf{x}} \alpha _{l}(\mathbf{y}) \prod _{\begin{array}{c} 1 \le i \le t,\\ j \in J. \end{array}} \epsilon _{p}(x_i,y_j). \end{aligned}$$
(24)

For any \(l \in J\) let us write \({{\mathbb {E}}}_{\hat{\mathbf{y}}_l}\) for \({{\mathbb {E}}}_{y_1, y_2, \ldots , y_{t}}\) with the variable \(y_l\) dropped. Then the trivial bound \(\prod _{1 \le i \le t} \epsilon _{p}(x_i,y_l) \le 2^t\) shows that the right hand side of (24) does not exceed

$$\begin{aligned} 2^t \sum _{l \in J} {{\mathbb {E}}}_{\hat{\mathbf{y}}_l} {{\mathbb {E}}}_{\mathbf{x}}\, {{\mathbb {E}}}_{y_l} \alpha _{l}(\mathbf{y}) \prod _{\begin{array}{c} 1 \le i \le t,\\ j \in J,\, j \ne l. \end{array}} \epsilon _{p}(x_i,y_j). \end{aligned}$$
(25)

For any \(l \in J\) we have \({{\mathbb {E}}}_{y_l}\alpha _{l}(\mathbf{y}) \le \frac{2k}{p}\) and \({{\mathbb {E}}}_{\hat{\mathbf{y}}_l} {{\mathbb {E}}}_{\mathbf{x}} \prod _{\begin{array}{c} 1 \le i \le t,\\ j \in J,\, j \ne l. \end{array}} \epsilon _{p}(x_i,y_j) = {{\mathcal {E}}}_{p}(k-1,t).\) Since \(|J| =k\), it follows that (25) does not exceed \(\frac{2^{t+1}k^2}{p}{{\mathcal {E}}}_{p}(k-1,t)\). We then conclude from (24) that

$$\begin{aligned} {{\mathbb {E}}}_{\mathbf{y}} 1_{{\mathcal {B}}}(\mathbf{y}) {{\mathbb {E}}}_{\mathbf{x}} \prod _{\begin{array}{c} 1 \le i \le t,\\ j \in J. \end{array}} \epsilon _{p}(x_i,y_j)\, \le \, \frac{2^{t+1}k^2}{p}{{\mathcal {E}}}_{p}(k-1,t). \end{aligned}$$
(26)

Turning now to the second term on the right hand side of (23), we define the random variable X on \(G_p^t\) by

$$\begin{aligned} X(\mathbf{y}) = \left( {{\mathbb {E}}}_{x} \prod _{j \in J} \epsilon _{p}(x,y_j)\right) -1 = \sum _{\begin{array}{c} I\subseteq J,\\ I \ne \emptyset . \end{array}} {{\mathbb {E}}}_{x} \prod _{j \in I} \delta _{p}(x, y_j). \end{aligned}$$
(27)

We set \(Z = 1_{{{\mathcal {B}}}^{\prime }} X\). Then we have

$$\begin{aligned} {{\mathbb {E}}}_{\mathbf{y}} 1_{{{\mathcal {B}}}^{\prime }}(\mathbf{y}) {{\mathbb {E}}}_{\mathbf{x}} \prod _{\begin{array}{c} 1 \le i \le t,\\ j \in J. \end{array}} \epsilon _{p}(x_i,y_j) = {{\mathbb {E}}}_{\mathbf{y}}1_{{{\mathcal {B}}}^{\prime }}(1 +X)^t \le \,{{\mathbb {E}}}_{\mathbf{y}}\exp (tZ), \end{aligned}$$
(28)

since \(1_{{{\mathcal {B}}}^{\prime }}(1 +X) \le \exp (1_{{{\mathcal {B}}}^{\prime }}X)\) and also \(0 \le 1+X \) from (27). We apply Lemma 2.4 to estimate the last term of (28). Let us first note from (27) that for any \(\mathbf{y} \in G_p^t\) we have

$$\begin{aligned} |Z(\mathbf{y})| = 1_{{{\mathcal {B}}}^{\prime }}(\mathbf{y}) |X(\mathbf{y})| \le 1_{{{\mathcal {B}}}^{\prime }}(\mathbf{y}) \sum _{\begin{array}{c} I\subseteq J,\\ I \ne \emptyset \end{array}} \left| {{\mathbb {E}}}_{x} \prod _{j \in I} \delta _{p}(x, y_j)\right| \le \frac{2k\cdot 2^k}{\sqrt{p}}, \end{aligned}$$
(29)

by the triangle inequality and (i) of Proposition 2.3 applied to each summand in the sum over I. Further, we have \(Z \le X + 1_\mathcal{B}\) since \(0 \le 1_\mathcal{B}(1 +X)\). It follows that

$$\begin{aligned} {{\mathbb {E}}}_{\mathbf{y}} Z \le \sum _{\begin{array}{c} I \subseteq J,\\ I \ne \emptyset \end{array}} \left| {{\mathbb {E}}}_{\mathbf{y}} {{\mathbb {E}}}_{x} \prod _{j \in I} \delta _{p}(x, y_j)\right| + {{\mathbb {E}}}_{\mathbf{y}} 1_{{{\mathcal {B}}}} \le \frac{2^{k+1} + 2k^2}{p}, \end{aligned}$$
(30)

on now using (ii) of Proposition 2.3 to bound each summand in the sum over I and remarking that \({{\mathbb {E}}}_{\mathbf{y}} 1_{{\mathcal {B}}}(\mathbf{y}) \le \sum _{l \in J} {{\mathbb {E}}}_{{ y}_l} \alpha _{l}(\mathbf{y}) \; \le \frac{2k^2}{p}\). From (30), (29) and Lemma 2.4 we then conclude that

$$\begin{aligned} {{\mathbb {E}}}_{\mathbf{y}} \exp (tZ)\le \exp \left( \frac{(2^{k+1} +2k^2)t + 2k^2t^2 4^k}{p}\right) \le \exp \left( \frac{4t^4 2^{t}}{p}\right) , \end{aligned}$$
(31)

by means of the inequalities \(2^{k+1} +2k^2 \le 2^{t+1}\) and \(2k^2t^2 4^{k} \le 2t^42^{t}\), valid since \(t \ge 4\) and \(k \le \frac{t}{2}\). This relation taken together with (28), (26) and (23) gives

$$\begin{aligned} {{\mathcal {E}}}_{p}(k,t) \le \frac{2^{t+1}k^2}{p}{{\mathcal {E}}}_{p}(k-1,t) + \exp \left( \frac{4t^4 2^t}{p}\right) . \end{aligned}$$
(32)

By the induction hypothesis (13) holds for \(k-1\) and consequently we deduce from (32) that

$$\begin{aligned} {{\mathcal {E}}}_{p}(k,t) \le \left( \frac{2^{t+1}k^2}{p} + 1\right) \exp \left( \frac{(2(k-1) + 1)4t^42^t}{p}\right) . \end{aligned}$$
(33)

Using again the inequality \(1 + s \le \exp (s)\) and noting that \(2^{t+1}k^2 \le 4t^42^t\) we then conclude from (33) that (13) holds for k, completing the induction step. \(\square \)

Remark 2.5

It is perhaps the case that the conclusion of Proposition 2.2 holds for all \(t \ge 2\) and all k satisfying \(1 \le k \le t\). A proof of this assertion would allow us to replace \(3\log 2\) with \(2\log 2\) in (11) and, as a consequence, in Theorem 1.1 as well.

2.2 The problem modulo U

Let, as above, \(l\ge 2\) and \(A \ge e^{e^2}\) be real numbers and \(U = \prod _{p \le w} p\), where \(w = A^{\ell }\). Suppose further that \({{\mathcal {X}}}\) and \({{\mathcal {Y}}}\) are subsets of \(\mathbf{Z}/U\mathbf{Z}\) of density at least \(\frac{1}{A}\). That is,

$$\begin{aligned} |{{\mathcal {X}}}| \quad \text {and}\quad |{{\mathcal {Y}}}| \ge \frac{U}{A}. \end{aligned}$$
(34)

For a given element c of \(\mathbf{Z}/U\mathbf{Z}\), let \({T}_{c}({{\mathcal {X}}},{{\mathcal {Y}}})\) denote the set of pairs \((x,y) \in {{\mathcal {X}}} \times {{\mathcal {Y}}}\) such that \(x ^2 + y^2 + c \) is an invertible square in \(\mathbf{Z}/U\mathbf{Z}\).

Theorem 2.6

For all \(l, A, U, {{\mathcal {X}}}, {{\mathcal {Y}}}\) and c as above, we have

$$\begin{aligned} |{T}_{c}({{\mathcal {X}}},{{\mathcal {Y}}} )| \le \frac{|{{\mathcal {X}}}||{{\mathcal {Y}}}|}{\tau (U)} \exp \left( \frac{\left( 3\log 2 + O_{l}\left( \frac{\log \log \log A}{\log \log A}\right) \right) \log A}{\log \log A}\right) . \end{aligned}$$
(35)

Proof

We shall write G for the ring \(\mathbf{Z}/U\mathbf{Z}\) and continue to use \(G_p\) for \(\mathbf{Z}/{p\mathbf{Z}}\). Also, for any x in G and p|U we denote the canonical image of x in \(\mathbf{Z}/p\mathbf{Z}\) by \(x_p\) and, to be consistent with the notation of the preceding subsection, write \(\lambda _{p}(x)\) for the Legendre symbol \((\frac{x_p}{p})\). Then we have that

$$\begin{aligned} |{T}_{c}({{\mathcal {X}}},{{\mathcal {Y}}})| \le \sum _{x \in {{\mathcal {X}}}}\, \sum _{y \in {\mathcal {Y}}} \prod _{p|U} \left( \frac{1 + \lambda _{p}(x^2+y^2+c)}{2}\right) , \end{aligned}$$
(36)

since \(0 \le 1 + \lambda _{p}(x^2+y^2+c) \le 2\) for any pair (x, y) in \({{\mathcal {X}}} \times {{\mathcal {Y}}}\), with equality in the upper bound for every prime p|U when \(x^2+y^2+c\) is an invertible square in G. On extending the definitions of \(\delta _{p}\) and \(\epsilon _p\) from Sect. 2.1 by setting \(\delta _{p}(x,y) = \lambda _{p}(x^2+y^2 +c)\) and \(\epsilon _{p}(x,y) = 1 +\delta _{p}(x,y)\) for any (x, y) in \(G^2\) and p|U, we may rewrite (36) as

$$\begin{aligned} |{T}_{c}({{\mathcal {X}}},{{\mathcal {Y}}})| \le \frac{1}{\tau (U)} \sum _{x \in {{\mathcal {X}}}}\, \sum _{y \in {\mathcal {Y}}} \prod _{p|U} \epsilon _{p}(x,y). \end{aligned}$$
(37)

Let \(t \ge 2\) be an even integer. Then an interchange of summations followed by an application of Hölder’s inequality to exponent t to the right hand side of (37) gives

$$\begin{aligned} |{T}_{c}({{\mathcal {X}}},{{\mathcal {Y}}})| \le \frac{|{{\mathcal {Y}}}|^{1-\frac{1}{t}}}{\tau (U)} \left( \sum _{y \in {{\mathcal {Y}}}}\, \left( \sum _{x \in {\mathcal {X}}} \, \prod _{p|U} \epsilon _{p}(x,y)\right) ^{t} \right) ^{\frac{1}{t}}. \end{aligned}$$
(38)

To bound the sum over \(y \in {{\mathcal {Y}}}\) on the right hand side of the inequality above, we first expand the summand in this sum and extend the summation to all \(y \in G\). By this we see that

$$\begin{aligned} \sum _{y \in {{\mathcal {Y}}}}\, \left( \sum _{x \in {\mathcal {X}}} \, \prod _{p|U} \epsilon _{p}(x,y)\right) ^{t} \le \sum _{y \in G}\, \sum _{(x_1,x_2, \ldots ,x_t) \in {{\mathcal {X}}}^{t}} \, \prod _{1 \le i \le t} \prod _{p|U} \epsilon _{p}(x_i, y). \end{aligned}$$
(39)

Interchanging the summations over G and \({{\mathcal {X}}}^t\) on the right hand side of the above relation and applying Hölder’s inequality again, this time to exponent \(\frac{t}{2}\), we deduce that the right hand side of (39) does not exceed

$$\begin{aligned} |{{\mathcal {X}}}|^{t-2} \left( \sum _{(x_1,x_2, \ldots ,x_t) \in {{\mathcal {X}}}^{t}} \left( \sum _{y \in G}\, \prod _{1 \le i \le t} \prod _{p|U} \epsilon _{p}(x_i,y) \right) ^{\frac{t}{2}}\right) ^{\frac{2}{t}}. \end{aligned}$$
(40)

Finally, on expanding the summand in the sum over \({{\mathcal {X}}}^t\) in (40) and extending the summation to all of \(G^t\) we conclude using (39) and (38) and a rearrangement of terms that

$$\begin{aligned} |{T}_{c}({{\mathcal {X}}},{{\mathcal {Y}}})| \le \frac{|{{\mathcal {X}}}| |{{\mathcal {Y}}}|}{\tau (U)} \left( \frac{U^3}{|{{\mathcal {X}}}|^2 |{{\mathcal {Y}}}|}\right) ^{\frac{1}{t}} {{\mathcal {E}}}\left( \frac{t}{2},t\right) ^{\frac{2}{t^2}}, \end{aligned}$$
(41)

where, for any integer k with \(1 \le k \le t\), we have set

$$\begin{aligned} {{\mathcal {E}}}(k,t) = \frac{1}{U^{2t}}\sum _{(y_1,y_2, \ldots ,y_t) \in G^{t}} \sum _{(x_1,x_2, \ldots , x_t) \in G^{t}}\, \prod _{p|U} \prod _{\begin{array}{c} 1 \le i \le t, \\ 1 \le j \le k. \end{array}} \epsilon _{p}(x_i,y_j).\end{aligned}$$
(42)

The Chinese Remainder Theorem gives \(G = \prod _{p|U} G_p\). Moreover, for all p|U and (xy) in \(G^2\) we have \(\epsilon _{p}(x,y) = \epsilon _{p}(x_p, y_p)\). It follows that \({{\mathcal {E}}}(k,t) = \prod _{p|U} {{\mathcal {E}}}_{p}(k,t)\), where \({{\mathcal {E}}}_{p}(k,t)\) is as defined by (12). Using (13) with \(k = \frac{t}{2}\), valid on account of Proposition 2.2, and recalling that \(U = \prod _{ p \le A^{\ell }} p\) we then obtain

$$\begin{aligned} {{\mathcal {E}}}(k,t) = \prod _{p|U} {{\mathcal {E}}}_{p}(k,t) \le \exp \left( 4 t^5 2^t \sum _{p \le A^{\ell }} \frac{1}{p} \right) . \end{aligned}$$
(43)

From (3.20) on page 70 of [9] we see that \(\sum _{p \le A^{\ell }} \frac{1}{p} \le (\log 2\ell ) \log \log A\), since \(A \ge 4\) and \(\ell \ge 2\). On combining this remark with (43), (34) and (41) we then conclude that for any even integer \(t \ge 2\) we have

$$\begin{aligned} |{T}_{c}({{\mathcal {X}}},{{\mathcal {Y}}})| \le \frac{|{{\mathcal {X}}}| |{{\mathcal {Y}}}|}{\tau (U)}\exp \left( \frac{3\log A}{t} + 8(\log 2\ell ) t^3 2^t \log \log A \right) . \end{aligned}$$
(44)

Let us now set \(v \log 2 = \log \left( \frac{\log A}{ (\log \log A)^6}\right) \) and suppose that \(A_0 \ge e^{e^2}\) is such that we have \(\frac{\log \log A}{\log \log \log A} \ge 12\) and \(v \ge 2\) for all \(A > A_0\). For such A we take t in (44) to be an even integer satisfying \(v \le t \le v+2\). Also, with \(\kappa = \frac{6\log \log \log A}{\log \log A}\) we have \(\kappa \le \frac{1}{2}\) and \(v = \frac{(1-\kappa )\log \log A}{\log 2}\). Thus \(\frac{1}{t} \le \frac{1}{v} \le \frac{(\log 2) (1+2\kappa )}{\log \log A}\) and \(t^32^t \le 32v^3 2^v \le \frac{32\log A}{(\log 2)^3(\log \log A)^3}\). Substituting these inequalities in (44) we obtain (35) for \(A > A_0\). To obtain (35) for \(e^{e^2} \le A \le A_{0}\) it suffices to take \(t =2\) in (44). \(\square \)

2.3 An optimisation principle

This subsection summarises Subsection 2.3 of [7]. Suppose that \(n \ge 1\) is an integer and let P and H be real numbers \(> 0\). Further, assume that the subset \({{\mathcal {K}}}\) of points \(x = (x_1,x_2,\ldots ,x_n)\) in \(\mathbf{R}^n\) satisfying the conditions

$$\begin{aligned} \sum _{1 \le i \le n} x_i = P \quad \text {and}\quad 0 \le x_i \le H \quad {\text {for all}}\; i. \end{aligned}$$
(45)

is not empty. Then \({{\mathcal {K}}}\) is a non-empty, compact and convex subset of \(\mathbf{R}^n\) and we have the following standard fact.

Lemma 2.7

If \(f: \mathbf{R}^n \times \mathbf{R}^n \rightarrow \mathbf{R}\) is a bilinear form with real coefficients then

  1. (i)

    There are extreme points \(x^{*}\) and \(y^{*}\) of \({{\mathcal {K}}}\) so that \(f(x,y) \le f(x^{*},y^{*})\) for all \(x,y \in {{\mathcal {K}}}\).

  2. (ii)

If \(x^{*} = (x_1^{*}, x_2^{*}, \ldots , x_n^{*})\) is an extreme point of \({{\mathcal {K}}}\) then \(x_i^{*} = 0\) or \(x_{i}^{*} = H\) for all i excepting at most one. Thus if m is the number of i such that \(x_i^{*} \ne 0\) then \(mH \ge P > (m-1) H\).

Proof

See the proof of Proposition 2.2 of [7], for example.
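
The lemma can be illustrated by brute force in small dimensions. In the sketch below (all parameters arbitrary), the points of \({{\mathcal {K}}}\) with all coordinates equal to 0 or H except at most one are enumerated, and the value of a random bilinear form at random convex combinations of these points is checked never to exceed its maximum over pairs of them.

```python
from itertools import combinations
import random

def extreme_points(n, P, H):
    """Points of K with all coordinates 0 or H except at most one (cf. (ii))."""
    full, rem = divmod(P, H)
    pts = []
    for idx in combinations(range(n), full):
        if rem == 0:
            pts.append([H if i in idx else 0 for i in range(n)])
        else:
            for j in range(n):
                if j in idx:
                    continue
                x = [H if i in idx else 0 for i in range(n)]
                x[j] = rem
                pts.append(x)
    return pts

def f(x, y, c):
    """Bilinear form sum_{i,j} c[i][j] * x_i * y_j."""
    return sum(c[i][j] * x[i] * y[j] for i in range(len(x)) for j in range(len(y)))

random.seed(1)
n, P, H = 5, 7, 3
c = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
ext = extreme_points(n, P, H)
vmax = max(f(x, y, c) for x in ext for y in ext)
for _ in range(2000):                      # random points of K x K
    w1 = [random.random() for _ in ext]
    w2 = [random.random() for _ in ext]
    x = [sum(w * p[i] for w, p in zip(w1, ext)) / sum(w1) for i in range(n)]
    y = [sum(w * p[i] for w, p in zip(w2, ext)) / sum(w2) for i in range(n)]
    assert f(x, y, c) <= vmax + 1e-9
print("maximum attained at extreme points:", round(vmax, 4))
```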

2.4 Proof of Theorem 2.1

Let a, b be any elements of \(\mathbf{Z}/U\mathbf{Z}\). For any i in I we set \(\alpha _i (a,b) = 1\) if \(a^2 +b^2 +c(i)\) is an invertible square in \(\mathbf{Z}/U\mathbf{Z}\) and 0 otherwise. Further, we write m(a) for the number of z in \({{\mathcal {Z}}}\) such that \(z \equiv a\) mod U. Then if \(\tilde{{{\mathcal {Z}}}}\) denotes the image of \({{\mathcal {Z}}}\) in \(\mathbf{Z}/U\mathbf{Z}\) we have

$$\begin{aligned} |{R}_{U}({{\mathcal {Z}}}, \mathbf{c})| = \sum _{i \in I} \sum _{(a,b) \in {\tilde{Z}}^2} \alpha _{i}(a,b)\, m(a)m(b). \end{aligned}$$
(46)

Moreover, we have

$$\begin{aligned} \sum _{a \in \tilde{{{\mathcal {Z}}}}} m(a) = |{{\mathcal {Z}}}| \quad \text {and}\quad 0 \le m(a) \le H, \end{aligned}$$
(47)

with \( H = \frac{BM}{U}\), on account of the second assumption in (10). Let us bound the inner sum on the right hand side of (46) for a given i in I. By means of Lemma 2.7 and (47) we obtain

$$\begin{aligned} \sum _{(a,b) \in {\tilde{{{\mathcal {Z}}}}}^2} \alpha _{i}(a,b)\, m(a)m(b) \le \sum _{(a,b) \in {\tilde{Z}}^2} \alpha _{i}(a,b)\, x^{*}_{a} y^{*}_{b}, \end{aligned}$$
(48)

for some \(x^{*}_{a}\) and \(y^{*}_{b}\), with a and b varying over \(\tilde{{{\mathcal {Z}}}}\), satisfying the following condition. All the \(x^{*}_{a}\), and similarly all the \(y^{*}_{b}\), are either equal to 0 or to H excepting at most one, which must lie in (0, H). Let \({{\mathcal {X}}}\) and \({{\mathcal {Y}}}\) be, respectively, the subsets of \(\tilde{{{\mathcal {Z}}}}\) for which \(x^{*}_{a} \ne 0\) and \(y^{*}_{b} \ne 0\). Then we have from (ii) of Lemma 2.7 that \(|{{\mathcal {X}}}| H \ge |{{\mathcal {Z}}}| > (|{{\mathcal {X}}}|-1) H\). From the first condition in (10) we then get \(|{{\mathcal {X}}}| \ge \frac{|{{\mathcal {Z}}}|}{H} \ge \frac{U}{AB}\ge 2\), since \(U \ge \frac{A^{\ell }}{2} \ge 2AB\). This gives \(H \le \frac{|{{\mathcal {Z}}}|}{|{{\mathcal {X}}}|-1}\le \frac{2|{{\mathcal {Z}}}|}{|{{\mathcal {X}}}|}\). The same inequalities hold with \(|{{\mathcal {X}}}|\) replaced by \(|{{\mathcal {Y}}}|\). It follows that \(H^2 \le \frac{4|{{\mathcal {Z}}}|^2}{|{{\mathcal {X}}}||{{\mathcal {Y}}}|}\). Further, with \(T_{c(i)}({{\mathcal {X}}}, {{\mathcal {Y}}})\) as in Sect. 2.2 we have \(\sum _{(a,b) \in {{\mathcal {X}}\times {\mathcal {Y}}}} \alpha _{i}(a,b) = |T_{c(i)}({{\mathcal {X}}}, {{\mathcal {Y}}})|\). Since \(\alpha _i(a,b) \ge 0\) for all (a, b), we then deduce that

$$\begin{aligned} \sum _{(a,b) \in {\tilde{Z}}^2} \alpha _{i}(a,b)\, x^{*}_{a} y^{*}_{b} \le H^2 \sum _{(a,b) \in {{\mathcal {X}}\times {\mathcal {Y}}}} \alpha _{i}(a,b) \le \frac{4|T_{c(i)}({{\mathcal {X}}}, {{\mathcal {Y}}})||{{\mathcal {Z}}}|^2}{|{{\mathcal {X}}}||{{\mathcal {Y}}}|}. \end{aligned}$$
(49)

Combining this with (48), (46) and the bound supplied by (35) for \(|T_{c(i)}({{\mathcal {X}}}, {{\mathcal {Y}}})|\), applicable since \(AB \ge e^{e^2}\), we conclude that (11) holds. \(\square \)

3 An application of the circle method

We prove Theorem 1.3 in this section. As stated in Sect. 1, our first step will be to prove the inequality (7). This is carried out in Sects. 3.1 through 3.3 starting from the preliminaries given below. We then complete the proof of Theorem 1.3 in Sect. 3.4 by applying Theorem 2.1 to estimate (8).

We suppose that \(A \ge e^{e^2}\) and \(l\ge 12\) are real numbers and assume that N is a sufficiently large integer depending only on A and l, its actual size varying to suit our requirements at various stages of the argument. We set

$$\begin{aligned} U = \prod _{ p \le w} p\quad \text {and}\quad W = 2U,\quad \text { where }\;\; w = A^l. \end{aligned}$$
(50)

Also, we set \(\alpha (t) = 1 -\left| \frac{2t}{5N}\right| \) when \(|t|\le \frac{5N}{2}\) and 0 for all other \(t \in \mathbf{R}\) and set \(\beta (t) = \alpha (t - \frac{5N}{2})\). Thus \(\beta (t) \ge 0\) for all t in \(\mathbf{R}\) and \(\beta (t) \ge \frac{2}{5}\) when \(t \in [N, 4N]\). Further, S will denote a given subset of the squares in (N, 4N] satisfying the hypotheses of Theorem 1.3. Finally, for all \(t \in \mathbf{R}\) we set

$$\begin{aligned} \psi (t) = \sum _{\begin{array}{c} 0 \le r < W, \\ (r, W) = 1. \end{array}} \, \sum _{ n \, \equiv \, r\, \mathrm{mod}\, W } 2n \beta (n^2)e(n^2t) \end{aligned}$$
(51)

and \(\widehat{S}(t) = \sum _{x \in S} e(xt)\). Then by analogy with (3.1) of [7] we observe that

$$\begin{aligned} \frac{4}{5} \sqrt{N}E_{6}(S) \, \le \, \int _{0}^{1} \widehat{S}(t)^6 \widehat{S}(-t)^{5}{\psi }(-t)\, dt. \end{aligned}$$
(52)

Indeed, \(E_{6}(S)\) is the same as the number of \(x \in S^{11}\) such that \(f(x) \in S\), with f(x) as in (7). For any such x if \(f(x) = n^2\) then \(\frac{4}{5}\sqrt{N} \le 2n\beta (n^2) \) and n is invertible modulo W. This remark implies (52) by positivity of \(\beta \) and orthogonality.

We shall apply the circle method to estimate the integral on the right hand side of (52). To this end, we set \(L = (\log N)^2, Q = W^2A^{12}, M = \frac{N}{L}\) and, for any integers a and q satisfying

$$\begin{aligned} 0 \le a \le q \le Q \quad \text {and}\quad (a,q) =1, \end{aligned}$$
(53)

we call the interval \([\frac{a}{q} - \frac{1}{M}, \frac{a}{q}+\frac{1}{M})\) the major arc \({{\mathfrak {M}}}(\frac{a}{q})\). It is easily checked that distinct major arcs are in fact disjoint when \(M > 2Q^2\), which holds when N is sufficiently large depending only on A and l. We denote by \({{\mathfrak {M}}}\) the union of the family of major arcs \({{\mathfrak {M}}}(\frac{a}{q})\). Each interval in the complement of \({{\mathfrak {M}}}\) in [0, 1) is called a minor arc. We denote the union of the minor arcs by \({{\mathfrak {m}}}\).

We have

$$\begin{aligned} \int _{0}^{1} \widehat{S}(t)^6 \widehat{S}(-t)^{5}\psi (-t)\, dt = \int _{-\frac{1}{M}}^{1-\frac{1}{M}} \widehat{S}(t)^6 \widehat{S}(-t)^{5}\psi (-t)\, dt \end{aligned}$$
(54)

by the periodicity of the integrand. From the definitions given above it is easily seen that the interval \([-\frac{1}{M}, 1-\frac{1}{M})\) is the union of \({{\mathfrak {m}}}\) and \({{\mathfrak {M}}}{\setminus } [1-\frac{1}{M}, 1+\frac{1}{M})\). Since distinct major arcs are disjoint, it then follows that the right hand side of (54) is the same as

$$\begin{aligned} \sum _{1 \le q \le Q} \sum _{\begin{array}{c} 0 \le a < q,\\ (a,q) =1. \end{array}} \int _{{{\mathfrak {M}}}(\frac{a}{q})} \widehat{S}(t)^6 \widehat{S}(-t)^{5}\psi (-t)\, dt + \int _{{{\mathfrak {m}}}} \widehat{S}(t)^6 \widehat{S}(-t)^{5}\psi (-t)\, dt. \end{aligned}$$
(55)

We shall presently estimate each of the two terms in (55). We begin by observing that

$$\begin{aligned} \int _{0}^{1} |\widehat{S}(t)|^{11} \; dt \ll \, |S|^{9}A^3. \end{aligned}$$
(56)

In effect, the integral in (56) does not exceed \(|S| E_{5}(S)\). Thus (56) follows from \(|S| \ge N^{\frac{1}{2}}/A\) and

$$\begin{aligned} E_{5}(S) = \sum _{1 \le n} R_{5}^{2}(n) = \sum _{1 \le n \le 20 N } R_{5}^{2}(n) \ll N^{\frac{3}{2}} \sum _{n \ge 1} R_{5}(n) = |S|^5 N^{\frac{3}{2}}, \end{aligned}$$
(57)

where \(R_{5}(n)\) denotes the number of representations of an integer n as a sum of five elements of S. To verify (57) we note that \( R_{5}(n) = 0\) when \(n > 20 N\) and \( R_{5}(n) \le r_{5}(n)\), the number of representations of n as a sum of five squares of natural numbers, and recall that \(r_{5}(n) \ll n^{\frac{3}{2}}\), by a standard application of the circle method. As a consequence of (56) we have

$$\begin{aligned} \sum _{1 \le q \le Q} \sum _{\begin{array}{c} 0 \le a < q,\\ (a,q) =1. \end{array}} \int _{{{\mathfrak {M}}}(\frac{a}{q})} |\widehat{S}(t)|^{11} dt \le \int _{-\frac{1}{M}}^{1-\frac{1}{M}} |\widehat{S}(t)|^{11} dt \ll |S|^{9} A^3. \end{aligned}$$
(58)
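
As an aside, the bound (57) is easy to test numerically on a small scale, taking S to be all the squares in (N, 4N]; the values of N in the sketch below are illustrative and far smaller than those relevant to the proof.

```python
from collections import Counter
from math import isqrt

def E5_ratio(N):
    """E_5(S) / (|S|**5 * N**1.5) for S = all squares in (N, 4N], with E_5(S)
    computed from the representation function R_5 by repeated convolution."""
    S = [n * n for n in range(isqrt(N) + 1, isqrt(4 * N) + 1) if N < n * n <= 4 * N]
    R = Counter({0: 1})
    for _ in range(5):
        step = Counter()
        for s, c in R.items():
            for x in S:
                step[s + x] += c
        R = step
    E5 = sum(c * c for c in R.values())
    return E5 / (len(S) ** 5 * N ** 1.5)

for N in (200, 450, 800):
    print(N, round(E5_ratio(N), 5))
# the ratios remain bounded as N grows, in line with (57)
```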

3.1 The minor arc contribution

Here we bound the second term in (55). Let us first verify that for all \(t \in {{\mathfrak {m}}}\) we have

$$\begin{aligned} |\psi (t)| \ll \frac{N}{A^6}, \end{aligned}$$
(59)

when N is large enough, depending only on A and l. Indeed, for any real t, Dirichlet’s approximation theorem gives a rational number \(\frac{a}{q}\) satisfying \(|t-\frac{a}{q}| \le \frac{1}{qM}\) together with \(1 \le q \le M\) and \((a,q)=1\). When t is in \({{\mathfrak {m}}}\) we see that \(\frac{a}{q}\) is in [0, 1] since \({{\mathfrak {m}}} \subseteq [\frac{1}{M}, 1-\frac{1}{M})\). Consequently, we also have \(0 \le a \le q\). Since, however, t is not in \({{\mathfrak {M}}}\), we must then have \(Q < q\) on account of (53). We then conclude using \(q^2 \le qM\) that for each t in \({{\mathfrak {m}}}\) there are integers a and \(q \ne 0\) with \((a,q) =1\) satisfying

$$\begin{aligned} |t- \frac{a}{q}|\le \frac{1}{q^2} \quad \text { and }\quad Q < q \le M. \end{aligned}$$
(60)

Next, for a given class r modulo W, we temporarily let \(a(n) = 1\) when \(n \equiv r\, \text {mod}\, W\) and \(a(n) = 0\) otherwise. Then on setting \(P =\sqrt{5N}\) and \(T(u) = \sum _{0 \le n \le u} a(n)e\left( n^2t\right) \) for a given t in \({{\mathfrak {m}}}\) and integrating by parts we get

$$\begin{aligned} \sum _{ n \, \equiv \, r\, \mathrm{mod}\, W } 2n \beta (n^2)e(n^2 t) = \int _{0}^{P}2u\beta (u^2) dT(u) \ll \sqrt{N} \sup _{0 \le u \le P} |T(u)|, \end{aligned}$$
(61)

since \(u \mapsto 2u\beta (u^2)\) is monotonic on each of the intervals \([0, \frac{P}{\sqrt{2}}]\) and \((\frac{P}{\sqrt{2}}, P]\). By means of the classical Weyl squaring and differencing argument, given, for example, on page 17 of [6], and remarking that for any \(n\), \(a(n)a(n+h)\) is a(n) when W|h and is 0 otherwise, we obtain

$$\begin{aligned} |T(u)|^2 \le \sum _{0 \le n \le u} a(n) + \sum _{\begin{array}{c} 1 \le |h| \le u,\\ W|h. \end{array}} \left| \sum _{n \in I(h)} a(n) e(2htn) \right| , \end{aligned}$$
(62)

for all \(u, 0 \le u \le P\), where I(h) is an interval of length \(u -|h| \le P\). If N is large enough so that \(P \ge W\), the first term on the right hand side of (62) is \(\le \frac{2P}{W}\). Also, on using (2), page 40 of [6] to bound the sum over \(n \in I(h)\) in (62) we get

$$\begin{aligned} \sum _{\begin{array}{c} 1 \le |h| \le u,\\ W|h. \end{array}} \left| \sum _{n \in I(h)} a(n) e(2htn) \right| \ll \sum _{\begin{array}{c} 1 \le |k| \le 2PW, \\ 2W^2 |k. \end{array}} \inf \left( \frac{P}{W}, \frac{1}{\Vert kt\Vert }\right) . \end{aligned}$$
(63)

We estimate the right hand side of (63) ignoring the condition \(2W^2 |k\) and applying (9), page 41 of [6] with \(\frac{a}{q}\) as in (60). This together with (62) gives

$$\begin{aligned} |T(u)|^2 \ll \frac{P^2}{q} + PW\log q + \frac{P}{W} + q\log q \ll \frac{P^2}{Q} \ll \frac{N}{Q}, \end{aligned}$$
(64)

for all \(u, 0 \le u \le P\). Combining (64) with (61), (51) and applying the triangle inequality we obtain (59). From (59) and (56) we then conclude that for all N large enough, depending only on A and \(\ell \), we have

$$\begin{aligned} \int _{{{\mathfrak {m}}}} |\widehat{S}(t)|^{11}\,|\psi (t)| \, \; dt \ll \frac{{N}}{A^{6}} \int _{0}^{1} |\widehat{S}(t)|^{11} \; dt \;\ll \frac{N|S|^9}{A^3} \ll \frac{|S|^{11}}{A}, \end{aligned}$$
(65)

since \(|S| \ge N^{\frac{1}{2}}/A\). It now follows that

$$\begin{aligned} \int _{{{\mathfrak {m}}}} \widehat{S}(t)^6 \widehat{S}(-t)^{5}\psi (-t)\, dt \;\ll \; \frac{|S|^{11}}{A}. \end{aligned}$$
(66)

3.2 The function \(\psi \) on a major arc

For any integers a, q and r, with \( q >0\), we set \(G_{r}(a,q) = \sum _{ 0 \le m < q} e\left( \frac{a(r+mW)^2}{q}\right) \).

Lemma 3.1

Let a and q be any integers satisfying (53). Then for all t in the major arc \({{\mathfrak {M}}}(\frac{a}{q})\) we have

$$\begin{aligned} \psi (t) = \frac{1}{qW} \sum _{\begin{array}{c} 0 \le r < W, \\ (r, W) = 1. \end{array}} G_{r}(a,q) \; \overline{\widehat{\beta }}\left( t-\frac{a}{q}\right) + O(Q\phi (W)\sqrt{N}(\log N)^2). \end{aligned}$$
(67)

Proof

Let \(\theta = t-\frac{a}{q}\) and \(\eta (u) = 2u\beta (u^2)e(u^2\theta )\) for any real u. Then we have

$$\begin{aligned} \sum _{ n \, \equiv \, r\, \mathrm{mod}\, W } 2n \beta (n^2)e(n^2 t) = \sum _{j} \eta (r+jW)e\left( \frac{a(r+jW)^2}{q}\right) . \end{aligned}$$
(68)

We split j on the right hand side of (68) into arithmetical progressions modulo q and sum both sides over r to get

$$\begin{aligned} \psi (t) = \sum _{\begin{array}{c} 0 \le r< W, \\ (r, W) = 1. \end{array}}\, \,\sum _{0 \le m < q} e\left( \frac{a(r+mW)^2}{q}\right) \sum _{ k } \eta (r+(m+kq)W). \end{aligned}$$
(69)

Let \(\varphi (u) = \eta (r+(m+uq)W)\) for all real u. Then \(\varphi \) is a continuous compactly supported function on \(\mathbf{R}\). Its support is the union of two disjoint intervals, on the interior of each of which \(\varphi \) is differentiable. Applying the Euler–Maclaurin formula to \(\varphi \) on each of these intervals and adding the results we obtain

$$\begin{aligned} \sum _{k } \varphi (k) = \int \varphi (u) du + O\left( \sup _{u}|\varphi (u)| + \int |\varphi ^{\prime }(u)| du\right) . \end{aligned}$$
(70)

The left hand side of (70) is the same as the sum over k in (69). From the definitions of \(\eta , \beta \) we have \(\sup _{u}|\varphi (u)| \ll \sqrt{N}\). By means of the change of variable \((r+(m+uq)W)^2 \mapsto u\) we see that \(\int \varphi (u) du = \frac{1}{qW}{\overline{\widehat{\beta }}}(\theta )\). Finally, the change of variable \((r+(m+uq)W) \mapsto u\) gives \(\int |\varphi ^{\prime }(u)| du= \int |\eta ^{\prime }(u)| du\). From the definition of \(\eta (u)\) we have

$$\begin{aligned} \eta ^{\prime }(u) = 2\beta (u^2)e(u^2\theta ) + 4u^2 \beta ^{\prime }(u^2)e(u^2\theta ) + 8\pi iu^2 \theta \beta (u^2)e(u^2\theta ), \end{aligned}$$
(71)

which gives \(|\eta ^{\prime }(u)| \ll \frac{N}{M}\), since \(0 \le \beta (u^2) \le 1, | \beta ^{\prime }(u^2)| \ll \frac{1}{N}, |\theta |\le \frac{1}{M}\) and \(u^2 \ll N\) for u in the support of \(\eta ^{\prime }\). Since the measure of this support is \(\sqrt{5N}\), we conclude on recalling the definition of M that \(\int |\varphi ^{\prime }(u)| du \ll \sqrt{N}(\log N)^2\). The preceding remarks together with (70) and (69) and the triangle inequality yield (67). \(\square \)

The following lemma gives the key parts of Lemmas 5.2 and 5.3 of [2] in our context. The conclusions of this lemma explain the utility of the condition \((r, W) =1\) in the definition (51) of  \(\psi (t)\).

Lemma 3.2

Let a and q be integers satisfying (53) and r any integer coprime to W. Then we have

  1. (i)

    \(G_{r}(a,q) =0\) unless q|2W or there is a prime \(p >w \) such that p|q.

  2. (ii)

    \(\frac{1}{q}|G_{r}(a,q)|\le \sqrt{\frac{2}{w}}\) when q does not divide 2W.

Proof

Following [2], we first note that for any integers \(c_0, c_1, c_2\) and \(d >0\), if P(X) is the quadratic polynomial \(c_0X^2 +c_1X +c_2\) and \(d_1, d_2 >0\) are integers such that \(d = d_1d_2\) and \(d_2|c_0\) then

$$\begin{aligned} \sum _{0 \le m< d} e\left( \frac{P(m)}{d}\right) = \sum _{0 \le m_1< d_1} e\left( \frac{P(m_1)}{d}\right) \sum _{0\le m _2 < d_2} e\left( \frac{c_1m_2}{d_2}\right) . \end{aligned}$$
(72)

This is verified by remarking that the map \((m_1, m_2) \mapsto m_1 + m_2d_1\) is a bijection from \([0, d_1 ) \times [0, d_2)\) to [0, d) and that if \(m = m_1 +m_2d_1\), then \(\frac{P(m)}{d}- \frac{P(m_1)}{d}- \frac{c_1m_2}{d_2} \in \mathbf{Z}\), because \(d|c_{0}d_1\). Now the sum over \(m_2\) on the right hand side of (72) is 0 unless \(d_2 |c_1\). Therefore the sum on the left hand side of (72) is also 0 unless \(d_2 |c_1\). Using this with \(P(X) = a(r + WX)^2= aW^2 X^2 + 2WarX + ar^2, d_1 = \frac{q}{(q, W^2)}\) and \(d_2 = (q, W^2)\), we deduce that for any integer r coprime to W we have \(G_{r}(a,q) = 0\) unless \((q,W^2)|2W\), since ar is coprime to \((q, W^2)\). If for any integer m and prime p, we write \(v_{p}(m)\) for the exponent of p in the prime factorisation of m, then the condition \((q,W^2)|2W\) is equivalent to \(\inf (v_{p}(q), 2v_{p}(W)) \le v_{p}(2W)\) for all primes p|2W. From the definition of W in (50) we have \(2v_{p}(W) > v_{p}(2W)\) for all primes p|2W. Consequently, \(G_{r}(a,q) = 0\) unless \(v_{p}(q) \le v_{p}(2W)\) for all primes p|2W, which is the same as (i).

In light of the preceding paragraph, we may verify (ii) supposing that \((q,W^2) | 2W\) and \(\frac{q}{(q,W^2)} > w\). Let us set \(Q(X) = \frac{aW^2}{(q,W^2)} X^2 + \frac{2War}{(q,W^2)}X\). Then with \(P(X), d_1\) and \(d_2\) as above we obtain from (72) that

$$\begin{aligned} \frac{1}{q}|G_{r}(a,q)| = \frac{d_2}{q} \left| \sum _{0 \le m_1 < d_1} e\left( \frac{Q(m_1)}{d_1}\right) \right| \le \sqrt{\frac{2(q,W^2)}{q}} \le \sqrt{\frac{2}{w}}, \end{aligned}$$
(73)

on remarking that \(\left| \sum _{0 \le m_1 < d_1} e\left( \frac{Q(m_1)}{d_1}\right) \right| \le \sqrt{2d_1}\), by the classical quadratic Weyl bound, applicable since the leading coefficient of Q(X) and \(d_1\) are coprime. \(\square \)
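
The splitting (72) used above is a finite identity that can be confirmed directly; a small numerical check, with coefficients chosen arbitrarily subject to \(d_2 | c_0\):

```python
from cmath import exp, pi

def e(z):
    """e(z) = exp(2*pi*i*z), as in Sect. 1."""
    return exp(2j * pi * z)

def check_72(c0, c1, c2, d1, d2):
    """Check (72) for P(X) = c0*X^2 + c1*X + c2 and d = d1*d2 with d2 | c0."""
    assert c0 % d2 == 0
    d = d1 * d2
    lhs = sum(e((c0 * m * m + c1 * m + c2) / d) for m in range(d))
    rhs = (sum(e((c0 * m * m + c1 * m + c2) / d) for m in range(d1))
           * sum(e(c1 * m / d2) for m in range(d2)))
    return abs(lhs - rhs) < 1e-8

print(check_72(12, 7, 5, 9, 4), check_72(12, 8, 5, 9, 4), check_72(45, 11, 3, 7, 15))
```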

3.3 The major arc contribution

In this subsection we complete the proof of (7). Let us first dispose of the first term in (55), which we denote here by T. Also, we shall write \(T_1\) for

$$\begin{aligned} \sum _{\begin{array}{c} 0 \le r< W, \\ (r, W) = 1. \end{array}} \sum _{1 \le q \le Q} \frac{1}{qW} \sum _{\begin{array}{c} 0 \le a < q,\\ (a,q) =1. \end{array}} {G_{r}}(-a,q) \;\int _{{{\mathfrak {M}}}(\frac{a}{q})} {\widehat{\beta }}\left( t-\frac{a}{q}\right) \widehat{S}(t)^6 \widehat{S}(-t)^{5}\, dt. \end{aligned}$$
(74)

Then by substituting the complex conjugate of the right hand side of (67) for \(\psi (-t) = {\overline{\psi }}(t)\) in T and using the triangle inequality together with (58) we deduce that

$$\begin{aligned} T-T_{1} \ll Q W\sqrt{N}(\log N)^2 \int _{0}^{1} |\widehat{S}(t)|^{11} dt \ll A^3 QW |S|^{9}\sqrt{N}(\log N)^2. \end{aligned}$$
(75)

If we now set

$$\begin{aligned} T(W) = \sum _{\begin{array}{c} 0 \le r< W, \\ (r, W) = 1. \end{array}} \sum _{ q|2W} \frac{1}{qW} \sum _{\begin{array}{c} 0 \le a < q,\\ (a,q) =1. \end{array}} G_{r}(-a,q) \;\int _{{{\mathfrak {M}}}(\frac{a}{q})} {\widehat{\beta }}\left( t-\frac{a}{q}\right) \widehat{S}(t)^6 \widehat{S}(-t)^{5}\, dt. \end{aligned}$$
(76)

then by (ii) of Lemma 3.2 combined with the triangle inequality and (58) we get

$$\begin{aligned} T_{1} - T(W) \ll \frac{\phi (W)\Vert \widehat{\beta }\Vert _{\infty } |S|^{9} A^3}{W\sqrt{w}} \ll \frac{A^3 |S|^9 N}{\sqrt{w}}, \end{aligned}$$
(77)

since \(\Vert \widehat{\beta }\Vert _{\infty } = \sup _{t \in \mathbf{R}}|\widehat{\beta }(t)|\le \frac{5N}{2}\). From (77), (75) and on recalling that \(|S| \ge \frac{\sqrt{N}}{A}\) and \(w = A^{l} \ge A^{12}\) we conclude that

$$\begin{aligned} T = T(W) + O\left( \frac{|S|^{11}}{A}\right) , \end{aligned}$$
(78)

when N is sufficiently large, depending only on A and l. Let us now estimate T(W). When q|2W we have \((r+mW)^2 \equiv r^2\) modulo q for all integers m, since \(2W |W^2\). Therefore we have \(G_{r}(-a,q) = qe\left( -\frac{ar^2}{q}\right) \) when q|2W, for all \(0 \le a <q\). Furthermore, since \(r \mapsto r+W\) is a bijection from the integers coprime to 2W in [0, W) to those in (W, 2W] coprime to 2W, we obtain

$$\begin{aligned} \frac{1}{qW} \sum _{\begin{array}{c} 0 \le r< W, \\ (r, W) = 1. \end{array}} G_{r}(-a,q) = \frac{1}{2W} \sum _{\begin{array}{c} 0 \le r < 2W, \\ (r, 2W) = 1. \end{array}} e\left( -\frac{ar^2}{q}\right) \end{aligned}$$
(79)

for any q|2W and all \(0 \le a <q\). Also, we have \(\widehat{S}(t)^6 \widehat{S}(-t)^{5} = \sum _{x \in S^{11}} e\left( f(x)t\right) \), with f(x) as in (7). By means of the change of variable \(t -\frac{a}{q} \mapsto t\) in the integrals in (76) we then see that

$$\begin{aligned} T(W) = \frac{1}{2W} \sum _{\begin{array}{c} 0 \le r< 2W, \\ (r, 2W) = 1. \end{array}} \sum _{ q|2W} \sum _{\begin{array}{c} 0 \le a < q,\\ (a,q) =1. \end{array}} \int _{-\frac{1}{M}}^{\frac{1}{M}} {\widehat{\beta }}\left( t\right) \sum _{x \in S^{11}} e\left( tf(x)\right) e\left( \frac{a(f(x) -r^2)}{q} \right) \, dt. \end{aligned}$$
(80)

Finally, on interchanging summations and remarking that

$$\begin{aligned} \frac{1}{2W} \sum _{ q|2W} \sum _{\begin{array}{c} 0 \le a< q,\\ (a,q) =1. \end{array}} e\left( \frac{a(f(x) -r^2)}{q} \right) = \frac{1}{2W} \sum _{{0 \le a<2W}}e\left( \frac{a(f(x) -r^2)}{2W} \right) \end{aligned}$$
(81)

we conclude that the right hand side of (80) is the same as the left hand side of

$$\begin{aligned} \sum _{\begin{array}{c} 0 \le r< 2W, \\ (r, 2W) = 1. \end{array}} \sum _{\begin{array}{c} x \in S^{11}, \\ f(x)\equiv r^2 \mathrm {mod}\, 2W \end{array}} \int _{-\frac{1}{M}}^{\frac{1}{M}} {\widehat{\beta }}(t) e\left( tf(x)\right) dt \le \sum _{\begin{array}{c} 0 \le r < 2W, \\ (r, 2W) = 1. \end{array}} \sum _{\begin{array}{c} x \in S^{11}, \\ f(x)\equiv r^2 \mathrm {mod}\, 2W \end{array}} 1, \end{aligned}$$
(82)

where we have used \(|\int _{-\frac{1}{M}}^{\frac{1}{M}} {\widehat{\beta }}(t) e(tf(x)) dt| \le \int _\mathbf{R} \widehat{\alpha }(t) dt = 1\), since \(|{\widehat{\beta }}(t)| = \widehat{\alpha }(t)\) for all \(t \in \mathbf{R}\). For each invertible square class b in \(\mathbf{Z}/2W\mathbf{Z}\), the number of r in [0, 2W) coprime to 2W and such that \(r^2 \equiv b\) modulo 2W is \(2\tau (U)\). Then it follows from (82) and (80) that

$$\begin{aligned} T(W) \le 2\tau (U)|\{ x \in S^{11}\,|\, f(x) \;\hbox { an invertible square mod}\ 2W \} |. \end{aligned}$$
(83)

Since \(W=2U\) we obtain (7) on combining (83) with (78), (66) and recalling that (55) is the same as the integral in (52).
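
The count of square roots used above, namely \(2\tau (U)\) roots for every invertible square class modulo \(2W = 4U\), can be verified directly for a small value of w; the sketch below takes \(w = 5\), so that \(U = 30\) and \(2W = 120\).

```python
from math import gcd
from collections import Counter

def root_counts(w):
    """For U = prod of primes <= w and 2W = 4U, count the square roots of each
    invertible square class modulo 2W, and return the set of counts together
    with 2*tau(U) = 2**(pi(w) + 1)."""
    primes = [p for p in range(2, w + 1) if all(p % q for q in range(2, p))]
    U = 1
    for p in primes:
        U *= p
    mod = 4 * U
    counts = Counter(r * r % mod for r in range(mod) if gcd(r, mod) == 1)
    return set(counts.values()), 2 * 2 ** len(primes)

print(root_counts(5))   # ({16}, 16): every class has exactly 2*tau(U) roots
```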

3.4 Proof of Theorem 1.3 completed

It remains only to bound (8) using Theorem 2.1. Let \({{\mathcal {Z}}}\) be the set of integers \(n >0\) such that \(n^2 \in S\). The set \({{\mathcal {Z}}}\) is contained in \((M, 2M]\) with \(M = \sqrt{N}\) and satisfies \(|{{\mathcal {Z}}}| \ge \frac{M}{A}\) and \(|\{ z \in {{\mathcal {Z}}}| z \equiv a \,\mathrm{mod}\, U\}| \le \frac{MB}{U}\) with \(B =2\), when N is sufficiently large depending on A and l. Finally, let \(I = S^{9}\) and for any \(x = (x_1, x_2, \ldots , x_9) \in S^{9}\) we set \(c(x) = x_1 + \cdots +x_4 -x_5- \cdots -x_9\). Then with \({R}_{U}({{\mathcal {Z}}}, \mathbf{c})\) as in Theorem 2.1 we have that

$$\begin{aligned} |\{ x \in S^{11}\,|\, f(x) \;\hbox { an invertible square modulo}\ 2W \} | \le |{R}_{U}({{\mathcal {Z}}}, \mathbf{c})|, \end{aligned}$$
(84)

since U|2W. On combining the bound for \(|{R}_{U}({{\mathcal {Z}}}, \mathbf{c})|\) given by Theorem 2.1 with (84) and (7) we finally obtain (6), as required.

4 Monochromatic representation

Here we deduce Theorem 1.1 from Theorem 1.3. We first take up Lemma 1.2.

4.1 Proof of Lemma 1.2

A standard application of the Cauchy–Schwarz inequality gives \(|mS|\, E_{m}(S) \ge |S|^{2m}\). Using (3) and \(L \ge 2(mD+1)D\) we then obtain

$$\begin{aligned} |mS| \ge \frac{L}{D} \ge \frac{mL}{k+1} + 2 \end{aligned}$$
(85)

for any \(k \ge mD\). We take k to be the integer \(\lceil mD \rceil \). Since the set mS contained in the interval \((mN, mN+mL]\), its translate \(mS - mN\) is contained in [1, mL] and satisfies \((k+1)(|mS-mN|-2) + 1 \ge {mL}\) on account of (85). Then by means of Theorem \(2^{\prime }\), page 129 of [5] applied to the set \(mS-mN\) we conclude that there are integers hd and e with \(1 \le h \le 2k+1\) and \(1 \le d \le k\) such that hmS contains the arithmetical progression

$$\begin{aligned} {{\mathcal {A}}} = hmN + \{ (e+1)d, (e+2)d, \ldots , (e +mL)d\}, \end{aligned}$$
(86)

of mL terms and to the modulus d. Since \(hmS \subseteq (hmN, hm(N+L)]\), each a in \({{\mathcal {A}}}\) satisfies

$$\begin{aligned} hmN < a \le hm(N+L) \le (2\lceil mD \rceil +1)m(N+L). \end{aligned}$$
(87)

Since \(1 \le d \le \lceil mD \rceil \), there is an integer x in S coprime to d. Also, we have \(x \le mL\) since \(x \le N+L, L \ge N\) and \(m \ge 2\). Therefore the number of terms in the arithmetical progression \({{\mathcal {A}}}\) is at least x and its modulus d is coprime to x. Consequently, \({{\mathcal {A}}}\) contains a complete system of residue classes modulo x and every integer n can be written as \(n = a + rx\) with a in \({{\mathcal {A}}} \) and \(r \in \mathbf{Z}\). For any integer \(n \ge (2\lceil mD \rceil +1)m(N+L)\) we have from \(N \le x\) and the lower bound for a in (87) that

$$\begin{aligned} 0 \le r = \frac{n-a}{x} \le \frac{n}{N} - hm. \end{aligned}$$
(88)

Since each \(a \in {{\mathcal {A}}}\) is a sum of hm elements of S, the conclusion of the lemma now follows.
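
The Cauchy–Schwarz step at the start of the proof, \(|mS|\, E_{m}(S) \ge |S|^{2m}\), follows on writing \(|S|^{m} = \sum _{n} R_{m}(n)\) and \(E_{m}(S) = \sum _{n} R_{m}^{2}(n)\), where \(R_{m}(n)\) is the number of ordered m-tuples of elements of S summing to n. A brute-force check on a random set (illustrative only):

```python
from collections import Counter
from itertools import product
import random

def check_cs(S, m):
    """Check |mS| * E_m(S) >= |S|**(2m) via the representation function R_m."""
    R = Counter(sum(t) for t in product(S, repeat=m))
    return len(R) * sum(c * c for c in R.values()) >= len(S) ** (2 * m)

random.seed(2)
S = random.sample(range(1000, 2000), 12)
print(check_cs(S, 2), check_cs(S, 3))
```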

4.2 Proof of Theorem 1.1

Since s(K) is increasing with K, it suffices to prove Theorem 1.1 for all K sufficiently large. For such a K, let \(\cup _{1 \le i \le K} {{\mathfrak {Q}}}_i\) be a partition of the set of squares \({{\mathfrak {Q}}}\) into K disjoint subsets.

As in Sect. 1, let \({{\mathcal {B}}}\) be the set of squares of integers that are not divisible by any prime \(p \le B\), where \(B = K^{13}\), and let \({{\mathcal {B}}}(N)\) denote \({{\mathcal {B}}} \cap (N, 4 N]\), for a given integer \(N \ge 1\). Then for all \(N \ge N_{0}\), with \(N_{0}\) depending only on K, we have by the principle of inclusion and exclusion and Mertens’ formula as given by (3.27), page 70 of [9] that

$$\begin{aligned} |{{\mathcal {B}}}(N)| \ge N^{\frac{1}{2}}\prod _{p\le B}\left( 1- \frac{1}{p}\right) - 2^{B} \ge \frac{N^{\frac{1}{2}}}{4\log B} - 2^{B} \ge \frac{N^{\frac{1}{2}}}{100\log K} \ge e^{e^2}K. \end{aligned}$$
(89)
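
For orientation, the main term in (89) can be compared with an exact count for small parameters; the values of N and B in the sketch below are illustrative and far below those required in the proof.

```python
from math import isqrt

def B_count(N, B):
    """Count of n with N < n^2 <= 4N and n free of prime factors <= B, versus
    the main term sqrt(N) * prod_{p <= B} (1 - 1/p)."""
    primes = [p for p in range(2, B + 1) if all(p % q for q in range(2, p))]
    exact = sum(1 for n in range(isqrt(N) + 1, isqrt(4 * N) + 1)
                if n * n > N and all(n % p for p in primes))
    main = N ** 0.5
    for p in primes:
        main *= 1 - 1 / p
    return exact, round(main, 1)

for N in (10**4, 10**6):
    print(N, B_count(N, B=13))
```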

Let N be an integer \(\ge N_{0}\). There is an \(i, 1 \le i \le K\), such that \({{\mathfrak {Q}}}_i \cap {{\mathcal {B}}(N)}\) contains at least \(\frac{|{{\mathcal {B}}(N)}|}{K}\) of the elements of \({{\mathcal {B}}(N)}\). For such an i we set \(S = {{\mathfrak {Q}}}_i \cap {{\mathcal {B}}}(N)\). Then S is a set of squares in (N, 4N] with \(|S| \ge \frac{N^{\frac{1}{2}}}{A}\), where \(A = 100 K \log K \ge e^{e^2}\) and no integer in S is divisible by a prime \(p \le A^{12}\), since \(A^{12} \le B\) when K is sufficiently large. It now follows from Theorem 1.3 that (3) holds with \(m =6, L =3N\) and \(D = A\exp \left( \frac{\left( 3\log 2 + o(1)\right) \log A}{\log \log A}\right) \). Since S contains an element of \({{\mathcal {B}}}\) and since \(\lceil 6D\rceil \le B\) when K is large enough, we may apply Lemma 1.2 to S to deduce that every integer \(n \ge (288D+72)N\) is a sum of no more than \(\frac{n}{N}\) elements of S. In particular, there is a \(C_1 >0\) such that every integer in \(I(N) = ((288D+72)N, (288D +73)N]\) is a sum of at most \(C_1 D\) squares all belonging to S and therefore to \({{\mathfrak {Q}}}_i\). Thus for all large enough N, every integer in the interval I(N) can be expressed as a sum of no more than \(C_1D\) squares all of the same colour. On remarking that the interval I(N) meets \(I(N+1)\) for all large enough N, we obtain that \(s(K) \le C_1 D\). This yields the conclusion of Theorem 1.1 since \(A = 100 K \log K\) and therefore \(C_1 D \le K\exp \left( \frac{\left( 3\log 2 + o(1)\right) \log K}{\log \log K}\right) \).