1 Introduction

In the last 15 years, the circle method has been applied to problems in Harmonic Analysis, Ergodic Theory, and Partial Differential Equations; see, e.g., [1, 2, 4, 5, 6, 11, 14, 15, 19, 21, 23, 27, 28, 29, 34]. Thus it seems worthwhile to have an introductory exposition of the topic, and the present article tries to give one. It is a slightly expanded version of a talk given at the IAS Park City Institute in July 2003. Over the last several years, I have been working with A. Magyar and E. M. Stein on several applications of the circle method, and conversations with Magyar and Stein have greatly enriched my understanding of this method. I have also profited greatly from Magyar’s paper [14] and unpublished lecture notes of Stein [25].

The circle method of Hardy, Littlewood, and Ramanujan is a method for studying asymptotically the number of solutions of diophantine equations. For example, Hardy and Littlewood [10] (with later improvements by Vinogradov [32]) studied the number of representations of an integer m as a sum of \(\ell \) \(k\)th powers. That is, they studied the number of solutions in positive integers \(n_1, n_2,\ldots , n_{\ell }\) of the equation

$$\begin{aligned} m=n_{1}^k+\cdots +n_{\ell }^k. \end{aligned}$$

Another example is a theorem of Vinogradov [32] asserting that every sufficiently large odd integer can be written as a sum of three primes.

We will begin by considering \(r_{d}(\lambda )\), the number of lattice points in \(\mathbb {R}^d\) on the sphere of radius \(\lambda \) centered at the origin. A lattice point in \(\mathbb {R}^d\) is a point \(n=(n_1, n_2,\ldots , n_{d})\) with \(n_1, n_2,\ldots , n_{d}\) integers. Then we shall give a brief discussion of Vinogradov’s Theorem asserting that every sufficiently large odd number can be written as a sum of three primes, and try to explain why it seems so difficult to use the method to show that every sufficiently large even integer can be written as a sum of two primes. Finally, we will give some references for further reading.

2 The Number of Lattice Points on the Sphere of Radius \(\lambda \)

\(r_{d}(\lambda )\), the number of lattice points in \(\mathbb {R}^d\) on the sphere of radius \(\lambda \) centered at the origin, is the number of solutions in integers \((n_1, n_2,\ldots , n_{d})\) of the equation

$$\begin{aligned} \lambda ^2=n_{1}^2+\cdots +n_{d}^2. \end{aligned}$$
(1)

So \(r_{d}(\lambda )=0\) unless \(\lambda ^2\) is an integer, and we will always assume \(\lambda ^2\) is an integer. Then there is the following theorem.

Theorem 1

For \(d\ge 5\), there are positive constants \(c_{1}(d)\) and \(c_{2}(d)\) such that

$$\begin{aligned} c_{1}(d)\lambda ^{d-2}\le r_{d}(\lambda )\le c_{2}(d)\lambda ^{d-2}. \end{aligned}$$

See [8, 13] if \(d\ge 6\). The statement is false for \(d\le 4\). Note that the power of \(\lambda \) that occurs in Theorem 1 is \(\lambda ^{d-2}\), while the area of the corresponding sphere in \(\mathbb {R}^d\) is \(C(d)\lambda ^{d-1}\). We should expect the power \(\lambda ^{d-2}\) for the following reason. We expect the number of lattice points in the annulus \(\Lambda \le |x|\le 2\Lambda \) to be about \(c\Lambda ^{d}\) for large \(\Lambda \). On the other hand, a sphere of radius \(\lambda \) can contain lattice points only if \(\lambda ^2\) is an integer, so the number of such spheres with \(\Lambda \le \lambda \le 2\Lambda \) is the number of integers in the interval \([\Lambda ^2,4\Lambda ^2]\), which is about \(3\Lambda ^2\). Thus if \(r_{d}(\lambda )\sim \lambda ^a\), then \(\Lambda ^a\cdot \Lambda ^2\sim \Lambda ^d\), so \(a=d-2\).
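The heuristic can be compared with actual counts. The sketch below (plain Python; the function name `sphere_counts` is ours, not from the text) tabulates \(r_{d}(\lambda )\) for every \(\lambda ^2=n\) up to a bound by adding one coordinate at a time. It prints the range of \(r_5(\lambda )/\lambda ^3\) for \(1000\le \lambda ^2\le 2000\), and the values \(r_4(\lambda )\) for \(\lambda ^2=4^k\), which stay equal to 24 (by Jacobi's four-square formula) and so illustrate why Theorem 1 fails for \(d=4\).

```python
import math

def sphere_counts(d, n_max):
    """Return r with r[n] = #{(n_1, ..., n_d) in Z^d : n_1^2 + ... + n_d^2 = n}
    for 0 <= n <= n_max, i.e. r_d(lambda) with lambda^2 = n."""
    # (square, multiplicity): 0 = 0^2 once, k^2 twice (+k and -k) for k >= 1.
    squares = [(k * k, 1 if k == 0 else 2) for k in range(math.isqrt(n_max) + 1)]
    r = [1] + [0] * n_max          # with zero coordinates only the empty sum 0 occurs
    for _ in range(d):             # add one coordinate at a time
        new = [0] * (n_max + 1)
        for n, count in enumerate(r):
            if count:
                for sq, mult in squares:
                    if n + sq > n_max:
                        break      # squares are increasing
                    new[n + sq] += count * mult
        r = new
    return r

r5 = sphere_counts(5, 2000)
ratios = [r5[n] / n ** 1.5 for n in range(1000, 2001)]   # r_5(lambda) / lambda^3
print(min(ratios), max(ratios))

r4 = sphere_counts(4, 4 ** 5)
print([r4[4 ** k] for k in range(1, 6)])  # [24, 24, 24, 24, 24]
```

Counting by repeated convolution costs about \(d\cdot n_{\max }^{3/2}\) operations, which is far cheaper than enumerating all lattice points in a box.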

We shall try to outline the proof of the upper bound in Theorem 1 by the circle method, and of the lower bound for \(d\ge 27\). We will then briefly indicate how to obtain the lower bound for \(5\le d< 27\).

To prove Theorem 1, one shows

$$\begin{aligned} r_{d}(\lambda )=M_{d}(\lambda )\lambda ^{d-2}+E_{d}(\lambda ), \end{aligned}$$
(2)

where

$$\begin{aligned} C_{1}(d)\le M_{d}(\lambda )\le C_{2}(d) \end{aligned}$$

with \(C_{1}(d)\) and \(C_{2}(d)\) positive, and

$$\begin{aligned} |E_{d}(\lambda )|\le C\lambda ^{d/2}. \end{aligned}$$

Because in Eq. (1), the power of the \(n_{j}\) is 2, other methods can be used to study \(r_{d}(\lambda )\). In particular, Hardy [9] showed \(E_{d}(\lambda )=0\) if \(5\le d\le 8\). See [8, 13]. Even when \(d=3\) or 4, it can be shown that \(E_{d}(\lambda )=0\). See [3, 26]. A better understanding of this can be found in the work of Mordell [17]. See [20] for a more recent development using the modular group. In general, the estimate for \(E_{d}(\lambda )\) can be improved. See [12].

\(M_{d}(\lambda )\) itself is complicated to describe. In particular, it involves the Gauss sums S(a, q), defined as follows: if \((a,q)=1\), that is, a and q are relatively prime, and \(1\le a\le q\),

$$\begin{aligned} S(a,q)=\sum _{n=0}^{q-1} e^{-2\pi in^2a/q}. \end{aligned}$$

If \(q=1\), the only integer a with \((a,q)=1\) is \(a=1\) and

$$\begin{aligned} S(1,1)=1. \end{aligned}$$

If \(q=2\), the only integer a with \((a,q)=1\) is again \(a=1\) and

$$\begin{aligned} S(1,2)=\sum _{n=0}^{1} e^{-2\pi in^2 1/2}=1+e^{-i\pi }=1-1=0. \end{aligned}$$

There is the following estimate for the size of S(a, q).

Lemma 2

$$\begin{aligned} |S(a,q)|\le \sqrt{2q}. \end{aligned}$$

We defer the proof of Lemma 2 until later.
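The two examples above and Lemma 2 are easy to test numerically. The following sketch (the helper name `gauss_sum` is ours) evaluates S(a, q) directly from the definition and reports the largest value of \(|S(a,q)|/\sqrt{2q}\) over \(1\le q\le 100\) and \(1\le a\le q\) with \((a,q)=1\).

```python
import cmath
import math

def gauss_sum(a, q):
    """S(a, q) = sum_{n=0}^{q-1} exp(-2 pi i n^2 a / q), computed from the definition."""
    return sum(cmath.exp(-2j * math.pi * n * n * a / q) for n in range(q))

# S(1, 1) = 1 and S(1, 2) = 0, up to rounding.
print(gauss_sum(1, 1), abs(gauss_sum(1, 2)))

worst = max(abs(gauss_sum(a, q)) / math.sqrt(2 * q)
            for q in range(1, 101)
            for a in range(1, q + 1) if math.gcd(a, q) == 1)
print(worst)  # 1 up to rounding: the bound of Lemma 2 is attained, e.g. for q = 4
```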

Now

$$\begin{aligned} M_{d}(\lambda )=C(d)\sum _{q=1}^{\infty }\mathop \sum \limits _{\mathop {a=1}\limits _{(a,q)=1}}^{q}e^{-2\pi i\lambda {^2}\frac{a}{q}} \left( \frac{1}{q}S(a,q)\right) ^{d}. \end{aligned}$$
(3)

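\( M_{d}(\lambda )\) is generally referred to as the singular series. Note that Lemma 2 easily implies that \( |M_{d}(\lambda )|\le C(d)\) for \(d\ge 5\): the inner sum in (3) has at most q terms, each of absolute value at most \(\left( \frac{2}{q}\right) ^{d/2}\), and \(\sum _{q}q^{1-d/2}\) converges when \(d\ge 5\). Also, our remarks on S(1, 1) and S(1, 2) together with Lemma 2 imply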

$$\begin{aligned} |M_{d}(\lambda )|\ge C(d)\left( 1-\sum _{q=3}^{\infty }q\cdot \left( \sqrt{\frac{2}{q}}\right) ^{d}\right) >\overline{C}(d) \end{aligned}$$

if \(d\ge 27\). The condition \(d\ge 27\) could of course easily be improved.
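To see how easily the condition can be improved, note that the bracket is positive as soon as \(\sum _{q\ge 3}q\left( \sqrt{2/q}\right) ^{d}=2^{d/2}\sum _{q\ge 3}q^{1-d/2}<1\). The short sketch below (the function `tail` and its cutoff `q_max` are ours) finds the first d for which this holds; truncating at `q_max` lowers the sum only by a negligible amount here, and the value for d = 8 is well above 1 while the value for d = 9 is well below it.

```python
def tail(d, q_max=100_000):
    """2^{d/2} * sum_{q=3}^{q_max} q^{1 - d/2}: a truncation of
    sum_{q >= 3} q * (sqrt(2/q))^d from the lower bound for M_d(lambda)."""
    return 2 ** (d / 2) * sum(q ** (1 - d / 2) for q in range(3, q_max + 1))

first_d = next(d for d in range(5, 28) if tail(d) < 1)
print(first_d)    # 9
print(tail(27))   # about 0.013, far below 1
```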

In thinking about \(r_{d}(\lambda )\), our first task is to change the combinatorial problem of studying the number of solutions of Eq. (1) to an analytic problem. A key observation is that

$$\begin{aligned} \int _{-\frac{1}{2}}^{\frac{1}{2}}\,e^{2\pi i(\lambda ^2-n_{1}^2-\cdots -n_{d}^2)\theta }\,\text {d}\theta =\left\{ \begin{array}{ll} 1, &{} \quad \hbox {if}\; n_{1}^2+\cdots +n_{d}^2=\lambda ^2; \\ 0, &{}\quad \hbox {otherwise.} \end{array} \right. \end{aligned}$$

So

$$\begin{aligned} r_{d}(\lambda )=\sum _{n_{1},\ldots ,n_{d}=-\infty }^{\infty } \int _{-\frac{1}{2}}^{\frac{1}{2}}\,e^{2\pi i(\lambda ^2-n_{1}^2-\cdots -n_{d}^2)\theta }\,\text {d}\theta \end{aligned}$$
(4)

or formally

$$\begin{aligned} r_{d}(\lambda )=\int _{-\frac{1}{2}}^{\frac{1}{2}}e^{2\pi i\lambda ^2\theta }\sum _{n_{1},\ldots ,n_{d}=-\infty }^{\infty } e^{-2\pi i(n_{1}^2+\cdots +n_{d}^2)\theta }\,\text {d}\theta \end{aligned}$$

or

$$\begin{aligned} r_{d}(\lambda )=\int _{-\frac{1}{2}}^{\frac{1}{2}}e^{2\pi i\lambda ^2\theta }\left( \sum _{n=-\infty }^{\infty } e^{-2\pi i n^2\theta }\right) ^{d}\,\text {d}\theta . \end{aligned}$$
(5)

Of course, the infinite sum in (5) does not converge. To make (5) rigorous, we can either truncate the sum in (4) or introduce a damping factor depending on a parameter \(\epsilon >0\). It turns out that in the case of squares, introducing an \(\epsilon >0\) is more convenient. Thus, we note that

$$\begin{aligned}&\int _{-\frac{1}{2}}^{\frac{1}{2}}\,e^{2\pi i\lambda ^2\theta }\,e^{-2\pi i(n_{1}^2+\cdots +n_{d}^2)\theta }\, e^{-2\pi \epsilon (n_{1}^2+\cdots +n_{d}^2)}\,\text {d}\theta \\&=\left\{ \begin{array}{ll} e^{-2\pi \epsilon (n_{1}^2+\cdots +n_{d}^2)}=e^{-2\pi \epsilon \lambda ^2}, &{} \hbox {if}\quad n_{1}^2+\cdots +n_{d}^2=\lambda ^2; \\ 0, &{} \hbox {otherwise.} \end{array} \right. \end{aligned}$$

So

$$\begin{aligned} r_{d}(\lambda )&=e^{2\pi \epsilon \lambda ^2}\sum _{n_{1},\ldots ,n_{d}=-\infty }^{\infty } \int _{-\frac{1}{2}}^{\frac{1}{2}}\,e^{2\pi i\lambda ^2\theta }\,e^{-2\pi i(n_{1}^2+\cdots +n_{d}^2)\theta }\, e^{-2\pi \epsilon (n_{1}^2+\cdots +n_{d}^2)}\,\text {d}\theta \\&=e^{2\pi \epsilon \lambda ^2}\int _{-\frac{1}{2}}^{\frac{1}{2}}e^{2\pi i \lambda ^2 \theta } \left( \sum _{n=-\infty }^{\infty }e^{-2\pi (\epsilon +i\theta )n^2}\right) ^{d}\,\text {d}\theta , \end{aligned}$$

or

$$\begin{aligned} r_{d}(\lambda )=e^{2\pi \epsilon \lambda ^2}\int _{-\frac{1}{2}}^{\frac{1}{2}}e^{2\pi i \lambda ^2 \theta } \{F(\epsilon +i\theta )\}^{d}\,\text {d}\theta . \end{aligned}$$
(6)

where

$$\begin{aligned} F(\epsilon +i\theta )=\sum _{n=-\infty }^{\infty }e^{-2\pi (\epsilon +i\theta )n^2}. \end{aligned}$$
(7)
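Formula (6) is an exact identity for every \(\epsilon >0\), and it can be checked numerically. The sketch below (helper names are ours) truncates the series (7) at \(|n|\le 40\), takes \(\epsilon =1/\lambda ^2\), and replaces the \(\theta \)-integral by an average over M equally spaced points. Because the integrand has period 1 in \(\theta \), this average picks out exactly the lattice points with \(n_1^2+\cdots +n_d^2\equiv \lambda ^2 \pmod M\), and the unwanted ones carry a factor of at most \(e^{-2\pi \epsilon M}\), so for \(d=5\), \(\lambda ^2=10\) the result reproduces \(r_5(\sqrt{10})=560\) up to rounding.

```python
import cmath
import math

def theta_F(eps, theta, n_max=40):
    """F(eps + i theta) = sum_n exp(-2 pi (eps + i theta) n^2), truncated to |n| <= n_max
    (ample when eps is not too small, e.g. eps = 1/lambda^2 with lambda^2 <= 50)."""
    s = eps + 1j * theta
    return sum(cmath.exp(-2 * math.pi * s * n * n) for n in range(-n_max, n_max + 1))

def r_via_integral(d, lam2, M=2048):
    """Right-hand side of (6) with eps = 1/lambda^2, the integral over one period
    [-1/2, 1/2) replaced by the average over M equally spaced points."""
    eps = 1.0 / lam2
    total = 0j
    for k in range(M):
        theta = -0.5 + k / M
        total += cmath.exp(2j * math.pi * lam2 * theta) * theta_F(eps, theta) ** d
    return math.exp(2 * math.pi * eps * lam2) * total / M

print(r_via_integral(5, 10))  # approximately 560, with imaginary part close to 0
```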

We will always take \(\epsilon =\dfrac{1}{\lambda ^2}\), so that the factor \(e^{2\pi \epsilon \lambda ^2}=e^{2\pi }\) is a constant. To study the analytic problem posed by (6), we have to understand \(F(\epsilon +i\theta )\), which is of course essentially a classical theta function. A convenient way to study \(F(\epsilon +i\theta )\) is via the Poisson summation formula, which asserts that under suitable hypotheses on a function f,

$$\begin{aligned} \sum _{n} f(n)=\sum _{n} \hat{f}(n) \end{aligned}$$

where

$$\begin{aligned} \hat{f}(\xi )=\int _{-\infty }^{\infty }e^{-2\pi i x\xi }f(x)\,\text {d}x. \end{aligned}$$

See [30].

We are going to apply the Poisson summation formula with

$$\begin{aligned} f(x)=e^{-2\pi (\epsilon +i\theta )x^2}. \end{aligned}$$

Now \(e^{-\pi x^2}\) is its own Fourier transform. See [30]. So by a change of variables

$$\begin{aligned} \hat{f}(\xi )=\frac{1}{\sqrt{2(\epsilon +i\theta )}}e^{-\pi \frac{\xi ^2}{2(\epsilon +i\theta )}}. \end{aligned}$$

Then the Poisson summation formula asserts

$$\begin{aligned} F(\epsilon +i\theta )=\sum _{n=-\infty }^{\infty }\frac{1}{\sqrt{2(\epsilon +i\theta )}} e^{-\pi \frac{n^2}{2(\epsilon +i\theta )}}. \end{aligned}$$
(8)
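Formula (8) can be sanity-checked numerically. In the sketch below (function names are ours), both sides of (8) are truncated at \(|n|\le 80\), and `cmath.sqrt` supplies the principal square root, which is the branch of \(\sqrt{2(\epsilon +i\theta )}\) obtained by continuity from \(\theta =0\), since \(\mathrm{Re}(\epsilon +i\theta )>0\). The last line compares \(F(\epsilon )\) with the single term \(\hat{f}(0)=1/\sqrt{2\epsilon }=\lambda /\sqrt{2}\) for \(\epsilon =1/\lambda ^2\), where the other terms of (8) have size about \(e^{-\pi \lambda ^2/2}\).

```python
import cmath
import math

def F_direct(eps, theta, n_max=80):
    """Left side of (8): sum_n exp(-2 pi (eps + i theta) n^2)."""
    s = eps + 1j * theta
    return sum(cmath.exp(-2 * math.pi * s * n * n) for n in range(-n_max, n_max + 1))

def F_poisson(eps, theta, n_max=80):
    """Right side of (8): sum_n exp(-pi n^2 / (2 s)) / sqrt(2 s), with s = eps + i theta."""
    s = eps + 1j * theta
    return sum(cmath.exp(-math.pi * n * n / (2 * s))
               for n in range(-n_max, n_max + 1)) / cmath.sqrt(2 * s)

print(abs(F_direct(0.05, 0.3) - F_poisson(0.05, 0.3)))  # rounding-error size

lam = 10.0
print(F_direct(1 / lam**2, 0.0), lam / math.sqrt(2))    # f_hat(0) is essentially all of F
```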

If one is lucky in using the Poisson summation formula, the main term in \(\displaystyle \sum \hat{f}(n)\) is \(\hat{f}(0)\). Thus, we might hope

$$\begin{aligned} F(\epsilon +i\theta )=\frac{1}{\sqrt{2(\epsilon +i\theta )}}+\mathrm{Error} \end{aligned}$$
(9)

Just to see that we are on the right track, let us see what would happen if

$$\begin{aligned} F(\epsilon +i\theta )=\frac{1}{\sqrt{2(\epsilon +i\theta )}}. \end{aligned}$$

Then we would have

$$\begin{aligned} r_{d}(\lambda )=C_{d}\int _{-\frac{1}{2}}^{\frac{1}{2}}e^{2\pi i\lambda ^2\theta }\frac{1}{(\epsilon +i\theta )^{d/2}} \,\text {d}\theta . \end{aligned}$$

Now let us make a change of variable \(\theta =x\epsilon \). Then recalling the fact that \(\epsilon =\dfrac{1}{\lambda ^2}\), we arrive at the equation

$$\begin{aligned} r_{d}(\lambda )=C_{d}\,\lambda ^{d-2}\int _{-\frac{\lambda ^2}{2}}^{\frac{\lambda ^2}{2}} e^{2\pi ix}\frac{1}{(1+ix)^{d/2}}\,\text {d}x. \end{aligned}$$

Notice that the integrand is independent of \(\lambda \), so that we would have

$$\begin{aligned} r_{d}(\lambda )=C_{d}\,\lambda ^{d-2}\int _{-\infty }^{\infty } e^{2\pi ix}\frac{1}{(1+ix)^{d/2}}\,\text {d}x+\mathcal {O}(1) \end{aligned}$$

(which of course is too good an error to be true).

It is not hard to see that \(\displaystyle \int _{-\infty }^{\infty } \frac{e^{2\pi ix}}{(1+ix)^{d/2}}\,\text {d}x\ne 0\). If \(d\ge 5\) is an even integer, one sees this by the residue theorem. If \(d=1\), this follows by deforming the contour to an integral over \([i,i\infty ]\) and using the fact that \((1+ix)^{-1/2}\) is multiple valued. If d is a larger odd integer, we can reduce matters to \(d=1\) by integration by parts.
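For \(d\ge 3\) there is also a closed form, obtained by a route different from the contour arguments above: for real x one has \(\Gamma (s)(1+ix)^{-s}=\int _0^{\infty }t^{s-1}e^{-t}e^{-ixt}\,\text {d}t\), so Fourier inversion at \(t=2\pi \) gives \(\int _{-\infty }^{\infty }e^{2\pi ix}(1+ix)^{-s}\,\text {d}x=(2\pi )^{s}e^{-2\pi }/\Gamma (s)\) with \(s=d/2\), which is positive. The sketch below (function names are ours) compares this value with a trapezoid-rule approximation on \([-100,100]\) for \(d=5\); Python's complex power uses the principal branch, which is the one continuous on the line \(1+ix\).

```python
import cmath
import math

def closed_form(d):
    """(2 pi)^{d/2} e^{-2 pi} / Gamma(d/2): the integral of e^{2 pi i x} (1 + i x)^{-d/2} over R."""
    s = d / 2
    return (2 * math.pi) ** s * math.exp(-2 * math.pi) / math.gamma(s)

def trapezoid(d, X=100.0, h=0.002):
    """Trapezoid rule for the same integral truncated to [-X, X]."""
    n = int(round(2 * X / h))
    total = 0j
    for k in range(n + 1):
        x = -X + k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * cmath.exp(2j * math.pi * x) * (1 + 1j * x) ** (-d / 2)
    return total * h

print(trapezoid(5), closed_form(5))  # both about 0.139
```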

Let us now return to Eq. (9), and consider the error.

The error is

$$\begin{aligned} \frac{1}{\sqrt{2(\epsilon +i\theta )}}\sum _{n\ne 0}e^{-\pi \frac{n^2}{2(\epsilon +i\theta )}}. \end{aligned}$$

The absolute value of the nth term in the above series is \(\exp \left( -C\frac{n^2\epsilon }{\epsilon ^2+\theta ^2}\right) \) for some positive C. Thus, we can expect to control the error only if \(\theta ^2\le \epsilon \), that is, if \(|\theta |\le \dfrac{1}{\lambda }\). If \(|\theta |\le \dfrac{1}{\lambda }\), then

$$\begin{aligned} |\mathrm{Error}|\le C_{1}\,\frac{1}{(\epsilon ^2+\theta ^2)^{1/4}}\,e^{-C_{2}\frac{\epsilon }{\epsilon ^2+\theta ^2}}. \end{aligned}$$

So

$$\begin{aligned} F^{d}(\epsilon +i\theta )&= \left( \frac{1}{2(\epsilon +i\theta )}\right) ^{d/2} +\mathcal {O}\left( \left( \frac{1}{\epsilon ^2+\theta ^2}\right) ^{d/4}\right) \, e^{-C_{3}\frac{\epsilon }{\epsilon ^2+\theta ^2}} \\&=\left( \frac{1}{2(\epsilon +i\theta )}\right) ^{d/2} +\mathcal {O}\left( \frac{1}{\epsilon ^{d/4}}\left( \frac{\epsilon }{\epsilon ^2+\theta ^2}\right) ^{d/4}\right) \,e^{-C_{3}\frac{\epsilon }{\epsilon ^2+\theta ^2}}\\&=\left( \frac{1}{2(\epsilon +i\theta )}\right) ^{d/2} +\mathcal {O}(\lambda ^{d/2}). \end{aligned}$$

This leads to the estimate

$$\begin{aligned} \int _{-\frac{1}{\lambda }}^{\frac{1}{\lambda }}e^{2\pi i \lambda ^2 \theta } \{F(\epsilon +i\theta )\}^{d}\,\text {d}\theta =C(d)\lambda ^{d-2}+\mathcal {O}(\lambda ^{\frac{d}{2}-1}) \end{aligned}$$

Now the thrust of the circle method is that the main contribution to the integral in (6) should come from small intervals around rationals a/q with \(1\le a\le q\), \((a,q)=1\) and q not too large. In the present example, we define

$$\begin{aligned} I(a,q)=\left\{ \theta :\left| \theta -\frac{a}{q}\right| \le \frac{1}{q\lambda }\right\} . \end{aligned}$$

The intervals I(a, q) with \(q\le \dfrac{\lambda }{20}\) are disjoint. Indeed, suppose \(I(a,q)\cap I(a_1,q_1)\ne \varnothing \) for two different fractions \(\frac{a}{q}\ne \frac{a_1}{q_1}\). Then

$$\begin{aligned} \left| \frac{a}{q}-\frac{a_1}{q_1}\right| \le 2\frac{1}{q^{*}\lambda } \end{aligned}$$

where \(q^{*}=\mathrm{min}(q,q_1)\). Since \((a,q)=(a_1,q_1)=1\) and \(\frac{a}{q}\ne \frac{a_1}{q_1}\), the integer \(aq_1-a_1q\) is nonzero, so

$$\begin{aligned} \left| \frac{a}{q}-\frac{a_1}{q_1}\right| \ge \frac{1}{qq_1}. \end{aligned}$$

Thus

$$\begin{aligned} \frac{1}{2}q^{*}\lambda \le qq_1, \end{aligned}$$

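that is, \(\mathrm{max}(q,q_1)=\dfrac{qq_1}{q^{*}}\ge \dfrac{\lambda }{2}\), which is impossible when \(q,q_1\le \dfrac{\lambda }{20}\). Thus, letting \(\mathcal {E}_{\lambda }\) denote the part of a period interval of length 1 not covered by these intervals (the integrand in (6) has period 1 in \(\theta \)), and suppressing the constant factor \(e^{2\pi \epsilon \lambda ^2}=e^{2\pi }\), we have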

$$\begin{aligned} r_{d}(\lambda )=\sum _{q=1}^{\frac{\lambda }{20}}\mathop \sum \limits _{\mathop {a=1}\limits _{(a,q)=1}}^{q}\int _{I(a,q)}e^{2\pi i \lambda ^2 \theta } \{F(\epsilon +i\theta )\}^{d}\,\text {d}\theta +\int _{ \mathcal {E}_{\lambda } }e^{2\pi i \lambda ^2 \theta } \{F(\epsilon +i\theta )\}^{d}\,\text {d}\theta . \end{aligned}$$

According to a well-known principle of Dirichlet, for each \(\theta \in [0,1]\) there are integers a and q with \((a,q)=1\) and \(1\le q\le \lambda \) such that \(\left| \theta -\dfrac{a}{q}\right| \le \dfrac{1}{q\lambda }\). Thus

$$\begin{aligned} \mathcal {E}_{\lambda }\subset \bigcup _{\frac{\lambda }{20}< q\le \lambda } I(a,q). \end{aligned}$$
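Both facts used here, the disjointness of the I(a, q) for \(q\le \lambda /20\) and Dirichlet's principle, are easy to check by machine. In the sketch below (all names are ours), `arcs_disjoint` tests disjointness in exact rational arithmetic by comparing neighboring centers (for intervals sorted by center, neighbors suffice), and `dirichlet` finds a suitable fraction a/q by direct search over \(q\le \lambda \). Dirichlet's principle guarantees that the search succeeds; a returned a = 0 corresponds to the center 1/1 by periodicity.

```python
from fractions import Fraction
import math

def arcs_disjoint(lam, q_max):
    """True if the intervals I(a,q) = [a/q - 1/(q lam), a/q + 1/(q lam)],
    1 <= a <= q <= q_max, gcd(a, q) = 1, are pairwise disjoint (lam an integer)."""
    centers = sorted({Fraction(a, q) for q in range(1, q_max + 1)
                      for a in range(1, q + 1) if math.gcd(a, q) == 1})
    for x, y in zip(centers, centers[1:]):
        right_end_x = x + Fraction(1, x.denominator * lam)
        left_end_y = y - Fraction(1, y.denominator * lam)
        if right_end_x >= left_end_y:
            return False
    return True

def dirichlet(theta, lam):
    """Return (a, q) with gcd(a, q) = 1, 1 <= q <= lam and |theta - a/q| <= 1/(q lam)."""
    for q in range(1, math.floor(lam) + 1):
        a = round(q * theta)
        if abs(q * theta - a) <= 1 / lam:
            g = math.gcd(a, q)
            return a // g, q // g   # reducing a/q keeps the inequality
    raise AssertionError("unreachable by Dirichlet's principle")

print(arcs_disjoint(400, 20))    # True:  q <= lambda/20
print(arcs_disjoint(400, 250))   # False: e.g. I(1,249) and I(1,250) overlap
print(dirichlet(math.sqrt(2) - 1, 400))
```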

Now we would like to find an approximation to \(F(\epsilon +i\theta )\) for \(\theta \in I(a,q)\) with \(q\le \lambda \). To this end for \(\theta \in I(a,q)\), we write n in the sum defining \(F(\epsilon +i\theta )\) as

$$\begin{aligned} n=mq+\mu ,\quad 0\le \mu \le q-1. \end{aligned}$$

Thus

$$\begin{aligned} F(\epsilon +i\theta )=\sum _{n=-\infty }^{\infty }e^{-2\pi n^2(\epsilon +i\theta )} =\sum _{\mu =0}^{q-1}\sum _{m=-\infty }^{\infty } e^{-2\pi (mq+\mu )^2(\epsilon +i(\theta -\frac{a}{q})+i\frac{a}{q})}. \end{aligned}$$

Since

$$\begin{aligned} e^{-2\pi i (mq+\mu )^2\frac{a}{q}}= & {} e^{-2\pi i \mu ^2\frac{a}{q}},\\ F(\epsilon +i\theta )= & {} \sum _{\mu =0}^{q-1} e^{-2\pi i \mu ^2\frac{a}{q}}\,F_{\mu }\left( \epsilon +i\left( \theta -\frac{a}{q}\right) \right) . \end{aligned}$$

where

$$\begin{aligned} F_{\mu }\left( \epsilon +i\left( \theta -\frac{a}{q}\right) \right) =\sum _{m=-\infty }^{\infty } e^{-2\pi (mq+\mu )^2(\epsilon +i(\theta -\frac{a}{q}))}. \end{aligned}$$

We study \(F_{\mu }\left( \epsilon +i\left( \theta -\frac{a}{q}\right) \right) \) by the Poisson summation formula with \(f(x)=e^{-2\pi (xq+\mu )^2(\epsilon +i(\theta -\frac{a}{q}))}\). Then

$$\begin{aligned} \hat{f}(0)=\frac{1}{q\sqrt{2(\epsilon +i(\theta -\frac{a}{q}))}}, \end{aligned}$$

(which is independent of \(\mu \)). Arguing as in the case that \(\dfrac{a}{q}=0\), we find for \(\theta \in I(a,q), q\le \lambda \)

$$\begin{aligned} (F(\epsilon +i\theta ))^{d}=\left( \frac{S(a,q)}{q}\right) ^{d}\frac{C(d)}{(\epsilon +i(\theta -\frac{a}{q}))^{d/2}} +O(\lambda ^{d/2}). \end{aligned}$$

To obtain this approximate expression for \(F(\epsilon +i\theta )\), it is necessary to show if \((a,q)=1\)

$$\begin{aligned} |S(a,q,m)|\le Cq^{1/2} \end{aligned}$$

where

$$\begin{aligned} S(a,q,m)=\sum _{n=1}^{q}e^{-2\pi i n^2\frac{a}{q}}e^{2\pi im\frac{n}{q}}. \end{aligned}$$

This estimate is proved in the same manner as Lemma 2 below.
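Both ingredients of the approximation on I(a, q) can be tested numerically. The sketch below (names are ours) first checks \(|S(a,q,m)|\le \sqrt{2q}\) for \(q\le 40\), taking S(a, q, m) to be the Gauss sum twisted by the linear phase \(e^{2\pi imn/q}\), which is the sum that the Poisson summation formula produces here. It then checks that at a center \(\theta =a/q\), with \(\epsilon =1/\lambda ^2\) and \(\lambda =100\), the series \(F(\epsilon +i\theta )\) agrees with its main term \(S(a,q)/(q\sqrt{2\epsilon })\); the remaining terms are of size about \(e^{-\pi \lambda ^2/(2q^2)}\).

```python
import cmath
import math

def twisted_gauss_sum(a, q, m):
    """S(a, q, m) = sum_{n=1}^{q} exp(-2 pi i n^2 a / q) exp(2 pi i m n / q)."""
    return sum(cmath.exp(2j * math.pi * ((m * n - n * n * a) % q) / q) for n in range(1, q + 1))

def theta_F(eps, theta, n_max):
    s = eps + 1j * theta
    return sum(cmath.exp(-2 * math.pi * s * n * n) for n in range(-n_max, n_max + 1))

# 1) |S(a, q, m)| <= sqrt(2q) for every m, whenever gcd(a, q) = 1.
worst = max(abs(twisted_gauss_sum(a, q, m)) / math.sqrt(2 * q)
            for q in range(1, 41) for a in range(1, q + 1) if math.gcd(a, q) == 1
            for m in range(q))
print(worst)  # at most 1, up to rounding

# 2) At theta = a/q, F(eps + i theta) equals S(a, q) / (q sqrt(2 eps)) up to a tiny error.
lam = 100.0
eps = 1 / lam**2
for a, q in [(1, 3), (2, 5), (3, 8)]:
    main = twisted_gauss_sum(a, q, 0) / (q * math.sqrt(2 * eps))   # m = 0 gives S(a, q)
    print(a, q, abs(theta_F(eps, a / q, n_max=1000) - main))
```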

So, by a change of variables

$$\begin{aligned}&\int _{I(a,q)}(F(\epsilon +i\theta ))^{d}e^{2\pi i\lambda ^{2}\theta }\,\text {d}\theta \\&\quad =C(d)\left( \frac{S(a,q)}{q}\right) ^{d} e^{2\pi i\lambda ^{2}\frac{a}{q}}\int _{|\beta |\le \frac{1}{\lambda q}}\frac{e^{2\pi i\lambda ^{2}\beta }}{(\epsilon +i\beta )^{d/2}}\,d\beta +O(\lambda ^{d/2-1}). \end{aligned}$$

Notice the factors \(\left( \frac{S(a,q)}{q}\right) ^{d} e^{2\pi i\lambda ^{2}\frac{a}{q}}\) are just those arising in the formula (3) for \(M_{d}(\lambda )\). Next we replace the range of integration \(|\beta |\le \dfrac{1}{\lambda q}\) by the entire real axis, making another error of order \(\lambda ^{d/2-1}\). Thus we find

$$\begin{aligned} r_{d}(\lambda )= & {} C(d)\lambda ^{d-2}\sum _{q\le \frac{\lambda }{20}}\sum _{\substack{a=1\\ (a,q)=1}}^{q}\left( \frac{S(a,q)}{q}\right) ^{d} e^{2\pi i\lambda ^{2}\frac{a}{q}}\\&+O\left( \int _{\mathcal {E}_{\lambda }}|F(\epsilon +i\theta )|^{d}\,\text {d}\theta \right) +O(\lambda ^{d/2}). \end{aligned}$$

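But if \(\theta \in {\mathcal {E}_{\lambda }}\), then \(\theta \in I(a,q)\) for some q with \(\frac{\lambda }{20}< q\le \lambda \), and the approximation above gives \(|F(\epsilon +i\theta )|^{d}\le C\left| \frac{S(a,q)}{q}\right| ^{d}\frac{1}{\epsilon ^{d/2}}+C\lambda ^{d/2}=C\left| \frac{S(a,q)}{q}\right| ^{d}\lambda ^{d}+C\lambda ^{d/2}\). Since \(\left| \frac{S(a,q)}{q}\right| ^{d}\le \left( \frac{2}{q}\right) ^{d/2}\le C\left( \frac{1}{\lambda }\right) ^{d/2}\) for \(q>\frac{\lambda }{20}\) by Lemma 2, we get \(|F(\epsilon +i\theta )|^{d}\le C\lambda ^{d/2}\) on \(\mathcal {E}_{\lambda }\). Finally, completing the sum over \(q\le \frac{\lambda }{20}\) to the full singular series (3) changes the main term by at most \(C\lambda ^{d-2}\sum _{q>\lambda /20}q\left( \frac{2}{q}\right) ^{d/2}\le C\lambda ^{d/2}\), and we arrive at the formula (2).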

It remains to prove Lemma 2. We use what is commonly called Weyl differencing. See [16] or [33].

$$\begin{aligned} S(a,q)=\sum _{n=0}^{q-1}e^{-2\pi i n^2\frac{a}{q}}. \end{aligned}$$

Now

$$\begin{aligned} |S(a,q)|^2&= \sum _{n=0}^{q-1}\sum _{m=0}^{q-1}e^{2\pi i(m^2- n^2)\frac{a}{q}} = \sum _{n=0}^{q-1}\sum _{m=n}^{n+q-1}e^{2\pi i(m^2- n^2)\frac{a}{q}} \\&= \sum _{n=0}^{q-1}\sum _{k=0}^{q-1}e^{2\pi ik(k+2n)\frac{a}{q}}, \\ \end{aligned}$$

so

$$\begin{aligned} |S(a,q)|^2&\le \sum _{k=0}^{q-1}\left| \sum _{n=0}^{q-1}e^{4\pi ikn\frac{a}{q}}\right| . \end{aligned}$$

Since \((a,q)=1\), the inner sum is zero unless q divides 2k, that is, unless \(k=0\) or \(k=\frac{q}{2}\), and in those cases it equals q. Thus

$$\begin{aligned} |S(a,q)|^2\le 2q. \end{aligned}$$

To show \(M_{d}(\lambda )\) is bounded below for \(d\ge 5\), we must consider

$$\begin{aligned} A_{\lambda }(q)=\sum _{{\mathop {\scriptscriptstyle {(a,q)=1}}\limits ^{a=1}}}^{q} e^{2\pi i\lambda ^{2}\frac{a}{q}}\left( \frac{S(a,q)}{q}\right) ^{d}. \end{aligned}$$

It turns out that if

$$\begin{aligned} (q_1, q_2)=1,\quad A_{\lambda }(q_1\,q_2)=A_{\lambda }(q_1)A_{\lambda }(q_2). \end{aligned}$$

This is done in [8, Chap. 12], and, in the more general context of studying the number of representations of an integer as a sum of d \(k\)th powers, in [18, 32]. Thus

$$\begin{aligned} M_{d}(\lambda )=C(d)\prod _{p\ \mathrm{prime}}(1+A_{\lambda }(p)+\cdots +A_{\lambda }(p^m) +\cdots ). \end{aligned}$$
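The multiplicativity of \(A_{\lambda }\) can be spot-checked numerically. The sketch below (names are ours) computes \(A_{\lambda }(q)\) from its definition, with \(\lambda ^2=7\) and \(d=5\) chosen arbitrarily, and compares \(A_{\lambda }(q_1q_2)\) with \(A_{\lambda }(q_1)A_{\lambda }(q_2)\) for a few coprime pairs. Residues are reduced mod q before forming the exponentials, to keep the phases accurate.

```python
import cmath
import math

def gauss_sum(a, q):
    """S(a, q) = sum_{n=0}^{q-1} exp(-2 pi i n^2 a / q)."""
    return sum(cmath.exp(-2j * math.pi * (n * n * a % q) / q) for n in range(q))

def A(lam2, q, d):
    """A_lambda(q) = sum_{1 <= a <= q, gcd(a,q)=1} e^{2 pi i lambda^2 a/q} (S(a,q)/q)^d."""
    return sum(cmath.exp(2j * math.pi * (lam2 * a % q) / q) * (gauss_sum(a, q) / q) ** d
               for a in range(1, q + 1) if math.gcd(a, q) == 1)

lam2, d = 7, 5
for q1, q2 in [(3, 4), (5, 8), (9, 7)]:
    print(q1, q2, abs(A(lam2, q1 * q2, d) - A(lam2, q1, d) * A(lam2, q2, d)))  # rounding size
```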

Next, one can see that the proof of Lemma 2 shows \(|S(a,q)|\le \sqrt{2q}\) if \(q\equiv 0 \pmod 4\), \( S(a,q)=0\) if \(q\equiv 2 \pmod 4\), and \(|S(a,q)|\le \sqrt{q}\) if q is odd. Using these estimates and Eq. (3), it is straightforward to check that the infinite product is bounded below. I learned this argument from [25].

Another argument can be found in [8].
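The three cases above can be confirmed numerically; in fact they hold with equality, which is a classical fact about quadratic Gauss sums. The sketch below (helper names are ours) compares \(|S(a,q)|\) with \(\sqrt{q}\), 0, and \(\sqrt{2q}\) for all \(q\le 60\) and all a prime to q.

```python
import cmath
import math

def gauss_sum(a, q):
    """S(a, q) = sum_{n=0}^{q-1} exp(-2 pi i n^2 a / q)."""
    return sum(cmath.exp(-2j * math.pi * (n * n * a % q) / q) for n in range(q))

def expected_size(q):
    """|S(a, q)| for gcd(a, q) = 1: sqrt(q) if q is odd, 0 if q = 2 mod 4, sqrt(2q) if q = 0 mod 4."""
    if q % 2 == 1:
        return math.sqrt(q)
    return 0.0 if q % 4 == 2 else math.sqrt(2 * q)

def check_sizes(q_max):
    return all(abs(abs(gauss_sum(a, q)) - expected_size(q)) < 1e-9
               for q in range(1, q_max + 1)
               for a in range(1, q + 1) if math.gcd(a, q) == 1)

print(check_sizes(60))  # True
```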

It is interesting to note that, for p prime, one may interpret \(1+A_{\lambda }(p)+\cdots +A_{\lambda }(p^m)\) in terms of solutions to the congruence

$$\begin{aligned} n_1^2+\cdots +n_d^2\equiv \lambda ^2(\mathrm{mod}\, p^m). \qquad \qquad \qquad (*) \end{aligned}$$

Note that the number of solutions of

$$\begin{aligned} n_1^2+\cdots +n_d^2\equiv \lambda ^2(\mathrm{mod}\, p^m) \end{aligned}$$

with \(1\le n_{j}\le p^m\) is

$$\begin{aligned}&\frac{1}{p^m}\sum _{n_1=1}^{p^m}\cdots \sum _{n_d=1}^{p^m}\sum _{a=1}^{p^m} e^{2\pi i(\lambda ^2-n_1^2-\cdots -n_d^2)\frac{a}{p^m}} \\&\quad = \frac{1}{p^m}\sum _{j=0}^{m}\sum _{n_1=1}^{p^m}\cdots \sum _{n_d=1}^{p^m}\sum _{\substack{a=1\\ (a,p^{m-j})=1}}^{p^{m-j}} e^{2\pi i(\lambda ^2-n_1^2-\cdots -n_d^2)\frac{a}{p^{m-j}}} \\&\quad = \frac{1}{p^m}\sum _{j=0}^{m}p^{jd}\sum _{n_1=1}^{p^{m-j}}\cdots \sum _{n_d=1}^{p^{m-j}}\sum _{\substack{a=1\\ (a,p^{m-j})=1}}^{p^{m-j}} e^{2\pi i(\lambda ^2-n_1^2-\cdots -n_d^2)\frac{a}{p^{m-j}}}\\&\quad = \frac{1}{p^m}\sum _{j=0}^{m}p^{jd}\sum _{\substack{a=1\\ (a,p^{m-j})=1}}^{p^{m-j}} e^{2\pi i\lambda ^2\frac{a}{p^{m-j}}}[S(a,{p^{m-j}})]^d \\&\quad =p^{m(d-1)}\sum _{j=0}^{m}\frac{1}{p^{(m-j)d}}\sum _{\substack{a=1\\ (a,p^{m-j})=1}}^{p^{m-j}} e^{2\pi i\lambda ^2\frac{a}{p^{m-j}}}[S(a,{p^{m-j}})]^d. \end{aligned}$$

So, writing j in place of \(m-j\),

$$\begin{aligned} \sum _{j=0}^{m}\sum _{\substack{a=1\\ (a,p^{j})=1}}^{p^{j}}\left( \frac{S(a,p^{j})}{p^{j}}\right) ^{d} e^{2\pi i\lambda ^2\frac{a}{p^{j}}}=\frac{1}{p^{m(d-1)}}\cdot ({\text{ number } \text{ of } \text{ solutions } \text{ of } }(*)). \end{aligned}$$

Thus

$$\begin{aligned} 1+A_{\lambda }(p)+\cdots +A_{\lambda }(p^m)=p^{m(1-d)}N(\lambda , p^{m})\qquad \qquad \qquad (**) \end{aligned}$$

where

$$\begin{aligned} N(\lambda , p^{m})=\text{ number } \text{ of } \text{ solutions } \text{ of } \text{ the } \text{ congruence }\, (*) \end{aligned}$$
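Identity \((**)\) can be tested directly. The sketch below (names are ours) counts the solutions of \((*)\) by building up the distribution of \(n_1^2+\cdots +n_j^2 \bmod p^m\) one variable at a time, and compares \(p^{m(1-d)}N(\lambda ,p^m)\) with \(1+A_{\lambda }(p)+\cdots +A_{\lambda }(p^m)\) for a few small, arbitrarily chosen values of \(\lambda ^2\), d, p, and m.

```python
import cmath
import math

def gauss_sum(a, q):
    return sum(cmath.exp(-2j * math.pi * (n * n * a % q) / q) for n in range(q))

def A(lam2, q, d):
    """A_lambda(q) from its definition; A(lam2, 1, d) = 1."""
    return sum(cmath.exp(2j * math.pi * (lam2 * a % q) / q) * (gauss_sum(a, q) / q) ** d
               for a in range(1, q + 1) if math.gcd(a, q) == 1)

def count_solutions(lam2, d, modulus):
    """#{(n_1, ..., n_d) mod modulus : n_1^2 + ... + n_d^2 = lam2 (mod modulus)}."""
    counts = [1] + [0] * (modulus - 1)       # distribution of the empty sum
    for _ in range(d):
        new = [0] * modulus
        for s, c in enumerate(counts):
            if c:
                for n in range(modulus):
                    new[(s + n * n) % modulus] += c
        counts = new
    return counts[lam2 % modulus]

def both_sides(lam2, d, p, m):
    lhs = sum(A(lam2, p ** j, d) for j in range(m + 1))
    rhs = count_solutions(lam2, d, p ** m) / p ** (m * (d - 1))
    return lhs, rhs

for case in [(7, 5, 3, 2), (5, 4, 2, 3), (10, 3, 5, 1)]:   # (lambda^2, d, p, m)
    print(case, both_sides(*case))
```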

A generalization of \((**)\) becomes important in studying the number of ways of representing an integer m as a sum of \(\ell \) kth powers. See [32, Chap. 2] or [18, Chap. 5].

3 The Number of Representations of an Integer as a Sum of Primes

In the discussion of \(r_{d}(\lambda )\), we were able to accurately describe the generating function \(F(\epsilon +i\theta )\) for every \(\theta \). In many applications of the circle method, this is not possible, and the major difficulty arises in estimating the generating function on the set on which a really good approximation is unknown. A case in point is the problem of representing an integer N as a sum of two or three primes. We will give a short introduction to this topic; details may be found in [7] or [22]. We let \(\rho _{2}(N)\) denote the number of representations of an even integer N as a sum of two primes, and \(\rho _{3}(N)\) the number of representations of an odd integer N as a sum of three primes. We will first discuss what we might expect the sizes of \(\rho _{2}(N)\) and \(\rho _{3}(N)\) to be. Then we shall try to understand why one can successfully treat \(\rho _{3}(N)\) but not \(\rho _{2}(N)\). The substitute for \(F(\epsilon +i\theta )\) will be

$$\begin{aligned} S_{N}(\theta )=\sum _{p\le N} e^{2\pi i p\theta }. \end{aligned}$$

(In this section, p will always denote a prime.) We will indicate how \(S_{N}(\theta )\) is described well on a small set called the major arcs, and finally we shall try to give some hint as to how \(S_{N}(\theta )\) is estimated for \(\theta \) not in the major arcs.

Let us first make some guess as to the size of \(\rho _{2}(N)\) and \(\rho _{3}(N)\). Consider first \(\rho _{2}(N)\). The number of ways of writing

$$\begin{aligned} n=p_{1}+p_{2} \end{aligned}$$

with \(p_{1}\) and \(p_{2}\) prime is the number of primes in the sequence \(n-p_{1}\), with \(p_{1}\) prime. This latter sequence has about \(\dfrac{n}{\log n}\) terms, so if the primes were uniformly distributed in this sequence we would expect

$$\begin{aligned} \rho _{2}(n)\sim \dfrac{\frac{n}{\log n}}{\log \left( \frac{n}{\log n}\right) }\sim \frac{n}{\log ^{2} n}. \end{aligned}$$

We proceed to discuss \(\rho _{3}(N)\). Again, the sequence \(n-p\), p prime, has roughly \(\dfrac{n}{\log n}\) elements. If for most of these p, \(n-p=p_{2}+p_{3}\) in about \(\dfrac{n}{\log ^{2} n}\) ways, we would expect

$$\begin{aligned} \rho _{3}(n)\sim \dfrac{n^2}{\log ^{3} n}. \end{aligned}$$

And in fact Vinogradov proved

Theorem 3

There are positive constants \(N_{0}\), \(C_1\) and \(C_2\) such that for \(n\ge N_{0}\) and n odd,

$$\begin{aligned} C_1\,\dfrac{n^2}{\log ^{3} n}\le \rho _{3}(n)\le C_2\,\dfrac{n^2}{\log ^{3} n}. \end{aligned}$$
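The heuristics and Theorem 3 can be compared with actual counts for small n. The sketch below (names are ours) counts ordered representations; that choice is our assumption, since the text does not say whether order matters, and it only affects constants. It prints \(\rho _{3}(n)\log ^{3} n/n^2\) for a few odd n, which Theorem 3 says stays between positive constants for large n, and \(\rho _{2}(n)\log ^{2} n/n\) for a few even n, which is only heuristic.

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: list of primes <= n (n >= 2)."""
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0] = is_prime[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if is_prime[i]:
            is_prime[i * i::i] = bytes(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if is_prime[i]]

def rho2(n, primes, prime_set):
    """Ordered pairs (p1, p2) of primes with p1 + p2 = n."""
    count = 0
    for p in primes:
        if p >= n:
            break
        if n - p in prime_set:
            count += 1
    return count

def rho3(n, primes, prime_set):
    """Ordered triples (p1, p2, p3) of primes with p1 + p2 + p3 = n."""
    return sum(rho2(n - p, primes, prime_set) for p in primes if p < n)

primes = primes_up_to(10_000)
prime_set = set(primes)
for n in (1001, 3001, 9001):
    print(n, rho3(n, primes, prime_set) * math.log(n) ** 3 / n ** 2)
for n in (1000, 3000, 9000):
    print(n, rho2(n, primes, prime_set) * math.log(n) ** 2 / n)
```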

We give a brief introduction to the proof of Theorem 3, together with an explanation as to why the study of representing integers as sums of three primes is more tractable than handling the analogous problem for two primes. For more details, consult [22], which is the book I followed when I taught the material.

Let \(\displaystyle S_{N}(\theta )=\sum _{1\le p\le N} e^{2\pi i p\theta }\). At the present time, it is possible to find a good approximation to \(S_{N}(\theta )\) only for \(\theta \) in a small set (whose measure tends to 0 as N gets large). Call this set \(U_{N}\). Then write

$$\begin{aligned} \rho _{3}(N)=\int _{0}^{1}e^{-2\pi i N\theta }[S_{N}(\theta )]^3\,\text {d}\theta =\int _{U_{N}}+\int _{C(U_{N})}. \end{aligned}$$

Here \(C(U_{N})\) denotes the complement of \(U_{N}\) in [0, 1]. The integral over \(U_{N}\) will give the main contribution, of order \(\dfrac{N^2}{\log ^{3} N}\). Thus we have to prove that \(\displaystyle \int _{C(U_{N})}\) is, say, \(\mathcal {O}\left( \dfrac{N^2}{\log ^{4} N}\right) \). We can estimate \(\displaystyle \left| \int _{C(U_{N})}\right| \) by

$$\begin{aligned} \sup _{\theta \in C(U_{N})}|S_{N}(\theta )|\int _{0}^{1}|S_{N}(\theta )|^2\,\text {d}\theta \le C \sup _{\theta \in C(U_{N})} |S_{N}(\theta )|\dfrac{N}{\log N} \end{aligned}$$

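for some constant C, by the Plancherel Theorem: since the coefficients \(a_n\) of \(S_{N}(\theta )\) are 1 if n is a prime and 0 otherwise, \(\displaystyle \int _{0}^{1}|S_{N}(\theta )|^2\,\text {d}\theta \) is just the number of primes \(\le N\), which is at most \(C\dfrac{N}{\log N}\). So it suffices to show that \(|S_{N}(\theta )|\le C\dfrac{N}{\log ^{3} N}\) for \(\theta \in C(U_{N})\).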
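On a finite level both identities used here are exact and easy to check: replacing \(\int _0^1\) by an average over M equally spaced values of \(\theta \) loses nothing once M exceeds every frequency involved (here \(3N\) for the cube and \(N\) for \(|S_N|^2\)). The sketch below (names are ours) does this for N = 41. It uses the factor \(e^{-2\pi iN\theta }\), which is what picks out \(p_1+p_2+p_3=N\), and compares with a direct count of ordered triples.

```python
import cmath
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: list of primes <= n (n >= 2)."""
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0] = is_prime[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if is_prime[i]:
            is_prime[i * i::i] = bytes(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if is_prime[i]]

N = 41
primes = primes_up_to(N)
M = 3 * N + 1  # exceeds every sum p1 + p2 + p3 <= 3N, so there is no aliasing

# Samples of S_N(theta) = sum_{p <= N} e^{2 pi i p theta} at theta = k / M.
samples = [sum(cmath.exp(2j * math.pi * p * k / M) for p in primes) for k in range(M)]

rho3_fourier = sum(cmath.exp(-2j * math.pi * N * k / M) * s ** 3
                   for k, s in enumerate(samples)) / M
plancherel = sum(abs(s) ** 2 for s in samples) / M

prime_set = set(primes)
rho3_direct = sum(1 for p1 in primes for p2 in primes if N - p1 - p2 in prime_set)
print(round(rho3_fourier.real), rho3_direct)   # equal
print(round(plancherel), len(primes))          # equal: 13 primes <= 41
```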

On the other hand, one could not proceed this way in studying \(\rho _{2}(N)\): if one took a factor of \(S_{N}(\theta )\) out of the integral

$$\begin{aligned} \int _{0}^{1}[S_{N}(\theta )]^2\,e^{-2\pi i N\theta }\,\text {d}\theta , \end{aligned}$$

only a single factor of \(S_{N}(\theta )\) would remain, and one would no longer be in a position to use Plancherel’s Theorem.

It turns out that the main contribution comes from small intervals around \(\dfrac{a}{q}\) with \((a,q)=1\), and \(1\le q\le \log ^{4} N\) with N large.

Let us see how one finds an approximation for \(S_{N}(\theta )\). Note first that if \((a,q)=1\),

$$\begin{aligned} S_{N}\left( \dfrac{a}{q}\right) =\sum _{r=1}^{q}e^{2\pi i r\frac{a}{q}}\,\,\pi (N,r,q) \end{aligned}$$

where \(\pi (N,r,q)\) is the number of primes \(\le N\) which are congruent to \(r\!\mod q\). A theorem of Siegel asserts that if \((r,q)=1\)

$$\begin{aligned} \pi (N,r,q)=\frac{1}{\phi (q)}\, L(N)+{\mathcal {O}}_{A}\left( N\,e^{-c\sqrt{\log N}}\right) \end{aligned}$$

uniformly for \(q\le (\log N)^{A}\), for any positive A. Here \(\phi (q)\) is the number of integers r with \(1\le r\le q\) that are relatively prime to q, and \(L(N)=\int _{2}^{N}\dfrac{\text {d}t}{\ln t}\). Thus for \(q\le (\log N)^{A}\)

$$\begin{aligned} S_{N}\left( \dfrac{a}{q}\right) =\frac{1}{\phi (q)}\, L(N)\sum _{{\mathop {\scriptscriptstyle {(r,q)=1}}\limits ^{r=1}}}^{q}e^{2\pi i r\frac{a}{q}}+\mathcal {O}(N\,e^{-c\sqrt{\log N}}). \end{aligned}$$

The r sum can be evaluated with the help of the Möbius inversion formula. The Möbius function \(\mu (d)\) is defined as follows:

$$\begin{aligned} \mu (d): \left\{ \begin{array}{ll} \mu (1)=1; &{} \hbox {} \\ \mu (d)=(-1)^{r}, &{} \hbox {if}\,\, d\,\, \hbox {is the product of}\,\, r \,\,\hbox {distinct primes;} \\ \mu (d)=0, &{} \hbox {if}\,\, d\,\,\hbox {is divisible by a square.} \end{array} \right. \end{aligned}$$

The Möbius inversion formula states that

$$\begin{aligned} \sum _{d|n}\,\mu (d)=\left\{ \begin{array}{ll} 1, &{} \hbox {if}\,\, n=1;\\ 0, &{} \hbox {otherwise.} \end{array} \right. \end{aligned}$$

The standard proof of the Möbius inversion formula given in elementary number theory always seemed mysterious to me. There is another proof using the Riemann zeta function that seems more natural to me.

For \(\mathrm{Re} s>1\),

$$\begin{aligned} \zeta (s)=\sum _{n=1}^{\infty }\frac{1}{n^s}=\prod _{p\ \mathrm{prime}}\frac{1}{\left( 1-\frac{1}{p^s}\right) }. \end{aligned}$$

So

$$\begin{aligned} \frac{1}{\zeta (s)}=\prod _{p}\left( 1-\frac{1}{p^s}\right) =\sum _{1}^{\infty }\frac{\mu (n)}{n^s}. \end{aligned}$$

Now if \(\displaystyle A(s)=\sum _{n=1}^{\infty }\frac{a(n)}{n^s}\) and \(\displaystyle B(s)=\sum _{n=1}^{\infty }\frac{b(n)}{n^s}\) are two Dirichlet series then

$$\begin{aligned}&A(s)B(s)=\sum _{n=1}^{\infty }\sum _{m=1}^{\infty }\frac{a(n)}{n^s}\frac{b(m)}{m^s} =\sum _{k=1}^{\infty }\frac{1}{k^s}\sum _{n|k}a(n)\,b\left( \frac{k}{n}\right) ,\\&\zeta (s)\dfrac{1}{\zeta (s)}=1=\frac{1}{1^s}+\frac{0}{2^s}+\frac{0}{3^s}+\cdots \,. \end{aligned}$$

Take \(a(n)=1\) and \(b(m)=\mu (m)\), and compare coefficients, to get

$$\begin{aligned} \sum _{d|n}\mu \left( \frac{n}{d}\right)&= 0, \,\, {\text{ if }}\,\, n\ne 1,\\ \sum _{d|n}\mu (d)&=0, \,\, {\text{ if }}\,\, n\ne 1,\\ \mu (1)&=\sum _{d|1} \mu (d)=1. \end{aligned}$$
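These identities are easy to confirm by machine. The sketch below (names are ours) computes \(\mu (n)\) by trial division, checks \(\sum _{d|n}\mu (d)=0\) for \(1<n\le 200\), and compares a partial sum of \(\sum \mu (n)/n^2\) with \(1/\zeta (2)=6/\pi ^2\); the tail beyond \(10^4\) is smaller than \(10^{-4}\) in absolute value.

```python
import math

def mobius(n):
    """mu(n) by trial division: 0 if a square divides n, else (-1)^(number of prime factors)."""
    result, m, p = 1, n, 2
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:
                return 0               # p^2 divides n
            result = -result
        p += 1
    return -result if m > 1 else result  # a leftover m > 1 is one more prime factor

def divisor_sum_of_mu(n):
    return sum(mobius(d) for d in range(1, n + 1) if n % d == 0)

print([divisor_sum_of_mu(n) for n in range(1, 13)])  # [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
partial = sum(mobius(n) / n ** 2 for n in range(1, 10_001))
print(partial, 6 / math.pi ** 2)                     # agree to within 1e-4
```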

The Möbius inversion formula is often used in summing over values of r where r is restricted to be relatively prime to another integer q. Thus

$$\begin{aligned} \sum _{\substack{r=1\\ (r,q)=1}}^{q}e^{2\pi i r\frac{a}{q}}= & {} \sum _{r=1}^{q}e^{2\pi i r\frac{a}{q}}\sum _{d|(r,q)} \mu (d)=\sum _{d|q} \mu (d)\sum _{\substack{1\le r\le q\\ d|r}}e^{2\pi i r\frac{a}{q}}\\= & {} \sum _{d|q} \mu (d) \sum _{m=1}^{q/d}e^{2\pi i md\frac{a}{q}}=\mu (q), \end{aligned}$$

since, as \((a,q)=1\), the inner sum vanishes unless \(d=q\), in which case it equals 1. Thus one finds

$$\begin{aligned} S_{N}\left( \dfrac{a}{q}\right) =\frac{\mu (q)}{\phi (q)}L(N)+\mathcal {O}(N\,e^{-c\sqrt{\log N}}). \end{aligned}$$
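Both facts can be checked numerically: the sum over r prime to q equals \(\mu (q)\) whenever \((a,q)=1\), and \(S_N(a/q)\) is close to \(\mu (q)L(N)/\phi (q)\). In the sketch below (names are ours) we use \(\pi (N)\) in place of \(L(N)\), which changes the main term only by a lower-order amount, and take \(N=10^5\) and \(a=1\); the printed discrepancies are small compared with \(\pi (N)/\phi (q)\).

```python
import cmath
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: list of primes <= n (n >= 2)."""
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0] = is_prime[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if is_prime[i]:
            is_prime[i * i::i] = bytes(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if is_prime[i]]

def mobius(n):
    result, m, p = 1, n, 2
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:
                return 0
            result = -result
        p += 1
    return -result if m > 1 else result

def phi(q):
    return sum(1 for r in range(1, q + 1) if math.gcd(r, q) == 1)

def ramanujan_sum(a, q):
    """Sum over 1 <= r <= q with gcd(r, q) = 1 of e^{2 pi i r a / q}."""
    return sum(cmath.exp(2j * math.pi * r * a / q) for r in range(1, q + 1) if math.gcd(r, q) == 1)

N = 100_000
primes = primes_up_to(N)
for q in (3, 4, 5, 6):
    S = sum(cmath.exp(2j * math.pi * (p % q) / q) for p in primes)   # S_N(1/q)
    print(q, S, mobius(q) * len(primes) / phi(q))
```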

Note that the main term does not depend on a, as opposed to S(a, q) arising in the study of \(r_{d}(\lambda )\). This is a big advantage in some problems. Also, \(\phi (q)>c\,\dfrac{q}{\ln \ln q}\) for \(q\ge 3\). Thus the factor \(\dfrac{\mu (q)}{\phi (q)}\), which decays almost like \(\dfrac{1}{q}\), is better than the corresponding factor \(\dfrac{S(a,q)}{q}\) which arose before, which can be as large as \(\sqrt{\dfrac{2}{q}}\). Next we note that we can find a good approximation to \(S_{N}\left( \dfrac{a}{q}+\beta \right) \) if \(|\beta |\le \dfrac{(\log N)^{A}}{N}\) and \(q\le (\log N)^{A}\). To see this, we write

$$\begin{aligned} S_{N}\left( \dfrac{a}{q}+\beta \right) =\sum _{p\le N}e^{2\pi i p\frac{a}{q}}e^{2\pi i p\beta }. \end{aligned}$$

Put \(\displaystyle \Lambda (x)=\sum _{p\le x}e^{2\pi i p\frac{a}{q}}\). Then

$$\begin{aligned} S_{N}\left( \dfrac{a}{q}+\beta \right)&=\int _{3/2}^{N}e^{2\pi i t\beta }\,d\Lambda (t)\\&=\Lambda (N)e^{2\pi i N\beta } -2\pi i \beta \int _{3/2}^{N}\Lambda (t)e^{2\pi i t\beta }\,\text {d}t\\&=\Lambda (N)e^{2\pi i N\beta } -2\pi i \beta \int _{3/2}^{N}\frac{\mu (q)}{\phi (q)}L(t)e^{2\pi i t\beta }\,\text {d}t+\mathcal {O}(N\,e^{-c\sqrt{\log N}}) \end{aligned}$$

if \(|\beta |\le C\dfrac{(\log N)^{A}}{N}\).

Thus another integration by parts shows

$$\begin{aligned} S_{N}\left( \dfrac{a}{q}+\beta \right) =\frac{\mu (q)}{\phi (q)}\int _{3/2}^{N}\frac{1}{\ln t}\,e^{2\pi i t\beta }\,\text {d}t +\mathcal {O}(N\,e^{-c\sqrt{\log N}}) \end{aligned}$$

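for \(q\le (\log N)^{A}\) and \(|\beta |\le \dfrac{(\log N)^{A}}{N}\). The set \(U_{N}\) is the union of these intervals about the points \(\dfrac{a}{q}\) with \((a,q)=1\) and \(q\le (\log N)^{A}\). Note that \(|U_{N}|=\mathcal {O}\left( \dfrac{(\log N)^{3A}}{N}\right) \).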

To estimate \(S_{N}(\theta )\) in the complement of \(U_{N}\), Vinogradov used Schwarz’s inequality in a very clear way. Suppose

$$\begin{aligned} T=\sum _{n=1}^{N}\sum _{m=1}^{N}d_{n}b_{m}e^{2\pi i nm\theta }. \end{aligned}$$

Then even if the \(d_{n}\) and \(b_{m}\) are very rough, there can be cancelation in the double sum for T. To fix ideas, consider

$$\begin{aligned} T=\sum _{n=1}^{q}\sum _{m=1}^{q}d_{n}b_{m}e^{2\pi i mn\frac{a}{q}} \end{aligned}$$

with \((a,q)=1\). Let \(\displaystyle D=\left( \sum _{n=1}^{q}d_{n}^2\right) ^{1/2}\) and \(\displaystyle B=\left( \sum _{m=1}^{q}b_{m}^2\right) ^{1/2}\). Then the trivial estimate would be

$$\begin{aligned} |T|\le qDB. \end{aligned}$$

In fact one has the estimate

$$\begin{aligned} |T|\le \sqrt{q}DB.\qquad \qquad \qquad (*) \end{aligned}$$

If \(q\ge (\log N)^{A}\), since we are talking about beating a trivial estimate by a small power of \(\log N\), this makes a tremendous saving.

To see \((*)\), apply Schwarz’s inequality to the outer sum:

$$\begin{aligned} |T|\le D\left\{ \sum _{n=1}^{q}\left| \sum _{m=1}^{q}b_{m}e^{2\pi i mn\frac{a}{q}}\right| ^2\right\} ^{1/2} =D\left\{ \sum _{m_1=1}^{q}\sum _{m_2=1}^{q}\sum _{n=1}^{q}b_{m_1}\overline{b_{m_2}}e^{2\pi i (m_1-m_2)n\frac{a}{q}}\right\} ^{1/2}. \end{aligned}$$

Now the sum on n is zero unless \(m_1=m_2\) in which case it is q. Thus \(|T|\le \sqrt{q}DB\).
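The inequality \((*)\) can be illustrated with random coefficients. The sketch below (names and random choices are ours) computes T directly for q = 101 and a = 37 and prints \(|T|\) next to the bound \(\sqrt{q}DB\) and the trivial bound \(qDB\).

```python
import cmath
import math
import random

def bilinear_sum(d, b, a, q):
    """T = sum_{n=1}^{q} sum_{m=1}^{q} d_n b_m e^{2 pi i m n a / q}; d[n-1] holds d_n, b[m-1] holds b_m."""
    return sum(d[n - 1] * b[m - 1] * cmath.exp(2j * math.pi * (m * n * a % q) / q)
               for n in range(1, q + 1) for m in range(1, q + 1))

rng = random.Random(0)            # arbitrary "rough" real coefficients
q, a = 101, 37                    # gcd(a, q) = 1
d = [rng.uniform(-1, 1) for _ in range(q)]
b = [rng.uniform(-1, 1) for _ in range(q)]
D = math.sqrt(sum(x * x for x in d))
B = math.sqrt(sum(x * x for x in b))
T = bilinear_sum(d, b, a, q)
print(abs(T), math.sqrt(q) * D * B, q * D * B)  # |T| <= sqrt(q) D B <= q D B
```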

To see how double sums arise in studying \(S_{N}(\theta )\) note that

$$\begin{aligned} S_{N}(\theta )=\sum _{\substack{\sqrt{N}\le n\le N\\ (n,\underline{P})=1}}e^{2\pi i n\theta } +\mathcal {O}(\sqrt{N}) \end{aligned}$$

where

$$\begin{aligned} \underline{P}=\prod _{p\le \sqrt{N}} p. \end{aligned}$$

(An integer n with \(\sqrt{N}< n\le N\) is prime exactly when it has no prime factor \(\le \sqrt{N}\).)

Now we have to apply the Möbius inversion formula.

$$\begin{aligned} \sum _{d|n}\,\mu (d)=\left\{ \begin{array}{ll} 1, &{} \quad \hbox {if}\,\, n=1; \\ 0, &{} \quad \hbox {otherwise.} \end{array} \right. \end{aligned}$$

So

$$\begin{aligned} S_{N}(\theta )=\sum _{\sqrt{N}\le n\le N}\sum _{d|(n,\underline{P})}\mu (d)e^{2\pi i n\theta }+\mathcal {O}(\sqrt{N}) =\sum _{d|\underline{P}}\mu (d)\sum _{\substack{\sqrt{N}\le n\le N\\ d|n}} e^{2\pi i n\theta }+\mathcal {O}(\sqrt{N}). \end{aligned}$$
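At \(\theta =0\), the same Möbius expansion taken over all \(n\le N\) is Legendre's formula, which makes the step concrete and shows how many divisors d of \(\underline{P}\) occur. The sketch below (names are ours) evaluates \(\sum _{d|\underline{P}}\mu (d)\lfloor N/d\rfloor \) over all subsets of the primes \(\le \sqrt{N}\) and checks that it equals \(1+\pi (N)-\pi (\sqrt{N})\), the number of \(n\le N\) prime to \(\underline{P}\) (the integer 1 together with the primes in \((\sqrt{N},N]\)). The last column, \(2^{\pi (\sqrt{N})}\), is the number of divisors d.

```python
import math
from itertools import combinations

def primes_up_to(n):
    """Sieve of Eratosthenes: list of primes <= n (n >= 2)."""
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0] = is_prime[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if is_prime[i]:
            is_prime[i * i::i] = bytes(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if is_prime[i]]

def legendre_count(N):
    """Sum of mu(d) * floor(N / d) over the divisors d of P, the product of the primes <= sqrt(N).
    Each d is a product of k distinct primes, so mu(d) = (-1)^k."""
    small = primes_up_to(math.isqrt(N))
    total = 0
    for k in range(len(small) + 1):
        for subset in combinations(small, k):
            total += (-1) ** k * (N // math.prod(subset))
    return total

for N in (100, 1000, 2000):
    direct = 1 + len(primes_up_to(N)) - len(primes_up_to(math.isqrt(N)))
    number_of_d = 2 ** len(primes_up_to(math.isqrt(N)))
    print(N, legendre_count(N), direct, number_of_d)
```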

The d’s we have to be careful about are those of size roughly N. We write \(n=md\) and get a sum of the form

$$\begin{aligned} \sum _{d}\mu (d)\sum _{m} e^{2\pi i md\theta }. \end{aligned}$$

If d is large, m must be small. Also, all but a negligible number of large d have a large prime factor p, say \(p>e^{\sqrt{\log N}}\). Now the idea, roughly, is to write \(d=pd_1\) where \(p>e^{\sqrt{\log N}}\). Then the range of summation on \(d_1\) is \(d_1\le \dfrac{N}{e^{\sqrt{\log N}}}\). Now the sum is something like

$$\begin{aligned} \sum _{d_1\le \frac{N}{e^{\sqrt{\log N}}}}\mu (d_1)\sum _{m} e^{2\pi i d_1 m p\theta } \end{aligned}$$

and one can control the size of mp.

So roughly the sum becomes

$$\begin{aligned} \sum _{d_1\le \frac{N}{e^{\sqrt{\log N}}}}\mu (d_1)\sum _{\ell \ \text {not too large}} d(\ell ) e^{2\pi i d_1 \ell \theta }, \end{aligned}$$

where \(d(\ell )\) is dominated by the number of divisors of \(\ell \). This is now the type of double sum that can be controlled by an application of Schwarz’s inequality as described above. For more details see Prachar [22].

4 Further Reading

In the fall semester of 2002, I taught a one-semester course on the circle method. I covered two topics: (1) the number of solutions in integers of \(m = n^{k}_{1}+ \cdots + n^{k}_{l}\) and (2) Vinogradov’s Theorem on the representation of an integer as a sum of three primes. My treatment of \(r_d(\lambda )\) was a simplified version of [15]; for the singular series in the general problem of the number of solutions of \(m = n^{k}_{1}+ \cdots + n^{k}_{l}\), I also followed [15]. See also [31]. For Vinogradov’s Theorem, I followed the treatment in Prachar [22]. An algebraic approach to the study of \(r_d(\lambda )\) can be found in [24].