1 Introduction

In this brief note, we survey a sample of the deep and influential contributions of Jean Bourgain to the field of nonlinear dispersive equations. Bourgain also made many fundamental contributions to other areas of partial differential equations and mathematical physics (as well as to a myriad other areas in analysis, number theory, combinatorics, theoretical computer science, and more). Quoting the citation of the AMS L. P. Steele Prize for Lifetime Achievement, awarded to Bourgain in 2018, “Jean Bourgain is a giant in the field of mathematical analysis, which he has applied broadly and to great effect.”

Jean Bourgain’s contributions to mathematics will be remembered forever. Those of us who knew him will also remember his warmth, generosity, and graciousness.

2 Nonlinear Dispersive Equations: The Well-Posedness Theory Before Bourgain

The theory of nonlinear dispersive equations goes back to the nineteenth century, in connection with water waves in shallow water. The Korteweg-de Vries equation, which governs this phenomenon, was proposed by Boussinesq and by Korteweg and de Vries, in the late nineteenth century, as a way of explaining the discovery by Scott Russell (1835) of traveling waves. The generalized KdV equations \((gKdV)_k\) (k = 1 being the Korteweg-de Vries equation) are

$$\displaystyle \begin{aligned} (gKdV)_k\begin{cases} \partial_t u+\partial^3_x u+u^k\partial_x u=0, x \in \mathbb{R}, \mbox{ or } x \in \mathbb{T}, t \in \mathbb{R}\\ u|{}_{t=0}=u_0(x) \end{cases} \end{aligned}$$

(here, \(\mathbb {T}\) and \(\mathbb {T}^d\) denote the 1-dimensional and the d-dimensional torus, respectively). Another example of nonlinear dispersive equations is given by the nonlinear Schrödinger equations (NLS),

$$\displaystyle \begin{aligned} (NLS)\begin{cases} i\partial_t u+\Delta u\pm |u|{}^{p-1}u=0, x \in \mathbb{R}^d, \mbox{ or } x \in \mathbb{T}^d\\ u|{}_{t=0}=u_0(x) \end{cases} \end{aligned}$$

When d = 1, p = 3, these equations model the propagation of wave packets in the theory of water waves. The equations also appear in non-linear optics and in quantum field theory. These equations have a Hamiltonian structure and preserve mass and energy (although the energy may be negative). For both equations, the conserved mass is \(\int |u_0|{ }^2\), where the integral is over \(\mathbb {R}^d\) or \(\mathbb {T}^d\). For \((gKdV)_k\), the conserved energy is \(E(u_0)=\int [(\partial _x u_0)^2-c_k u_0^{k+2}]dx,\) and for (NLS), it is \(E(u_0)=\int [|\nabla _x u_0|{ }^2\mp c_p|u_0|{ }^{p+1}]dx,\) where the integrals are over \(\mathbb {R}^d\) or \(\mathbb {T}^d.\)

These equations are called dispersive because their linear parts are dispersive. Heuristically, the linear equations, when defined for \(x\in \mathbb {R}^d\), are called dispersive because the initial data gets “spread out” or “dispersed” by the evolution. (The linear equations can be solved by using Fourier’s method.) Since the mass of the solution is constant (the \(L^2\) norm is conserved), this requires the size of the linear solution to become small for large t, the so-called “dispersive effect.” Note that this is a feature of linear dispersive equations: the traveling wave solutions discovered by Russell do not have this property, and they are purely nonlinear objects. Moreover, when \(x \in \mathbb {T}^d\), there is no room for the solution to “spread out,” and the “dispersive effect” disappears.
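As a concrete illustration of the two competing facts above (constant mass, decaying size), one can use an explicit solution. The following Python sketch assumes the one-dimensional linear equation \(i\partial _t u+\partial _x^2 u=0\) with Gaussian data \(u_0(x)=e^{-x^2}\), for which \(u(x,t)=(1+4it)^{-1/2}e^{-x^2/(1+4it)}\). The function names and the quadrature parameters are illustrative choices, not part of the text.

```python
import cmath
import math


def gaussian_free_solution(x, t):
    """Explicit solution of i u_t + u_xx = 0 with u(x, 0) = exp(-x^2)."""
    z = 1 + 4j * t
    return cmath.exp(-x * x / z) / cmath.sqrt(z)


def mass(t, half_width=20.0, steps=8000):
    """Riemann-sum approximation of the integral of |u(x, t)|^2 over the line."""
    h = 2 * half_width / steps
    return h * sum(
        abs(gaussian_free_solution(-half_width + k * h, t)) ** 2
        for k in range(steps + 1)
    )


def sup_norm(t):
    """|u(., t)| is largest at x = 0, where it equals (1 + 16 t^2)^(-1/4)."""
    return abs(gaussian_free_solution(0.0, t))


if __name__ == "__main__":
    for t in (0.0, 0.25, 1.0):
        print(f"t={t:4.2f}  mass={mass(t):.6f}  sup={sup_norm(t):.6f}")
```

In this example the mass stays equal to \(\sqrt {\pi /2}\) while the maximum of \(|u|\) decays like \(|t|^{-1/2}\) for large t, which is the spreading described in the paragraph above.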

Even though these equations were introduced in the nineteenth and early twentieth centuries, their systematic study started much later. One of the first things to understand for such equations is the “well-posedness.” An equation like \((gKdV)_k\) or (NLS) is said to be locally well-posed (LWP) in a space B if, for each \(u_0 \in B\), there is some \(T = T(u_0) > 0\) such that the equation has a unique solution u (in a suitable sense) for 0 ≤ t ≤ T, with u ∈ C([0, T];B), and the mapping \(u_0 \in B \mapsto u \in C([0, T];B)\) is continuous. (That is to say, in analogy with ODE, we have existence, uniqueness, and continuous dependence on the initial data.) If we can take T = +∞, we say that the problem is globally well-posed (GWP). Since dispersive equations are (essentially) time reversible, we can replace [0, T] by [−T, T]. Usually in this subject, the space B is taken to be an \(L^2\)-based Sobolev space (or sometimes a weighted \(L^2\)-based Sobolev space, with power weights, in case we are working in \(\mathbb {R}^d\)). The reason for using \(L^2\)-based spaces as opposed to \(L^p\)-based spaces is the failure of estimates for \(u_0 \in L^p\), p ≠ 2, in the associated linear problems. The first (LWP) results used the analogy of these problems to classical hyperbolic ones, which led (by the classical energy method and its refinements and compactness arguments [5, 6]) to the (LWP) of \((gKdV)_k\) in \(H^s(\mathbb {R}),\) for \(s>\frac {3}{2},\) for k = 1, 2, …, with the same result holding in \(H^s(\mathbb {T}),\) and to the (LWP) of (NLS) in \(H^s(\mathbb {R}^d)\), for \(s>\frac {d}{2}\), with the same result holding in \(H^s(\mathbb {T}^d).\) (In the case of (NLS), some restrictions on p arise also, coming from the possible lack of “smoothness” of \(\alpha \mapsto |\alpha |^{p-1}\alpha \).)
Here, for f defined on \(\mathbb {R}^d,\) we set \(\widehat {f}(\xi )=\int _{\mathbb {R}^d}e^{2\pi i x \cdot \xi }f(x)dx,\) \(H^s(\mathbb {R}^d)=\{f : \int (1+|\xi |{ }^2)^s|\widehat {f}(\xi )|{ }^2d\xi <\infty \}\) and for f defined on \(\mathbb {T}^d,\) we set \(\widehat {f}(n)=\int _{\mathbb {T}^d}e^{2\pi i x \cdot n}f(x)dx,\) \(n \in \mathbb {Z}^d,\) and \(H^s(\mathbb {T}^d)=\{f : \sum _{n \in \mathbb {Z}^d}|\widehat {f}(n)|{ }^2(1+|n|{ }^2)^s<\infty \}.\) An inspection of these proofs shows that “dispersive properties” of \((\partial _t+\partial _x^3)\) or of \((i\partial _t+\Delta )\) are not used at all in the case of \(\mathbb {R}^d\), and hence they remain valid for the case of \(\mathbb {T}^d.\) Particular cases of \((gKdV)_k\) and (NLS) are closely connected to complete integrability, a theory which was first developed largely in this regard [1]. These are the cases k = 1, 2 in \((gKdV)_k\) and p = 3, d = 1 in (NLS). The applicability of this method initially required high order of differentiability of the data \(u_0\), and, in the case \(x \in \mathbb {R},\) fast decay of \(u_0\). More recently, this has been greatly improved (see [41, 42, 53]) but still only applies to a few specific cases.

In the late 1970s and early 1980s, the pioneering works of Ginibre-Velo [32,33,34], and Kato [45], through the use of important new advances in harmonic analysis [83, 86], led to “low regularity” (LWP) and (GWP) results for (NLS) in \(\mathbb {R}^d\), culminating with the definitive results of Tsutsumi [85] and Cazenave-Weissler [22]. This approach exploited the “dispersive properties” of \((i\partial _t+\Delta )\) and the connection with the “restriction problem” for the Fourier transform (discovered and formulated in the visionary work of E.M. Stein (see [81])), a connection uncovered by Segal [78] and Strichartz [83].

More precisely, the solution of the initial value problem

$$\displaystyle \begin{aligned} (LS)\begin{cases} i\partial_t u+\Delta u=0, x \in \mathbb{R}^d, t \in \mathbb{R}\\ u|{}_{t=0}=u_0(x) \end{cases} \end{aligned}$$

is given by \(\widehat {u}(\xi ,t)=e^{i t |\xi |{ }^2}\widehat {u}_0(\xi )=(e^{it\Delta }u_0)^{\wedge }(\xi )\) or, \(u(x,t)=\frac {c_d}{|t|{ }^{\frac {d}{2}}}\int _{\mathbb {R}^d}e^{i|x-y|{ }^2/4t} u_0(y)dy.\)

The second formula gives that, for u solving (LS),

$$\displaystyle \begin{aligned} |u(x,t)|\leq \frac{c_d}{|t|{}^{\frac{d}{2}}}\|u_0\|{}_{L^1}, \end{aligned} $$
(1)

which clearly shows the “dispersive effect” mentioned earlier. The relevant “restriction problem” here is the one to the paraboloid \(\{(\xi ,|\xi |{ }^2): \xi \in \mathbb {R}^d\} \subset \mathbb {R}^{d+1}\). In this case, we have the “restriction” inequality (for \(f\in \mathscr {S}(\mathbb {R}^{d+1}))\)

$$\displaystyle \begin{aligned} \Big(\int |\widehat{f}(\xi,|\xi|{}^2)|{}^2d\xi\Big)^{\frac{1}{2}}\lesssim \|f \|{}_{L^{\frac{2(d+2)}{d+4}}(\mathbb{R}^{d+1})} \end{aligned} $$
(2)

(see [83, 86]). The connection with (LS) is that the dual inequality to (2) is the “extension inequality,” which gives, from the first formula for the solution u of (LS), the estimate

$$\displaystyle \begin{aligned} \|u\|{}_{L^{\frac{2(d+2)}{d}}(\mathbb{R}^{d+1})}\lesssim \|u_0\|{}_{L^2(\mathbb{R}^d)}. \end{aligned} $$
(3)
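The exponents in (2) and (3) can be checked mechanically. The sketch below (Python, with hypothetical helper names) verifies that \(\frac {2(d+2)}{d+4}\) and \(\frac {2(d+2)}{d}\) are Hölder conjugate, as duality requires, and that \(\frac {2(d+2)}{d}\) is the only exponent compatible with the scaling \(u(x,t)\mapsto u(\lambda x,\lambda ^2 t)\) of (LS). The scaling argument is a standard heuristic and is not taken from the text.

```python
from fractions import Fraction


def restriction_exponent(d):
    """Lebesgue exponent on the right-hand side of the restriction inequality (2)."""
    return Fraction(2 * (d + 2), d + 4)


def extension_exponent(d):
    """Space-time Lebesgue exponent on the left-hand side of the estimate (3)."""
    return Fraction(2 * (d + 2), d)


def are_conjugate(p, q):
    """Hoelder conjugacy: 1/p + 1/q = 1."""
    return 1 / p + 1 / q == 1


def scaling_exponent(d):
    """Exponent q forced by scaling.

    If u solves (LS), so does u(lambda x, lambda^2 t). Its L^q norm in (x, t)
    scales like lambda^(-(d + 2)/q) and the L^2 norm of its data like
    lambda^(-d/2), so (3) can only hold when (d + 2)/q = d/2.
    """
    return Fraction(d + 2) / Fraction(d, 2)


if __name__ == "__main__":
    for d in (1, 2, 3):
        print(d, restriction_exponent(d), extension_exponent(d))
```

For d = 1 and d = 2 the extension exponents are 6 and 4, the same Lebesgue exponents that appear in the periodic estimates (a) and (b) of Section 3.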

Now, to solve (NLS), one needs to solve (by Duhamel’s principle) the equation (with the notation \(e^{it\Delta }u_0 = S(t)u_0\))

$$\displaystyle \begin{aligned} u(t)=S(t)u_0\pm \int_0^tS(t-t')|u|{}^{p-1}u(t')dt'. \end{aligned} $$
(4)

This is solved by using the contraction mapping principle on spaces constructed exploiting the estimate (2) and related ones [32,33,34, 45].

The result of Cazenave-Weissler [22] is

Theorem 2.1

Assume that \(u_0 \in H^s(\mathbb {R}^d),\) s ≥ 0, \(s \geq s_0\), where \(p-1=\frac {4}{d-2s_0}.\) Assume also that p − 1 > [s] + 1 if \(p-1\notin 2\mathbb {Z}^{\star },\) where [s] is the greatest integer smaller than s. Then (NLS) is locally well-posed for t ∈ [−T, T]. In the subcritical case \(s > s_0\), we can take \(T=T(\|u_0\|{ }_{H^s}),\) in the critical case \(s = s_0\), \(T = T(u_0)\).
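The critical index in Theorem 2.1 is obtained by solving \(p-1=\frac {4}{d-2s_0}\) for \(s_0\), which gives \(s_0=\frac {d}{2}-\frac {2}{p-1}\). A minimal Python sketch of this bookkeeping follows. The function names and the returned labels are hypothetical, and the additional smoothness condition on p in the theorem is not modeled.

```python
from fractions import Fraction


def critical_regularity(d, p):
    """Solve p - 1 = 4 / (d - 2 s0) for s0, i.e. s0 = d/2 - 2/(p - 1)."""
    return Fraction(d, 2) - Fraction(2) / (Fraction(p) - 1)


def regime(d, p, s):
    """Classify s relative to the hypotheses s >= 0 and s >= s0 of Theorem 2.1."""
    s0 = critical_regularity(d, p)
    if s < 0 or s < s0:
        return "not covered"
    return "critical" if s == s0 else "subcritical"


if __name__ == "__main__":
    # (d, p) pairs: quintic in 1D, cubic in 2D, cubic and quintic in 3D
    for d, p in ((1, 5), (2, 3), (3, 3), (3, 5)):
        print(f"d={d}, p={p}: s0 = {critical_regularity(d, p)}")
```

With these conventions, the quintic equation in one dimension and the cubic equation in two dimensions both have \(s_0=0\), so they are critical at the level of the conserved mass.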

This approach, relying on the estimates (1) and (3), uses crucially the “dispersive properties” of \((i\partial _t+\Delta )\) in \(\mathbb {R}^d,\) and hence it does not apply to \(\mathbb {T}^d.\) On the other hand, on \(\mathbb {R}^d\) it yields essentially optimal results in terms of the values of s when \(B=H^s(\mathbb {R}^d),\) which greatly improve the results obtained by the energy method described earlier.

There are several motivations for hoping to have “low regularity” well-posedness results for \((gKdV)_k\) and (NLS). The first one is that, if one can obtain (LWP) at the regularity level given by the conserved mass, or the conserved energy, with time of existence \(T=T(\|u_0\|{ }_{L^2}),\) or \(T=T(\|u_0\|{ }_{H^1})\), one can use the a priori control given by the conserved quantity to obtain global well-posedness, simply iterating the local result. Another one is the belief that, since for the associated linear problem we have well-posedness in \(H^s\), for any s, the threshold \(\overline {s}\) for the non-linear problem gives information on the nonlinear effects present in the problem. We will see later another motivation, at very low regularity levels, stemming from the connection with quantum field theory and giving global well-posedness for “generic” data. Turning to the “low regularity” local well-posedness theory for \((gKdV)_k\), the new difficulty is the fact that the nonlinear term contains a derivative, which needs to be “recovered.” One might think that the fact that \((\partial _t+\partial _x^3)\) has a “stronger dispersive effect” (we have for instance the bound \(|u(x,t)|\lesssim \frac {1}{t^{1/3}}\|u_0\|{ }_{L^1}\) for the linear solution, which is stronger for small t than the \(\frac {1}{t^{1/2}}\) we get for (LS), d = 1) would compensate for the derivative in the nonlinearity, but this is not obviously the case. Kato [43, 44] found a “local smoothing” effect for solutions of \((gKdV)_k\) which allowed, when \(x \in \mathbb {R},\) to control “a priori,” with \(u_0 \in L^2(\mathbb {R})\), quantities like \(\int _{j}^{j+1}\int _0^{1}\Big (\partial _x u(x,t) \Big )^2 dx dt,\) \(j \in \mathbb {Z},\) uniformly in j. However, this only gave rise to “weak solutions” with \(L^2\) data and did not give uniqueness or continuous dependence on the data.
This was also restricted to \(x \in \mathbb {R},\) since such an estimate in \(\mathbb {T}\) would contradict time reversibility and conservation of mass. In the 1980s and early 1990s, in a joint project with G. Ponce and L. Vega, we developed a new approach to the “low regularity” local and global well-posedness theory (for \(x \in \mathbb {R}\)) for \((gKdV)_k\), which in the case k ≥ 4 gave essentially optimal (in some sense) results [4, 51]. This was also based on the contraction mapping theorem and used tools from harmonic analysis. In addition to the analogs of the “extension inequality” (3) (with \((\xi ,|\xi |^2)\) being replaced by \((\xi ,\xi ^3)\)), we used a sharp form (for linear equations) of the Kato “local smoothing” estimate, introduced in [30, 82, 87], as well as an analog of the “maximal function” estimate introduced in [21] and motivated by statistical mechanics (see also [31, 87]). The combination of these two estimates allowed us to control well the nonlinear term \(u^k\partial _x u\). In addition, we also applied the multilinear harmonic analysis tools developed by Coifman-Meyer [23, 24]. This was all completely tied to dispersion and was totally dependent on the fact that \(x \in \mathbb {R}\). A sample result obtained, for KdV (k = 1), was

Theorem 2.2 ([49])

Let \(s>\frac {3}{4},\) \(u_0 \in H^s(\mathbb {R}).\) Then, \(\exists T=T(\|u_0\|{ }_{H^s}),\) and a space \(X^s_T \subset C([-T,T];H^s),\) such that KdV has a unique solution \(u \in X^s_T,\) which depends continuously on u 0.

The space \(X^s_T\) is constructed by using the estimates mentioned earlier, namely the sharp “local smoothing” estimate, the “maximal function” estimate, and the variants of the “extension estimate.” One then proves the result by the contraction mapping principle in the space \(X^s_T\), \(T=T(\|u_0\|{ }_{H^s}),\) showing that the mapping \(\Phi _{u_0}(u)=W(t)u_0+\int _0^tW(t-t')(u\partial _x u)(t')dt'\) has a fixed point in \(X^s_T\), where \(\widehat {W(t)f}(\xi )=e^{i t \xi ^3}\widehat {f}(\xi ).\)

Remark 1

The approach was, in a certain sense, sharp: if we have a space \(X^s_T\) such that \(\forall u_0 \in H^s(\mathbb {R}),\) the linear solution \(W(t)u_0\) belongs to \(X^s_T\) and such that, for all \(v, w \in X^s_T\), we have \(v\partial _x w \in L^1_{loc}(\mathbb {R})\), then \(s\geq \frac {3}{4}.\)

At this point, we had no idea on how to improve the results for k = 1, 3 (the k = 2 result in [49] was also “optimal,” as was shown in [51]), or how to do anything other than the \(s>\frac {3}{2}\) result given by the energy method in the case \(x \in \mathbb {T}.\)

3 Bourgain’s Transformative Work on the Well-Posedness Theory of Dispersive Equations

In the spring of 1990, I gave a lecture on the work (then in progress) in [49], and E. Speer was in the audience. He asked me the following question: consider the quintic (NLS) on \(\mathbb {T}\):

$$\displaystyle \begin{aligned} \begin{cases} i\partial_t u+\Delta u\pm |u|{}^4u=0, x \in \mathbb{T}, t \in \mathbb{R}\\ u|{}_{t=0}=u_0(x) \in H^s(\mathbb{T}). \end{cases} \end{aligned} $$
(5)

Is this problem well-posed for \(s<\frac {1}{2}\)?

I knew that the energy method gave \(s>\frac {1}{2},\) that complete integrability did not apply, and that the methods we developed with Ponce and Vega, which relied on dispersion, did not apply. Speer explained the reason for the question, which was in connection with the work [56] of Lebowitz, Rose, and Speer, in which they had constructed a Gibbs measure associated to the problem (5). The points that the authors of [56] were concerned with were that the measure they constructed used the periodic setting crucially and that the support of the measure was contained in very low regularity spaces. So, they wanted to have a flow for (5), in the support of the Gibbs measure, which kept the Gibbs measure invariant. If so, a by-product of all this would be that, for data in the support of the measure, local in time existence could be globalized in time, similarly to the arguments in the presence of conserved quantities that we saw before. I told Speer that I felt that the question was very hard and that I thought that the person who could make progress in it, and would probably be interested in the problem, was Jean Bourgain! Bourgain did get interested and resolved completely the Lebowitz-Rose-Speer questions [7, 8, 10]. In doing so, he transformed the theory of nonlinear dispersive equations, starting with his papers [7,8,9]. Moreover, he continued making fundamental contributions to all aspects of this theory: he not only transformed the well-posedness theory and created the probabilistic theory suggested by [10, 11] and [56] but also transformed many other central areas in the field. Let me now turn to Bourgain’s papers [7, 8], in which he made his first groundbreaking contributions to the well-posedness theory. These works address the following two fundamental questions:

  1. How to prove low regularity well-posedness results for (NLS) and \((gKdV)_k\), for \(x \in \mathbb {T}^d\)?

  2. How to improve the well-posedness results on (KdV) on \(\mathbb {R}\)?

It turns out that, in solving the first question, Bourgain also found the path to solving the second one. Also, once the first question was solved, Bourgain turned to the Gibbs measure questions from [56], in [10, 11], settling them and extending their scope, as we shall see below. We thus turn to (NLS) on \(\mathbb {T}^d\), and we will concentrate on Bourgain’s results for d = 1, 2, which are the most relevant to our exposition.

Theorem 3.1 ([7])

  1. (i)

    (NLS) is locally well-posed in \(H^s(\mathbb {T}),\) for s ≥ 0, \(p-1<\frac {4}{1-2s}.\) Thus, for p − 1 = 4, (NLS) is (LWP) in \(H^s(\mathbb {T})\) for all s > 0.

  2. (ii)

    (NLS) is locally well-posed in \(H^s(\mathbb {T}^2)\) , for p − 1 = 2, s > 0.

Compared with corresponding results in \(\mathbb {R}, \mathbb {R}^2\), that we discussed earlier, one key difficulty is the lack of a “dispersive effect.” Another difficulty is that, in the periodic case, the Fourier transform, in the solution of the associated linear problem, is replaced by Fourier series, leading to “exponential sums” that are much more difficult to estimate than integrals. For instance, the operator \(e^{it\Delta }u_0 = S(t)u_0\) now takes the form

$$\displaystyle \begin{aligned}S(t)u_0(x)=\sum_{n \in \mathbb{Z}^d} e^{i(xn+t|n|{}^2)}\widehat{u}_0(n).\end{aligned}$$
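The absence of decay on the torus can be seen directly from this formula: for d = 1 every phase \(e^{itn^2}\) is \(2\pi \)-periodic in t, so \(S(t+2\pi )u_0=S(t)u_0\) and the solution returns to its initial size. The Python sketch below evaluates the series for a finite set of Fourier coefficients. The function names and the sample coefficients are made up for illustration.

```python
import cmath
import math


def periodic_free_evolution(coeffs, x, t):
    """Evaluate sum_n c_n exp(i (n x + n^2 t)) for a dict {n: c_n} (case d = 1)."""
    return sum(c * cmath.exp(1j * (n * x + n * n * t)) for n, c in coeffs.items())


def sup_on_grid(coeffs, t, points=256):
    """Maximum of |S(t) u_0| over an equally spaced grid on the circle."""
    return max(
        abs(periodic_free_evolution(coeffs, 2 * math.pi * k / points, t))
        for k in range(points)
    )


if __name__ == "__main__":
    sample = {-3: 1.0, 0: 0.5, 2: -1j, 5: 2.0}  # hypothetical Fourier coefficients
    for t in (0.0, 0.3, 0.3 + 2 * math.pi):
        print(f"t={t:.4f}  sup={sup_on_grid(sample, t):.6f}")
```

The values at t and t + 2π agree up to rounding, in contrast with the \(|t|^{-d/2}\) decay in (1).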

The proof of Theorem 3.1 proceeds by using the contraction mapping principle. The first step is to find estimates that replace the inequality (3), crucial in the case of \(\mathbb {R}^d,\) which is proved using oscillatory integral estimates. Bourgain achieved this by using analytic number theory, and the results that he obtained in doing this have independent interest in analytic number theory. As a sample, let me mention two such estimates:

  1. (a)
    $$\displaystyle \begin{aligned}\lVert \sum_{n \in \mathbb{Z}, |n| \leq N} a_n e^{i(nx+n^2t)}\rVert_{L^6(\mathbb{T}^2)}\lesssim N^{\varepsilon}\Big(\sum |a_n|{}^2\Big)^{\frac{1}{2}}, \forall \varepsilon >0,\end{aligned}$$

    which is used in Theorem 3.1(i).

    and

  2. (b)
    $$\displaystyle \begin{aligned}\lVert \sum_{n \in \mathbb{Z}^2, |n_1|\leq N, |n_2| \leq N}a_n e^{i(nx+|n|{}^2t)}\rVert_{L^4(\mathbb{T}^3)} \lesssim N^{\varepsilon}\Big(\sum_{n \in \mathbb{Z}^2} |a_n|{}^2\Big)^{\frac{1}{2}}, \forall \varepsilon>0,\end{aligned}$$

    which is used in Theorem 3.1(ii).
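The number-theoretic content of (a) can be made explicit in the special case \(a_n=1\) for \(|n|\le N\). Assuming the measure on \(\mathbb {T}^2\) is normalized to total mass one (a normalization choice made here, which only affects constants), expanding the sixth power and integrating shows that the sixth power of the \(L^6\) norm equals the number of integer solutions of \(n_1+n_2+n_3=m_1+m_2+m_3\), \(n_1^2+n_2^2+n_3^2=m_1^2+m_2^2+m_3^2\) with all \(|n_j|,|m_j|\le N\). Estimate (a) then says that this count is at most \(C_\varepsilon N^{6\varepsilon }(2N+1)^3\). The Python sketch below, with hypothetical function names, computes the count by brute force for small N.

```python
from collections import Counter
from itertools import product


def sixth_moment_count(N):
    """Number of (n1, n2, n3, m1, m2, m3) in [-N, N]^6 with equal sums and
    equal sums of squares."""
    freqs = range(-N, N + 1)
    # For each value of (sum, sum of squares), count the triples producing it.
    representations = Counter(
        (a + b + c, a * a + b * b + c * c) for a, b, c in product(freqs, repeat=3)
    )
    # Pairs of triples with the same (sum, sum of squares).
    return sum(r * r for r in representations.values())


def ratio_to_square_root_cancellation(N):
    """Count divided by (2N + 1)^3, the size predicted up to N^(6 eps)."""
    return sixth_moment_count(N) / (2 * N + 1) ** 3


if __name__ == "__main__":
    for N in (1, 2, 4, 8, 16):
        print(N, sixth_moment_count(N), ratio_to_square_root_cancellation(N))
```

The diagonal solutions \(m_j=n_j\) already give \((2N+1)^3\), and an elementary argument (fix the \(n_j\) and \(m_1\), then \(\{m_2,m_3\}\) is determined) gives at most \(2(2N+1)^4\). The point of (a) is that the truth is close to the lower of these two bounds.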

Their proof uses the argument of Tomas [86] in the proof of the “restriction inequality,” combined with the “major arc” description of exponential sums (due to Vinogradov) and number theoretic arguments inspired by Weyl type lemmas [88]. The second main contribution of Bourgain here is the introduction of new function spaces in which to apply the contraction mapping principle.

For K, N positive integers, consider \(\Lambda _{K,N}=\{ \zeta = (\xi , \lambda ) \in \mathbb {Z}^d\times \mathbb {R}: N \leq |\xi |\leq 2N \mbox{ and }K \leq |\lambda -|\xi |{ }^2|\leq 2K\}.\) For a function u in \(L^2(\mathbb {T}^d\times \mathbb {R}),\) let

$$\displaystyle \begin{aligned}u(x,t)=\sum_{\xi \in \mathbb{Z}^d}\int \widehat{u}(\zeta)e^{2\pi i(\xi x+t \lambda)}d\lambda,\end{aligned}$$

and define \({\left \vert \kern -0.25ex\left \vert \kern -0.25ex\left \vert u \right \vert \kern -0.25ex\right \vert \kern -0.25ex\right \vert }_s=\mbox{sup}_{K,N} (K+1)^{\frac {1}{2}}(N+1)^s\Big (\int _{\Lambda _{K,N}}|\widehat {u}(\zeta )|{ }^2d\zeta \Big )^{\frac {1}{2}}\).

Fixing an interval of t in [−δ, δ], one considers the restriction norm

$$\displaystyle \begin{aligned} {\left\vert\kern-0.25ex\left\vert\kern-0.25ex\left\vert u \right\vert\kern-0.25ex\right\vert\kern-0.25ex\right\vert}_{X^s}=\mbox{inf}{\left\vert\kern-0.25ex\left\vert\kern-0.25ex\left\vert \tilde{u} \right\vert\kern-0.25ex\right\vert\kern-0.25ex\right\vert}_s, \end{aligned} $$
(6)

where the infimum is taken over all \(\tilde {u}\) coinciding with u in [−δ, δ]. One then shows, using the contraction mapping theorem, that the integral equation (4), now on \(\mathbb {T}^d,\) has a solution in \(X^s\) for small δ. This applies to (i) and (ii) and uses crucially the bounds (a) and (b).

It is difficult to overestimate the impact of this work in the well-posedness theory. It was simply a complete game changer. While versions of the spaces just described were in the literature before, in earlier works of Rauch and Reed [76] and M. Beals [3] dealing with propagation of singularities for solutions of semilinear wave equations, and also implicit in the contemporary work of Klainerman-Machedon [55] on the local well-posedness of semilinear wave equations, the flexibility and universality of Bourgain’s formulation of these spaces contributed decisively to their wide applicability in solving a large number of previously intractable problems, in the work of many researchers.

We now turn to the work in [8], on \((gKdV)_k\), on \(\mathbb {T}\). We will restrict ourselves to commenting on the results for k = 1.

Theorem 3.2 ([8])

(KdV) is locally well-posed in \(L^2(\mathbb {T})\), with time of existence depending on \(\|u_0\|{ }_{L^2}\), and hence by conservation of the \(L^2\) norm, it is globally well-posed in \(L^2(\mathbb {T})\).

The proof also proceeds by a contraction mapping argument, in spaces related to the ones given by (6) but adapted to the linear operator \(\partial _t+\partial _x^3\). A first reduction is to the case of data of integral 0, that is, whose zero Fourier coefficient vanishes. The space \(X^s\) now has norm

$$\displaystyle \begin{aligned}{\left\vert\kern-0.25ex\left\vert\kern-0.25ex\left\vert u \right\vert\kern-0.25ex\right\vert\kern-0.25ex\right\vert}_s = \left\{\sum_{n\in\mathbb{Z}, n\neq 0} |n|{}^{2s} \int_{-\infty}^{+\infty} (1+|\lambda-n^3|)|\widehat{u} (n,\lambda)|{}^2\,d\lambda\right\}^{1/2}\end{aligned}$$

for u defined for \((x,t)\in \mathbb {T}^2\), with mean in x equal to 0. The relevant version of (a), when s = 0, is now

  1. (a’)
    $$\displaystyle \begin{aligned} \|f\|{}_{L^4(\mathbb{T}^2)}\lesssim \left(\sum_{m,n\in\mathbb{Z}} (1+|n-m^3|)^{2/3}|\hat f(m,n)|{}^2\right)^{1/2}.\end{aligned}$$

A very important difference with (NLS) is the fact that there is a derivative in the non-linearity, and no linear local smoothing effect, as we mentioned earlier. Bourgain’s crucial insight here was that there is a nonlinear smoothing effect, best captured by the function spaces introduced above. This is given in the following estimates: let \(w(x,t) = \partial _x(u^2)(x,t)\), where we assume that \(\int _{\mathbb {T}} u(x,t)\,dx = 0\). Then, for s ≥ 0,

$$\displaystyle \begin{aligned}\left(\sum_{n\neq 0} |n|{}^{2s}\int\frac{|\widehat{w}(n,\lambda)|{}^2}{(1+|\lambda-n^3|)}\,d\lambda\right)^{1/2} \lesssim {\left\vert\kern-0.25ex\left\vert\kern-0.25ex\left\vert u \right\vert\kern-0.25ex\right\vert\kern-0.25ex\right\vert}_{s}^{2},\end{aligned}$$
$$\displaystyle \begin{aligned}\left(\sum_{n\neq 0} |n|{}^{2s}\left(\int\frac{|\widehat{w}(n,\lambda)|}{(1+|\lambda-n^3|)}\,d\lambda\right)^2\right)^{1/2} \lesssim {\left\vert\kern-0.25ex\left\vert\kern-0.25ex\left\vert u \right\vert\kern-0.25ex\right\vert\kern-0.25ex\right\vert}_{s}^{2}.\end{aligned}$$

It is through these estimates, controlling \(\partial _x(u^2)\) by u, that we see this nonlinear smoothing effect, which is a consequence of the “curvature” of \((n, n^3)\).
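One standard way to quantify the “curvature” of \((n,n^3)\) is the identity \(n^3-n_1^3-n_2^3=3nn_1n_2\) for \(n=n_1+n_2\). This is a well-known computation, offered here as a plausible reading of the mechanism; the text above does not spell it out. With mean zero data all three frequencies are nonzero, so \(|3nn_1n_2|\ge \frac {3}{2}n^2\). Consequently, if \(\lambda =\lambda _1+\lambda _2\), at least one of \(|\lambda -n^3|\), \(|\lambda _1-n_1^3|\), \(|\lambda _2-n_2^3|\) is at least \(\frac {1}{2}n^2\), which is the size needed to offset the derivative \(\partial _x\) against weights of the form \((1+|\lambda -n^3|)^{1/2}\). The Python check below uses hypothetical function names.

```python
def resonance(n1, n2):
    """n^3 - n1^3 - n2^3 for the output frequency n = n1 + n2."""
    n = n1 + n2
    return n ** 3 - n1 ** 3 - n2 ** 3


def check_resonance(max_freq):
    """Verify the identity and the lower bound over nonzero integer frequencies."""
    for n1 in range(-max_freq, max_freq + 1):
        for n2 in range(-max_freq, max_freq + 1):
            n = n1 + n2
            if n1 == 0 or n2 == 0 or n == 0:
                continue  # excluded by the mean-zero assumption
            r = resonance(n1, n2)
            if r != 3 * n * n1 * n2 or 2 * abs(r) < 3 * n * n:
                return False
    return True


if __name__ == "__main__":
    print(check_resonance(50))
```

The exclusion of zero frequencies is exactly where the reduction to data of integral 0 enters: if any of \(n\), \(n_1\), \(n_2\) vanished, the product \(3nn_1n_2\) would be zero and no gain would be available.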

Finally, also in [8], Bourgain observed that this nonlinear smoothing effect also carries over to the case \(x\in \mathbb {R}\), using the function spaces

$$\displaystyle \begin{aligned}X_b^s = \left\{u(x, t):\iint(1+|\lambda-\xi^3|)^{2b}\cdot|1+|\xi||{}^{2s}|\widehat{u}(\xi,\lambda)|{}^2\,d\xi\,d\lambda <\infty,\text{ where }(\xi,\lambda)\in\mathbb R^2\right\}.\end{aligned}$$

He proved:

Theorem 3.3 ([8])

(KdV) is globally well-posed in \(L^2(\mathbb R)\).

Remark 2

By using a nonlinear smoothing effect, and thus replacing \(v\partial _x w\) in Remark 1 by \(\partial _x(u^2)\), Bourgain bypassed the objection for improving \(s>\frac {3}{4}\), given in Remark 1. To Ponce, Vega, and myself, this was a shocking observation. Of course, this was just one of the many shocking observations made by Bourgain over the years! These works of Bourgain have been and continue to be remarkably influential.

Remark 3

Theorem 3.2 and Theorem 3.3 generated substantial interest in the question of finding the optimal s for (LWP) in each theorem. In [50], it was shown that (LWP) for \(\mathbb {T}\) holds for \(s>-\frac {1}{2}\) and for \(\mathbb R\) for \(s>-\frac {3}{4}\), both by the contraction mapping principle. In [12], Bourgain observed that (LWP) cannot be proved by the contraction mapping principle, for \(s<-\frac {1}{2}\) on \(\mathbb {T}\) and for \(s<-\frac {3}{4}\) on \(\mathbb {R}\). In [40] and [54], it was shown (independently) that (LWP) holds in \(H^{-\frac {1}{2}}(\mathbb {T})\) and \(H^{-\frac {3}{4}}(\mathbb {R})\), by the contraction mapping principle, using a modification of the spaces \(X_b^s\) introduced by Bourgain. That a modification of the spaces was needed was shown by Nakanishi, Takaoka, and Tsutsumi [72]. Finally, (LWP) was shown in \(H^{-1}(\mathbb T)\) by Kappeler-Topalov [42] and by Killip-Visan in \(H^{-1}(\mathbb {R})\) [53], using inverse scattering. These are the optimal spaces for (LWP) in the scale of Sobolev spaces, as was shown by Molinet [69, 70].

4 A Quick Sampling of Some of the Other Groundbreaking Contributions of Bourgain to Nonlinear Dispersive Equations

4.1 Gibbs Measure Associated to Periodic (NLS)

We again consider the (NLS) equation

$$\displaystyle \begin{aligned}\begin{cases} i\partial_t u +\Delta u \pm |u|{}^{p-1}u=0, p>1, u:\mathbb{T}^d\times\mathbb R\to\mathbb{C}\\ u|{}_{t=0} = u_0 \end{cases}\end{aligned}$$

and recall the two conserved quantities: the mass

$$\displaystyle \begin{aligned}M(u) = \int_{\mathbb{T}^d}|u|{}^2\,dx = M(u_0)\end{aligned}$$

and the Hamiltonian (the energy)

$$\displaystyle \begin{aligned}H(u) = \frac{1}{2}\int_{\mathbb{T}^d} |\nabla u|{}^2\,dx \pm \frac{1}{p+1}\int_{\mathbb{T}^d}|u|{}^{p+1}\,dx = H(u_0).\end{aligned}$$

If we set \(\hat u(n,t) = a_n(t)+ib_n(t)\), we see that u solves (NLS) if and only if \(\dot a_n(t) = \frac {\partial H}{\partial b_n}\) and \(\dot b_n(t) = -\frac {\partial H}{\partial a_n}\), \(n\in \mathbb {Z}^d\). Thus, (NLS) can be viewed as an infinite-dimensional Hamiltonian system. If the Hamiltonian system is finite-dimensional, say we consider |n|≤ N, then the Gibbs measure \(\mu \), given by

$$\displaystyle \begin{aligned}d\mu = \frac{1}{Z_N}e^{-H(a_n,b_n)}\prod_{|n|\le N}da_n\,db_n,\end{aligned}$$

where \(Z_N\) is a normalization constant, is well-defined and invariant with respect to the flow. In the paper [56], Lebowitz-Rose-Speer were able to make sense of the Gibbs measure associated to (NLS) in \(\mathbb {T}\), with p = 5. They considered the formal expression

$$\displaystyle \begin{aligned}``d\mu = \frac{1}{Z}e^{-H(a_n,b_n)}\prod_{n\in\mathbb{Z}}da_n\,db_n",\end{aligned}$$

by introducing first the Gaussian measure

$$\displaystyle \begin{aligned}d\rho = \frac{1}{\tilde{Z}} e^{-\sum_n(1+n^2)(|a_n|{}^2+|b_n|{}^2)}\prod_{n} da_n\,db_n,\end{aligned}$$

with support in \(H^s(\mathbb {T})\), \(s<\frac {1}{2}\), and then proved that \(\mu \) is absolutely continuous with respect to \(\rho \). The questions they formulated were as follows:

  1. Is (NLS) on \(\mathbb {T}\), with p = 5, on \(H^s(\mathbb {T})\), \(0<s<\frac {1}{2}\), well-defined for all times, at least for data in the support of the measure?

  2. Is \(\mu \) invariant with respect to the (NLS) flow?

In the paper [10], Bourgain answered both questions in the affirmative. To treat both issues, he used the (LWP) result in \(H^s\), \(0<s<\frac {1}{2}\), given in Theorem 3.1, and then used the invariance of the measure under the flow to establish global well-posedness almost surely.

Bourgain then treated in [11] a very challenging question along these lines: Can one do this for the cubic (NLS) on \(\mathbb {T}^2\), at least in the defocusing case, that is, for the equation

$$\displaystyle \begin{aligned}i\partial_tu + \Delta u - |u|{}^2u=0, x\in\mathbb{T}^2?\end{aligned}$$

The existence of μ in this case was due to Glimm-Jaffe [36], but supp\(\mu \subset H^s(\mathbb {T}^2)\), s < 0, while Theorem 3.1 gives (LWP) in \(H^s(\mathbb {T}^2)\), s > 0.

Bourgain overcame this difficulty through another shocking breakthrough. He considered the following random data:

$$\displaystyle \begin{aligned}u_0^\omega = \sum_{n\in\mathbb{Z}^2}\frac{g_n(\omega)}{(1+|n|{}^2)^{\frac{1}{2}}}e^{inx},\end{aligned}$$

where the \(\{g_n\}\) are identically distributed complex Gaussian random variables. Since \(u_0^\omega \in H^s(\mathbb {T}^2)\), s < 0, \(u_0^\omega \) belongs to the support of the Gibbs measure μ. (We are going to ignore here the need for “Wick-ordering” the (NLS) equation; see [11].) The key observation is that if u is the (NLS) solution, \(w(t) = u(t) - S(t)u_0^\omega \) is (almost surely in ω) well-defined in \(H^{\bar s}(\mathbb {T}^2)\), where \(\bar s>0\), and one can then solve for w, to obtain a local in time solution. Finally, the local in time solution is extended globally in time, using the invariance of the Gibbs measure. This very influential paper led to the notion of “probabilistic well-posedness” in dispersive equations in works of Burq-Tzvetkov [20], T. Oh [73], and many others, including Bourgain-Bulut [17, 18].
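The regularity of the random data can be read off from a second-moment computation. Assuming the normalization \(\mathbb {E}|g_n|^2=1\) (an assumption made here; the text does not fix it), one has \(\mathbb {E}\|u_0^\omega \|_{H^s}^2=\sum _{n\in \mathbb {Z}^2}(1+|n|^2)^{s-1}\), which is finite exactly when s < 0 and diverges logarithmically at s = 0. The Python sketch below, with hypothetical function names, computes partial sums over squares \(|n_1|,|n_2|\le N\) to make this visible.

```python
def expected_hs_norm_sq(s, N):
    """Partial sum of (1 + |n|^2)^(s - 1) over n in Z^2 with |n1|, |n2| <= N.

    Under the assumed normalization E|g_n|^2 = 1 this is the expected squared
    H^s norm of the random series truncated to those frequencies.
    """
    return sum(
        (1 + a * a + b * b) ** (s - 1)
        for a in range(-N, N + 1)
        for b in range(-N, N + 1)
    )


def dyadic_increment(s, N):
    """Contribution of the frequencies with N < max(|n1|, |n2|) <= 2N."""
    return expected_hs_norm_sq(s, 2 * N) - expected_hs_norm_sq(s, N)


if __name__ == "__main__":
    for N in (4, 8, 16, 32):
        print(N, dyadic_increment(0, N), dyadic_increment(-0.5, N))
```

At s = 0 each dyadic block contributes a roughly constant amount, so the sum grows like log N, while for s = −1/2 the blocks shrink geometrically. This is the gap between the support of the measure (s < 0) and the range s > 0 of Theorem 3.1(ii).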

4.2 Bourgain’s “High-Low Decomposition”

In Theorem 2.1, the local in time result can be extended to a global in time one, in case the \(H^s\) norm of the data is small, \(s \geq s_0\). In the mass (\(L^2\)) subcritical case, when p − 1 < 4∕d, that is when \(s_0 < 0\), the problem is locally well-posed in \(L^2\) and hence globally well-posed in \(L^2\). When p − 1 ≥ 4∕d, in the focusing case, that is when the sign in front of the nonlinearity in (NLS) is positive, and hence the Hamiltonian does not have a definite sign, sufficiently large smooth solutions may blow up in finite time (see Glassey [35], Merle [58, 59], Bourgain-Wang [19], Merle-Raphaël [60,61,62,63,64], Raphaël [74, 75], Merle-Raphaël-Rodnianski [65], etc.). Also, if the nonlinearity is “defocusing,” that is, the sign in front of the nonlinear term in (NLS) is negative so that the conserved Hamiltonian

$$\displaystyle \begin{aligned}H(u) = \frac{1}{2}\int |\nabla u|{}^2 +\frac{1}{p+1}\int |u|{}^{p+1},\end{aligned}$$

controls \(\int |\nabla u|{ }^2\), and if \(p-1<\frac {4}{d-2}\) (that is \(s_0 < 1\)) and hence the problem is energy subcritical, (NLS) is globally well-posed in the energy space \(H^1(\mathbb {R}^d)\), by iterating the result in Theorem 2.1.

Bourgain [13] developed a very general method to, in such circumstances, obtain global well-posedness below the energy norm. A sample result is

Theorem 4.1 ([13])

The problem

$$\displaystyle \begin{aligned}\begin{cases} {} i\partial_tu + \Delta u - u|u|{}^2 = 0\\ u|{}_{t=0} = u_0 \in H^s(\mathbb{R}^2) \end{cases}\end{aligned}$$

is globally well-posed for \(s>\frac {3}{5}\) . Moreover, the solution u satisfies \(u(t)-S(t)u_0\in H^1(\mathbb {R}^2)\) for all t (with a polynomial control in |t| of the H1 norm).

The general scheme of the method is as follows: first, one has to have a conserved quantity (say \(I(u_0)\)), such that \(I(u_0)\) controls a certain \(H^{s_0}\) norm. Next, one needs a local well-posedness result (LWP) in \(H^{s_1}\), for \(s_1 < s_0\), with the flow map satisfying \(I(u(t)-S(t)u_0) \le F(\|u_0\|{ }_{H^{s_1}})\), where S(t) is the associated linear evolution, acting unitarily on all \(H^s\) spaces. One then expects a global well-posedness result in \(H^{s_2}\), for some \(s_1 < s_2 < s_0\). In the theorem stated, I is the Hamiltonian. One then splits, for some T large and fixed, \(u_0 = u_{0,1}^{(N_0)}+u_{0,2}^{(N_0)}\), with \(u_{0,1}^{(N_0)} = \int _{|\xi |\le N_0} \hat {u}_0(\xi ) e^{ix\cdot \xi }\,d\xi \), where \(N_0 = N_0(T)\) is to be chosen.

It is simple to see that \(H(u_{0,1}^{(N_0)})\lesssim N_0^{2(1-s)}\). One then solves the nonlinear problem with initial data \(u_{0,1}^{(N_0)}\), for all times. If we choose the time interval I = [0, δ], where \(\delta = N_0^{-2(1-s)-\epsilon }\), then the resulting solution \(u_1^{(N_0)}\) satisfies

$$\displaystyle \begin{aligned}\|u_1^{(N_0)}\|_{L^4(\mathbb{R}^2\times I)} = o(1).\end{aligned}$$
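
The bound on \(H(u_{0,1}^{(N_0)})\) stated above can be checked by a short computation on the Fourier side. The following is only a sketch, for \(0\le s<1\) and up to constants depending on the Fourier normalization; to handle the potential term, it assumes the two-dimensional Gagliardo-Nirenberg inequality \(\|f\|_{L^4}^4 \lesssim \|f\|_{L^2}^2\|\nabla f\|_{L^2}^2\):

$$\displaystyle \begin{aligned} \|\nabla u_{0,1}^{(N_0)}\|_{L^2}^2 &\simeq \int_{|\xi|\le N_0} |\xi|^{2}|\hat u_0(\xi)|^2\,d\xi \le N_0^{2(1-s)}\int |\xi|^{2s}|\hat u_0(\xi)|^2\,d\xi \lesssim N_0^{2(1-s)}\|u_0\|_{H^s}^2,\\ \|u_{0,1}^{(N_0)}\|_{L^4}^4 &\lesssim \|u_{0,1}^{(N_0)}\|_{L^2}^2\,\|\nabla u_{0,1}^{(N_0)}\|_{L^2}^2 \lesssim N_0^{2(1-s)}\|u_0\|_{H^s}^4. \end{aligned}$$

Both contributions to the Hamiltonian are therefore of size at most \(N_0^{2(1-s)}\), with implied constants depending on \(\|u_0\|_{H^s}\).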

If we let \(u = u_1^{(N_0)} + v\), where \(u_1^{(N_0)}\) is the global solution just mentioned, v satisfies the difference equation

$$\displaystyle \begin{aligned}\begin{cases} i\partial_tv + \Delta v - 2|u_1^{(N_0)}|{}^2 v - (u_1^{(N_0)})^2\bar v - \overline{(u_1^{(N_0)})} v^2 - 2 u_1^{(N_0)}|v|{}^2 - |v|{}^2v = 0\\ v|{}_{t=0} = u_{0,2}^{(N_0)}, \end{cases}\end{aligned}$$

with \(\|u_{0,2}^{(N_0)}\|_{L^2} \lesssim N_0^{-s}\); \(\|u_{0,2}^{(N_0)}\|_{H^s}\le C\). One then gets, after calculations, \(v = S(t)(u_{0,2}^{(N_0)}) + w\), where \(w(t) \in H^1\), \(\|w(t)\|_{L^2} \lesssim N_0^{-s}\) and \(\|w(t)\|_{H^1} \lesssim N_0^{1-2s+\epsilon }\).
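
For the reader's convenience, here is a routine verification of where the terms in the difference equation and the \(L^2\) bound on the high-frequency data come from. The shorthand \(a = u_1^{(N_0)}\) is ours, introduced only for this check, and the second line is again up to normalization constants:

$$\displaystyle \begin{aligned} |a+v|^2(a+v) - |a|^2a &= 2|a|^2v + a^2\bar v + \bar a v^2 + 2a|v|^2 + |v|^2v,\\ \|u_{0,2}^{(N_0)}\|_{L^2}^2 &\simeq \int_{|\xi|>N_0}|\hat u_0(\xi)|^2\,d\xi \le N_0^{-2s}\int|\xi|^{2s}|\hat u_0(\xi)|^2\,d\xi \lesssim N_0^{-2s}\|u_0\|_{H^s}^2. \end{aligned}$$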

Then, fixing \(t_1 = \delta\), we obtain \(u(t_1) = u_1 + v_1\), where \(u_1 = u_1^{(N_0)}(t_1) + w(t_1)\), \(v_1 = S(t_1)(u_{0,2}^{(N_0)})\). Using the conservation of H, and the bounds for w, this yields

$$\displaystyle \begin{aligned}H(u_1) \le H(u_{0,1}^{(N_0)}) + CN_0^{2-3s+\epsilon},\end{aligned}$$

while \(v_1\) has the same properties as \(u_{0,2}^{(N_0)}\). Iterating the procedure, to reach time T, we need a number of steps:

$$\displaystyle \begin{aligned}\frac{T}{\delta}\simeq T\cdot N_0^{2(1-s)+\epsilon}.\end{aligned}$$

Thus, we need to ensure that

$$\displaystyle \begin{aligned}T\cdot N_0^{2(1-s)+\epsilon}\cdot N_0^{2-3s+\epsilon} < H(u_{0,1}^{(N_0)}) \approx N_0^{2(1-s)}.\end{aligned}$$

This can be achieved for \(s>\frac {2}{3}\). A more elaborate argument gives \(s>\frac {3}{5}\).
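
The exponent count behind the threshold \(s>\frac{2}{3}\) can be made explicit. As a heuristic only (the full argument must also treat the potential term), the energy increment at each step is governed by the cross term between the low-frequency solution, whose gradient has \(L^2\) norm at most of order \(N_0^{1-s}\) by conservation of H, and w:

$$\displaystyle \begin{aligned} \|\nabla u_1^{(N_0)}(t_1)\|_{L^2}\,\|\nabla w(t_1)\|_{L^2} &\lesssim N_0^{1-s}\cdot N_0^{1-2s+\epsilon} = N_0^{2-3s+\epsilon},\\ T\cdot N_0^{2(1-s)+\epsilon}\cdot N_0^{2-3s+\epsilon} < N_0^{2(1-s)} &\iff T < N_0^{3s-2-2\epsilon}. \end{aligned}$$

Since T is fixed and \(N_0\) may be taken arbitrarily large, the last inequality can be arranged exactly when \(3s-2-2\epsilon>0\). As \(\epsilon>0\) is arbitrary, this is the condition \(s>\frac{2}{3}\), with \(N_0\) a large multiple of \(T^{1/(3s-2-2\epsilon)}\).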

This method, as mentioned before, is very general and has led to many global well-posedness results, due to many researchers, for instance, in energy subcritical, defocusing problems. The method also stimulated the “I-team” (Colliander, Keel, Staffilani, Takaoka and Tao) to develop the “I-method” to treat similar types of situations. The “I-method” has been extraordinarily successful (see, for instance, [25,26,27,28], etc.).

Besides his interest in global well-posedness for defocusing, energy subcritical (NLS), Bourgain was very interested in corresponding global in time results for energy critical and supercritical (NLS). In the next section, we will discuss Bourgain’s work in the energy critical case. Understanding the global in time, energy supercritical case was a problem that Bourgain considered very natural and intriguing. In [16], Bourgain conjectured the global existence of classical solutions, with smooth, well-localized data, for defocusing energy supercritical (NLS). For years, this problem was considered out of reach. Recently, this conjecture was disproved for d ≥ 5 in the spectacular series of papers by Merle, Raphaël, Rodnianski, and Szeftel [66, 67], who also were able to obtain corresponding results for the compressible Euler and Navier-Stokes flows [68].

4.3 Bourgain’s Work on the Defocusing Energy Critical (NLS)

In the remarkable paper [14], Bourgain considered the defocusing, energy critical (NLS)

$$\displaystyle \begin{aligned}\begin{cases} i\partial_tu + \Delta u - |u|{}^{\frac{4}{d-2}}u = 0, d\ge 3\\ u|{}_{t=0} = u_0 \in H^1(\mathbb{R}^d) \end{cases}\end{aligned} $$
(7)

Theorem 4.2

(7) is globally well-posed for \(u_0\) radial, when d = 3, 4. Moreover, higher regularity of \(u_0\) is preserved for all times.

Remark 4

The result was proved independently by Grillakis [39], when d = 3. It was extended to all d ≥ 3, still under \(u_0\) radial, by Tao in 2005.

Remark 5

In addition to global well-posedness, Bourgain established scattering, that is to say, there exist \(u_0^\pm \in H^1(\mathbb {R}^d)\), radial, such that

$$\displaystyle \begin{aligned}\lim_{t\to\pm\infty}\left\|u(t) - S(t)(u_0^\pm)\right\|{}_{H^1(\mathbb{R}^d)} = 0.\end{aligned}$$

Remark 6

The corresponding result for the defocusing energy critical nonlinear wave equation

$$\displaystyle \begin{aligned}\begin{cases} \partial_t^2u - \Delta u + |u|{}^{\frac{4}{d-2}}u = 0\\ u|{}_{t=0} = u_0 \in H^1(\mathbb{R}^d)\\ \partial_tu|{}_{t=0} = u_1 \in L^2(\mathbb{R}^d) \end{cases}\end{aligned}$$

was established by Struwe [84] in the radial case and by Grillakis [37, 38] in the non-radial case (see also [79, 80]), with scattering being obtained in [2]. The key idea was to use the Morawetz identity [71], which for the wave equation has energy critical scaling, combined with finite speed of propagation (another important feature of the wave equation) to prevent “energy concentration.”

For the proof of Theorem 4.2, when d = 3, the starting point is to show that if

$$\displaystyle \begin{aligned} \int_0^{T_{\star}}\int_{\mathbb{R}^3} |u(x,t)|{}^{10}\,dx\,dt < \infty, \end{aligned} $$
(8)

where \(T_{\star}\) is the “final time of existence” of u, then \(T_{\star} = \infty\) and u scatters. This fact is now referred to as “the standard finite time blow-up” criterion. In order to achieve (8), Bourgain’s idea was to proceed by induction on the size of the Hamiltonian of \(u_0\) and show that

$$\displaystyle \begin{aligned}\|u\|{}_{L_x^{10}L_{[0,T_{\star}]}^{10}} \le M(H(u_0)),\end{aligned}$$

for some function M. It is easy to show, from the proof of the local well-posedness result (since \(\|u_0\|_{H^1} \lesssim H(u_0)\)), that this is the case if \(H(u_0)\) is small. Arguing by contradiction, one assumes that

$$\displaystyle \begin{aligned}\|u\|{}_{L_x^{10}L_{[0,T_{\star}]}^{10}} > M,\end{aligned}$$

for some M large and that \(\|v\|_{L_x^{10}L_t^{10}} < M_1,\) whenever v solves

$$\displaystyle \begin{aligned}\begin{cases} i\partial_tv + \Delta v - |v|{}^4v = 0\\ v|{}_{t=0} = v_0, \end{cases}\end{aligned}$$

provided \(H(v_0) < H(u_0) - \eta^4\), for some small η (depending only on \(H(u_0)\)), and then one reaches a contradiction for large M.

In order to reach this contradiction, Bourgain introduced a modification of the Morawetz estimate for the Schrödinger equation, due to Lin-Strauss [57]. Comparing Theorem 4.2 with the earlier work on the wave equation, by Grillakis, mentioned in Remark 6, key difficulties are the infinite speed of propagation and the unfavorable scaling of the estimate in [57]. This is addressed in

Proposition 1

Let u be a solution of (7) in the energy space on a time interval I on which (7) is well-posed in the energy space. Then,

$$\displaystyle \begin{aligned}\int_I \int_{|x|<|I|{}^{1/2}} \frac{|u(x,t)|{}^6}{|x|}\,dx\,dt \le CH(u_0)|I|{}^{1/2}.\end{aligned}$$
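
One way to see that this estimate has the energy critical scaling when d = 3 is the following check. It is a sketch using the standard rescaling \(u_\lambda(x,t) = \lambda^{1/2}u(\lambda x,\lambda^2 t)\), \(\lambda>0\), which preserves both equation (7) and the Hamiltonian in three dimensions. Writing \(I_\lambda = \lambda^{-2}I\) (our notation) and changing variables \(y=\lambda x\), \(\tau=\lambda^2 t\), one finds

$$\displaystyle \begin{aligned} \int_{I_\lambda}\int_{|x|<|I_\lambda|^{1/2}} \frac{|u_\lambda(x,t)|^6}{|x|}\,dx\,dt &= \lambda^{3}\cdot\lambda\cdot\lambda^{-3}\cdot\lambda^{-2}\int_I\int_{|y|<|I|^{1/2}}\frac{|u(y,\tau)|^6}{|y|}\,dy\,d\tau \\ &= \lambda^{-1}\int_I\int_{|y|<|I|^{1/2}}\frac{|u(y,\tau)|^6}{|y|}\,dy\,d\tau,\\ C\,H(u_\lambda(0))\,|I_\lambda|^{1/2} &= \lambda^{-1}\,C\,H(u_0)\,|I|^{1/2}. \end{aligned}$$

Both sides pick up the same factor \(\lambda^{-1}\), so the inequality is invariant under the scaling of the equation.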

It is in the application of this Proposition (which allows one to handle energy concentration) that the radial hypothesis is used. The details of the proof are intricate. The “induction on energy” used in the proof is an audacious idea, which has been extremely influential. In [29], the “I-team” (Colliander-Keel-Staffilani-Takaoka-Tao), in a major breakthrough, extended the d = 3 result in Theorem 4.2 to the non-radial case. An important ingredient of their proof is the introduction of an “interaction Morawetz” inequality, a version of Proposition 1, in which the origin is not a privileged point. This was extended to d = 4 by Ryckman-Visan [77] and to d ≥ 5 by Visan [89]. Later on, a new method, dubbed the “concentration-compactness/rigidity theorem method,” was introduced in [46,47,48], which is very flexible and which could also treat focusing problems, under sharp size conditions. This method also led to many more developments for problems of this type, in the works of many researchers. For a proof of Theorem 4.2, and its non-radial version in [29], using this new method, see the work of Killip-Visan [52].

5 Conclusion

The work of Jean Bourgain transformed the field of nonlinear dispersive equations by settling old conjectures, introducing new methods and ideas, and posing important problems. The works briefly described in this note are just a small (hopefully representative) sample of Bourgain’s influential contributions to this field. They will continue to inspire researchers for generations to come.