1 Introduction

A classical sub-branch of harmonic analysis, started in the late 1960s, asks whether the Fourier transform \({\widehat{f}}\) of a certain non-integrable function f can be meaningfully restricted to certain curved lower-dimensional subsets of Euclidean space; see Stein’s book [19, §VIII.4]. A general setting is obtained by taking a \(\sigma \)-finite measure \(\sigma \) on the Borel subsets of \({\mathbb {R}}^d\) and a Borel set \(S\subseteq {\mathbb {R}}^d\) such that \(\sigma ({\mathbb {R}}^d\setminus S)=0\). Typically, S is a closed manifold in \({\mathbb {R}}^d\) and \(\sigma \) is an appropriately weighted surface measure on S. As soon as we have an a priori estimate

$$\begin{aligned} \big \Vert {\widehat{f}}\,\big |_S\big \Vert _{{\text {L}}^q(S,\sigma )} \lesssim _{d,\sigma ,p,q} \Vert f\Vert _{{\text {L}}^p({\mathbb {R}}^d)} \end{aligned}$$
(1.1)

for some \(p\in (1,\infty )\) and \(q\in [1,\infty ]\), we can define the Fourier restriction operator as the unique bounded linear operator

$$\begin{aligned} {\mathcal {R}}:{\text {L}}^p({\mathbb {R}}^d)\rightarrow {\text {L}}^q(S,\sigma ) \end{aligned}$$

such that \({\mathcal {R}}f={\widehat{f}}|_S\) for every function f in the Schwartz space \({\mathcal {S}}({\mathbb {R}}^d)\). Here and in what follows, we write \(A\lesssim _P B\) when the estimate \(A\leqslant C_P B\) holds for some finite (but unimportant) constant \(C_P\) depending on a set of parameters P.

Let us agree to use the following normalization of the Fourier transform:

$$\begin{aligned} ({\mathcal {F}}f)(\xi ) = {\widehat{f}}(\xi ):= \int _{{\mathbb {R}}^d} f(x) e^{-2\pi {i}x\cdot \xi } \,{\text {d}}x \end{aligned}$$

for an integrable function f on \({\mathbb {R}}^d\) and for every \(\xi \in {\mathbb {R}}^d\), so that the inverse Fourier transform is given by

$$\begin{aligned} ({\mathcal {F}}^{-1}g)(x) = \check{g}(x) := \int _{{\mathbb {R}}^d} g(\xi ) e^{2\pi {i}x\cdot \xi } \,{\text {d}}\xi \end{aligned}$$

for \(g\in {\text {L}}^1({\mathbb {R}}^d)\) and \(x\in {\mathbb {R}}^d\). We always have the trivial estimate

$$\begin{aligned} \big \Vert {\widehat{f}}\,\big |_S\big \Vert _{{\text {L}}^\infty (S,\sigma )} \leqslant \Vert f\Vert _{{\text {L}}^1({\mathbb {R}}^d)} \end{aligned}$$
(1.2)

for every \(f\in {\text {L}}^1({\mathbb {R}}^d)\), so restriction of the Fourier transform \(f\mapsto {\widehat{f}}|_S\) also gives a bounded linear operator

$$\begin{aligned} {\mathcal {R}}:{\text {L}}^1({\mathbb {R}}^d)\rightarrow {\text {L}}^\infty (S,\sigma ). \end{aligned}$$

Using the Riesz–Thorin theorem to interpolate between (1.1) and (1.2) then gives us a family of bounded linear operators

$$\begin{aligned} {\mathcal {R}}:{\text {L}}^s({\mathbb {R}}^d)\rightarrow {\text {L}}^{qs'/p'}(S,\sigma ) \end{aligned}$$

for every \(1\leqslant s\leqslant p\), where \(p'\) denotes the conjugate exponent of p, i.e., \(1/p+1/p'=1\). All these operators are mutually compatible on the intersections of their domains, so they are rightfully denoted by the same letter \({\mathcal {R}}\).
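To make the exponent \(qs'/p'\) transparent, here is the routine Riesz–Thorin bookkeeping behind it, with \(\theta \in [0,1]\) denoting the interpolation parameter between the endpoints \({\text {L}}^1\rightarrow {\text {L}}^\infty \) (from (1.2)) and \({\text {L}}^p\rightarrow {\text {L}}^q\) (from (1.1)):

```latex
% Riesz--Thorin between R : L^1 -> L^infty and R : L^p -> L^q.
\begin{align*}
\frac{1}{s} = \frac{1-\theta}{1} + \frac{\theta}{p}
  &\;\Longrightarrow\; \frac{1}{s'} = \theta\Big(1-\frac{1}{p}\Big) = \frac{\theta}{p'}
  \;\Longrightarrow\; \theta = \frac{p'}{s'}, \\
\frac{1}{q_\theta} = \frac{1-\theta}{\infty} + \frac{\theta}{q} = \frac{\theta}{q}
  &\;\Longrightarrow\; q_\theta = \frac{q}{\theta} = \frac{q s'}{p'}.
\end{align*}
```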

A novel route was taken recently by Müller, Ricci, and Wright [14], who initiated the program of justifying pointwise Fourier restriction,

$$\begin{aligned} \lim _{t\rightarrow 0+} {\widehat{f}}*\chi _{t} = {\mathcal {R}}f \quad \sigma \text {-a.e. on }S \end{aligned}$$

for \(f\in {\text {L}}^p({\mathbb {R}}^d)\), via maximal estimates

$$\begin{aligned} \Big \Vert \sup _{t\in (0,\infty )}\big |{\widehat{f}}*\chi _{t}\big |\Big \Vert _{{\text {L}}^q(S,\sigma )} \lesssim _{d,\sigma ,\chi ,p,q} \Vert f\Vert _{{\text {L}}^p({\mathbb {R}}^d)}. \end{aligned}$$
(1.3)

Here, \(\chi \in {\mathcal {S}}({\mathbb {R}}^d)\) is a Schwartz function with integral 1 and we write \(\chi _t(x):=t^{-d}\chi (t^{-1}x)\) for a given parameter \(t\in (0,\infty )\). Note that the operator on the left-hand side of (1.3) cannot be understood as a composition of the Fourier transform with some maximal function of the Hardy–Littlewood type, since the measure \(\sigma \) can be (and typically is) singular with respect to the Lebesgue measure.
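To illustrate the kind of limit being considered, here is a small numerical sketch in one dimension, with an illustrative Gaussian in place of a general Schwartz function \(\chi \) and a Gaussian stand-in g for \({\widehat{f}}\); the averages \(g*\chi _t\) approach g pointwise as \(t\rightarrow 0+\).

```python
import math

def g(x):
    # stand-in for the function being averaged (e.g. a Fourier transform);
    # an explicit Gaussian, chosen purely for illustration
    return math.exp(-math.pi * x * x)

def chi_t(x, t):
    # one-parameter dilate chi_t(x) = t^{-1} chi(x / t) of a Gaussian chi
    return math.exp(-math.pi * (x / t) ** 2) / t

def convolve_at(x0, t, h=1e-3, L=8.0):
    # Riemann-sum approximation of (g * chi_t)(x0) = int g(x0 - y) chi_t(y) dy
    n = int(2 * L / h)
    return h * sum(g(x0 - (-L + k * h)) * chi_t(-L + k * h, t) for k in range(n))

x0 = 0.3
errors = [abs(convolve_at(x0, t) - g(x0)) for t in (0.5, 0.25, 0.125)]
print(errors)  # the errors shrink as t -> 0+
```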

The authors of [14] achieved the aforementioned goal in two dimensions by adapting the proofs of the two-dimensional restriction theorems of Carleson and Sjölin [4] and Sjölin [18]. This methodology was later followed by Ramos [16, 17], Jesurum [9], and Fraccaroli [8] to obtain results in higher dimensions or under weaker smoothness assumptions. A second approach to maximal Fourier restriction was suggested by Vitturi [24] soon after the appearance of [14]. He deduced a non-trivial result for higher-dimensional compact hypersurfaces from the ordinary restriction estimates (1.1) by inserting the iterated Hardy–Littlewood maximal function in a clever, non-obvious way. The idea of using (1.1) as a black box was later also employed by Oliveira e Silva and one of the present authors [12], while the subsequent paper [11] built on this idea to show, in a general and abstract way, that the a priori estimate (1.1) implies the maximal estimate (1.3) as soon as \(p<q\). Each of these two approaches has its advantages and its limitations. The present paper builds further upon the second approach and was partially motivated by a question posed by Vitturi [23]. In fact, Theorem 1 below answers one of the open questions from [23, §4].

For a given function \(\chi :{\mathbb {R}}^d\rightarrow {\mathbb {C}}\) and arbitrary parameters \(r_1,\ldots ,r_d\in (0,\infty )\) we define the multi-parameter dilate of \(\chi \) as

$$\begin{aligned} \chi _{r_1,\ldots ,r_d}:{\mathbb {R}}^d\rightarrow {\mathbb {C}}, \quad \chi _{r_1,\ldots ,r_d}(x_1,\ldots ,x_d):= \frac{1}{r_1\cdots r_d} \chi \Big (\frac{x_1}{r_1},\ldots ,\frac{x_d}{r_d}\Big ). \end{aligned}$$

Also let

$$\begin{aligned} B_{r_1,\ldots ,r_d}(y_1,\ldots ,y_d):= \bigg \{ (x_1,\ldots ,x_d)\in {\mathbb {R}}^d: \frac{(x_1-y_1)^2}{r_1^2} + \cdots + \frac{(x_d-y_d)^2}{r_d^2} \leqslant 1 \bigg \} \end{aligned}$$

be the ellipsoid centered at \((y_1,\ldots ,y_d)\in {\mathbb {R}}^d\) with semi-axes of lengths \(r_1,\ldots ,r_d\) in the directions of the coordinate axes. Its volume will be written simply as \(|B_{r_1,\ldots ,r_d}|\). The particular case \(B_r(y):= B_{r,\ldots ,r}(y)\) for \(r\in (0,\infty )\) is simply the Euclidean ball.
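The volume \(|B_{r_1,\ldots ,r_d}|\) is just the product of the semi-axes times the volume of the unit ball; a quick numerical sketch, using the standard formula \(|B_1(0)|=\pi ^{d/2}/\Gamma (d/2+1)\):

```python
import math

def unit_ball_volume(d):
    # |B_1(0)| in R^d via the Gamma-function formula
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)

def ellipsoid_volume(radii):
    # |B_{r_1,...,r_d}| = r_1 ... r_d |B_1(0)|, by a linear change of variables
    vol = unit_ball_volume(len(radii))
    for r in radii:
        vol *= r
    return vol

print(ellipsoid_volume([2.0, 3.0]))       # ellipse area pi * 2 * 3
print(ellipsoid_volume([1.0, 1.0, 1.0]))  # unit ball in R^3: 4 pi / 3
```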

Theorem 1

Suppose that the measure space \((S,\sigma )\) and the exponents \(1<p<q<\infty \) are such that the a priori restriction estimate (1.1) holds for every Schwartz function f. Let \(\chi \) be a function satisfying

$$\begin{aligned} \big |\big (\partial _1\cdots \partial _d{\widehat{\chi }}\big )(x)\big | \lesssim _{d,\delta } (1+|x|)^{-d-\delta } \end{aligned}$$
(1.4)

for some \(\delta >0\) and every \(x\in {\mathbb {R}}^d\). Then the following hold.

  1. (a)

    For every \(f\in {\text {L}}^p({\mathbb {R}}^d)\) one also has the multi-parameter maximal estimate

    $$\begin{aligned} \Big \Vert \sup _{r_1,\ldots ,r_d\in (0,\infty )}\big |{\widehat{f}}*\chi _{r_1,\ldots ,r_d}\big |\Big \Vert _{{\text {L}}^q(S,\sigma )} \lesssim _{d,\sigma ,\chi ,p,q} \Vert f\Vert _{{\text {L}}^p({\mathbb {R}}^d)}. \end{aligned}$$
    (1.5)
  2. (b)

    If \(\chi \) additionally satisfies \(\int _{{\mathbb {R}}^d}\chi =1\), then for every \(f\in {\text {L}}^s({\mathbb {R}}^d)\), \(1\leqslant s\leqslant p\), one also has the multi-parameter convergence result

    $$\begin{aligned} \lim _{(0,\infty )^d\ni (r_1,\ldots ,r_d)\rightarrow (0,\ldots ,0)} {\widehat{f}}*\chi _{r_1,\ldots ,r_d} = {\mathcal {R}}f \quad \sigma \text {-a.e. on }S. \end{aligned}$$
    (1.6)
  3. (c)

    Moreover, if \(f\in {\text {L}}^{s}({\mathbb {R}}^d)\), \(1\leqslant s\leqslant 2p/(p+1)\), then we also have the “multi-parameter Lebesgue point property”

    $$\begin{aligned} \lim _{(0,\infty )^d\ni (r_1,\ldots ,r_d)\rightarrow (0,\ldots ,0)} \frac{1}{|B_{r_1,\ldots ,r_d}|} \int _{B_{r_1,\ldots ,r_d}(\xi )} \big | {\widehat{f}}(\eta ) - ({\mathcal {R}}f)(\xi ) \big | \,{\text {d}}\eta = 0 \qquad \end{aligned}$$
    (1.7)

    at \(\sigma \)-almost every point \(\xi \in S\). In particular,

    $$\begin{aligned} \lim _{(0,\infty )^d\ni (r_1,\ldots ,r_d)\rightarrow (0,\ldots ,0)} \frac{1}{|B_{r_1,\ldots ,r_d}|} \int _{B_{r_1,\ldots ,r_d}(\xi )} {\widehat{f}}(\eta ) \,{\text {d}}\eta = ({\mathcal {R}}f)(\xi ) \end{aligned}$$
    (1.8)

    for \(\sigma \)-a.e. \(\xi \in S\).

Since (1.5) is a stronger maximal inequality than (1.3), Theorem 1 can be viewed as a multi-parameter generalization of [11, Theorem 1] suggested by Vitturi [23, §4], even though a bi-parameter two-dimensional case appeared already in [14]. For instance, by (1.6) we are now able to justify the existence of limits under various anisotropic scalings, such as

$$\begin{aligned} \lim _{t\rightarrow 0+} {\widehat{f}}*\chi _{t,t^2,\ldots ,t^d}. \end{aligned}$$

However, the required assumptions on \(\chi \) are more restrictive here, when compared to [11]: condition (1.4) is different from

$$\begin{aligned} \big |\big (\nabla {\widehat{\chi }}\big )(x)\big | \lesssim _{d,\delta } (1+|x|)^{-1-\delta }, \end{aligned}$$

used in [11]. The latter condition is satisfied when \(\chi \) is the (normalized) indicator function of the standard unit ball in \(d\geqslant 2\) dimensions, while our standing assumption (1.4) is not satisfied in that case. Still, (1.4) certainly holds for all Schwartz functions \(\chi \).

For similar reasons we conclude the convergence of the Fourier averages over shrinking ellipsoids, (1.7) and (1.8), only in the smaller range \(1\leqslant s\leqslant 2p/(p+1)\), and not in the full range \(1\leqslant s\leqslant p\), as was the case with averages over balls [11]. This leads us to interesting open questions, like Problem 2 below. We will explain in Remark 2 after the proof of Theorem 1 that (1.7) and (1.8) could have been equally well formulated for axis-parallel rectangles as

$$\begin{aligned} \lim _{r_1\rightarrow 0+,\ldots ,r_d\rightarrow 0+} \frac{1}{2^d r_1\cdots r_d} \int _{\xi +[-r_1,r_1]\times \cdots \times [-r_d,r_d]} \big | {\widehat{f}}(\eta ) - ({\mathcal {R}}f)(\xi ) \big | \,{\text {d}}\eta = 0 \end{aligned}$$
(1.9)

and

$$\begin{aligned} \lim _{r_1\rightarrow 0+,\ldots ,r_d\rightarrow 0+} \frac{1}{2^d r_1\cdots r_d} \int _{\xi +[-r_1,r_1]\times \cdots \times [-r_d,r_d]} {\widehat{f}}(\eta ) \,{\text {d}}\eta = ({\mathcal {R}}f)(\xi ), \end{aligned}$$
(1.10)

respectively. This would have been a bit more standard. However, the same observation combined with a counterexample by Ramos [17, Proposition 4] reveals a limitation in obtaining the full range of exponents for (1.7) and (1.9) (see the comments in Remark 2 again), and thus also for (1.8) and (1.10), which are shown here as their consequences. On the other hand, it is still theoretically possible that (1.8) holds in the same range as (1.1). A supporting argument is that the proof of its one-parameter case in [11] actually depended on the geometry of Euclidean balls.

Problem 2

Prove or disprove that the assumptions of Theorem 1 imply (1.8) for every \(f\in {\text {L}}^p({\mathbb {R}}^d)\) and for \(\sigma \)-a.e. \(\xi \in S\).

Another question, related to property (1.7) and stated in Problem 3 below, remained open after [11]; particular cases of it have already been studied by Ramos [16, 17] and Fraccaroli [8]. In short, we do not know how to extend the range \(1\leqslant s\leqslant 2p/(p+1)\) even when we only consider balls instead of arbitrary ellipsoids.

Problem 3

Prove or disprove that, for every \(f\in {\text {L}}^p({\mathbb {R}}^d)\), the assumptions of Theorem 1 imply that \(\sigma \)-almost every point \(\xi \in S\) is a Lebesgue point of \({\widehat{f}}\), in the sense that

$$\begin{aligned} \lim _{t\rightarrow 0+} \frac{1}{|B_{t}|} \int _{B_{t}(\xi )} \big | {\widehat{f}}(\eta ) - ({\mathcal {R}}f)(\xi ) \big | \,{\text {d}}\eta = 0. \end{aligned}$$

The general maximal principle from [11], which yields information about the Lebesgue sets of Fourier transforms \({\widehat{f}}\) from restriction estimates (1.1), has been used by Bilz [2]. He showed that there exists a subset of \({\mathbb {R}}^d\) of full dimension that is “avoided” by every Borel measure that satisfies a nontrivial Fourier restriction estimate (1.1). It would be interesting to find similar applications of the stronger properties (1.7) or (1.9).

The main new ingredient in the proof of Theorem 1 is a multi-parameter variant of the Christ–Kiselev lemma [5]. Even though this generalization is somewhat straightforward, we will argue that it is substantial by using it to deduce the following result on the Fourier transform alone, with no restriction phenomena involved. In what follows, the indicator function of a set \(A\subseteq {\mathbb {R}}^d\) is denoted by \(\mathbb {1}_A\).

Theorem 4

  1. (a)

    For \(p\in [1,2)\) and \(f\in {\text {L}}^p({\mathbb {R}}^d)\) we have the maximal estimate

    $$\begin{aligned} \Big \Vert \sup _{R_1,\ldots ,R_d\in (0,\infty )}\big |{\mathcal {F}}\big (f\mathbb {1}_{[-R_1,R_1]\times \cdots \times [-R_d,R_d]}\big ) \big | \Big \Vert _{{\text {L}}^{p'}({\mathbb {R}}^d)} \lesssim _{d,p} \Vert f\Vert _{{\text {L}}^p({\mathbb {R}}^d)} \end{aligned}$$

    and d-parameter convergence

    $$\begin{aligned} \lim _{R_1\rightarrow \infty ,\ldots ,R_d\rightarrow \infty } \int _{[-R_1,R_1]\times \cdots \times [-R_d,R_d]} f(x) e^{-2\pi {i}x\cdot \xi } \,{\text {d}}x = {\widehat{f}}(\xi ) \end{aligned}$$
    (1.11)

    holds for a.e. \(\xi \in {\mathbb {R}}^d\).

  2. (b)

    If \(d\geqslant 2\), then there exist a function \(f\in {\text {L}}^2({\mathbb {R}}^d)\) and a set of positive measure \(Q\subseteq {\mathbb {R}}^d\) such that

    $$\begin{aligned} \limsup _{R_1\rightarrow \infty ,\ldots ,R_d\rightarrow \infty } \bigg |\int _{[-R_1,R_1]\times \cdots \times [-R_d,R_d]} f(x) e^{-2\pi {i}x\cdot \xi } \,{\text {d}}x\bigg | =\infty \quad \text {for every } \xi \in Q. \end{aligned}$$
    (1.12)

    In particular, even the weak \({\text {L}}^2\) estimate

    $$\begin{aligned} \Big \Vert \sup _{R_1,\ldots ,R_d\in (0,\infty )}\big |{\mathcal {F}}\big (f\mathbb {1}_{[-R_1,R_1]\times \cdots \times [-R_d,R_d]}\big ) \big | \Big \Vert _{{\text {L}}^{2,\infty }({\mathbb {R}}^d)} \lesssim _{d} \Vert f\Vert _{{\text {L}}^2({\mathbb {R}}^d)} \end{aligned}$$

    does not hold.

Part (a) can be thought of as a multi-parameter Menshov–Paley–Zygmund theorem, while part (b) gives a counterexample to the corresponding multi-parameter analogue of Carleson’s theorem [3]. The latter is not our original result, but a mere adaptation of the argument by Charles Fefferman [7] to the continuous setting. We include its proof for completeness of the exposition.

Finally, connections between the Fourier restriction problem and PDEs have been known since the work of Strichartz [20]. Let us comment on a certain reformulation of (1.5) in that direction. The following standard setting is taken from the textbook by Tao [21]; see also the lecture notes by Koch, Tataru, and Vişan [10]. Let \(\phi :{\mathbb {R}}^n\rightarrow {\mathbb {R}}\) be a \({\text {C}}^\infty \) function. The self-adjoint operator \(\phi (D)=\phi (\nabla /2\pi {i})\) is defined to be the Fourier multiplier associated with the symbol \(\phi \), i.e.,

$$\begin{aligned} (\widehat{\phi (D) f})(\xi ) = \phi (\xi ) {\widehat{f}}(\xi ). \end{aligned}$$

If \(\phi \) happens to be a polynomial

$$\begin{aligned} \phi (\xi ) = \sum _{|\alpha |\leqslant k} c_\alpha \xi ^\alpha \end{aligned}$$

in n variables \(\xi =(\xi _1,\ldots ,\xi _n)\) of degree k with real coefficients \(c_\alpha \), then \(\phi (D)\) is just the self-adjoint differential operator acting on Schwartz functions,

$$\begin{aligned} \phi (D) = \sum _{|\alpha |\leqslant k} (2\pi {i})^{-|\alpha |} c_\alpha \partial ^\alpha . \end{aligned}$$

The solution of a general scalar constant-coefficient linear dispersive initial value problem

$$\begin{aligned} {\left\{ \begin{array}{ll} \partial _t u(x,t) = {i} \phi (D) u(x,t) &{} \text {in } {\mathbb {R}}^n\times {\mathbb {R}}, \\ \ \ u(x,0) = f(x) &{} \text {in } {\mathbb {R}}^n \end{array}\right. } \end{aligned}$$
(1.13)

is given explicitly as

$$\begin{aligned} u(x,t) = (e^{{i} t \phi (D)} f)(x):= \int _{{\mathbb {R}}^n} e^{{i} t \phi (\xi ) + 2\pi {i} x\cdot \xi } {\widehat{f}}(\xi ) \,{\text {d}}\xi \end{aligned}$$

for \(x\in {\mathbb {R}}^n\), \(t\in {\mathbb {R}}\), and a Schwartz function \(f\in {\mathcal {S}}({\mathbb {R}}^n)\); see [21, Section 2.1].

Corollary 5

Suppose that a Strichartz-type estimate for (1.13) of the form

$$\begin{aligned} \big \Vert (e^{{i} t \phi (D)} f)(x) \big \Vert _{{\text {L}}^s_{(x,t)}({\mathbb {R}}^n\times {\mathbb {R}})} \lesssim _{n,\phi } \Vert f\Vert _{{\text {L}}^2({\mathbb {R}}^n)} \end{aligned}$$
(1.14)

holds for some exponent \(s\in (2,\infty )\) and every Schwartz function \(f\in {\mathcal {S}}({\mathbb {R}}^n)\). Then for every \(\psi \in {\mathcal {S}}({\mathbb {R}}^{n+1})\) and any choice of measurable functions \(r_1,\ldots ,r_{n+1}:{\mathbb {R}}^n\rightarrow {\mathbb {R}}\) the pseudo-differential operator

$$\begin{aligned} (T_{\psi ,r_1,\ldots ,r_{n+1}}f)(x,t):= \int _{{\mathbb {R}}^n} \psi \big (r_1(\xi )x_1,\ldots ,r_n(\xi )x_n,r_{n+1}(\xi )t\big ) \,e^{{i} t \phi (\xi ) + 2\pi {i} x\cdot \xi } {\widehat{f}}(\xi ) \,{\text {d}}\xi \end{aligned}$$

satisfies the analogous bound

$$\begin{aligned} \Vert T_{\psi ,r_1,\ldots ,r_{n+1}} f \Vert _{{\text {L}}^s({\mathbb {R}}^{n+1})} \lesssim _{n,\phi ,\psi ,s} \Vert f\Vert _{{\text {L}}^2({\mathbb {R}}^n)}, \end{aligned}$$
(1.15)

with a constant that is independent of \(r_1,\ldots ,r_{n+1}\).

Note that (1.14) is a particular case of (1.15), as the former inequality can easily be recovered by taking \(r_1,\ldots ,r_{n+1}\) to be identically 0. Specifically, for the Schrödinger equation, i.e., when \(\phi (D)=\Delta \), the Strichartz estimate (1.14) holds with \(s=2+4/n\). A larger range of Strichartz estimates is available when one introduces mixed norms [1]; see [21, Theorem 2.3] or the review paper [6]. However, our proof of Corollary 5 is not well suited for this generalization.
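As a sanity check on the exponent \(s=2+4/n\), recall the standard admissibility relation \(2/q+n/r=n/2\) for Schrödinger Strichartz pairs (see [21, Theorem 2.3]); specializing to equal exponents \(q=r=s\), as in (1.14), gives:

```latex
\frac{2}{s} + \frac{n}{s} = \frac{n}{2}
\;\Longrightarrow\; \frac{n+2}{s} = \frac{n}{2}
\;\Longrightarrow\; s = \frac{2(n+2)}{n} = 2 + \frac{4}{n}.
```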

While (1.15) might not have substantial applications in the theory of PDEs, we merely wanted to present a restatement of (1.5) in that language. Note that in the definition of the above pseudo-differential operator it is only meaningful (from the viewpoint of physical dimensions) to scale the spatial variable x and the time variable t independently; in other words, just writing \(\psi (r(\xi )(x,t))\) would make no sense. This also partly motivates the study of multi-parameter maximal Fourier restriction estimates.

2 Multi-Parameter Christ–Kiselev Lemma

This section is devoted to a bound on rather general multi-parameter maximal operators, which generalizes a classical result of Christ and Kiselev [5].

Let \(({\mathbb {X}},{\mathcal {X}},\mu )\) and \(({\mathbb {Y}},{\mathcal {Y}},\nu )\) be measure spaces. Let d be a positive integer, which we interpret as the number of “parameters.” For every \(1\leqslant j\leqslant d\) we are also given a countable totally ordered set \(I_j\) and an increasing system \((E_j(i): i\in I_j)\) of sets from \({\mathcal {Y}}\), i.e., an increasing function \(E_j:I_j\rightarrow {\mathcal {Y}}\) with respect to the order on \(I_j\) and the set inclusion on \({\mathcal {Y}}\).

Lemma 6

(Multi-parameter Christ–Kiselev Lemma) Take exponents \(1\leqslant p<q\leqslant \infty \) and a bounded linear operator \(T:{\text {L}}^p({\mathbb {Y}},{\mathcal {Y}},\nu )\rightarrow {\text {L}}^q({\mathbb {X}},{\mathcal {X}},\mu )\). The maximal operator

$$\begin{aligned} (T_{\star }f)(x):= \sup _{(i_1,\ldots ,i_d)\in I_1\times \cdots \times I_d} \big |T\big (f\mathbb {1}_{E_1(i_1)\cap \cdots \cap E_d(i_d)}\big )(x)\big | \end{aligned}$$

is also bounded from \({\text {L}}^p({\mathbb {Y}},{\mathcal {Y}},\nu )\) to \({\text {L}}^q({\mathbb {X}},{\mathcal {X}},\mu )\) with the operator norm satisfying

$$\begin{aligned} \Vert T_{\star }\Vert _{{\text {L}}^p({\mathbb {Y}})\rightarrow {\text {L}}^q({\mathbb {X}})} \leqslant \big (1-2^{1/q-1/p}\big )^{-d} \Vert T\Vert _{{\text {L}}^p({\mathbb {Y}})\rightarrow {\text {L}}^q({\mathbb {X}})}. \end{aligned}$$
(2.1)

The particular case \(d=1\) is precisely [5, Theorem 1.1]. The proof given below is a d-parameter modification of the approach from [5], incorporating a simplification due to Tao [22, Note #2], who used an induction on the cardinality of \(I_1\) to immediately handle general measure spaces with atoms.

Proof

By the Monotone Convergence Theorem it is sufficient to prove the claim when the ordered sets \(I_1,\ldots ,I_d\) are finite. Note that it is crucial that the desired bound does not depend on their sizes. The exponents p and q, the two measure spaces, and the operator T are fixed throughout the proof. We will use a nested mathematical induction, first on d and then on the cardinality of \(I_d\), to prove (2.1) for all finite increasing systems of sets \((E_j(i):i\in I_j)\), \(1\leqslant j\leqslant d\). The induction basis \(d=1=|I_1|\) is trivial, since then \(T_{\star }\) satisfies the same bound as T.

We turn to the induction step. By relabeling the indices we may assume that \(I_j=\{1,2,\ldots ,n_j\}\) for each \(1\leqslant j\leqslant d\), where \(n_1,\ldots ,n_d\) are positive integers. Denote

$$\begin{aligned} F(i):= E_1(n_1)\cap \cdots \cap E_{d-1}(n_{d-1})\cap E_d(i) \quad \text {for } 1\leqslant i\leqslant n_d. \end{aligned}$$

Take a function \(f\in {\text {L}}^p({\mathbb {Y}},{\mathcal {Y}},\nu )\). By the assumption that the system \((E_d(i):i\in I_d)\) is increasing, we have

$$\begin{aligned} 0 \leqslant \Vert f\Vert _{{\text {L}}^p(F(1))} \leqslant \Vert f\Vert _{{\text {L}}^p(F(2))} \leqslant \cdots \leqslant \Vert f\Vert _{{\text {L}}^p(F(n_d))}. \end{aligned}$$

Let \(1\leqslant l\leqslant n_d\) be the smallest integer such that

$$\begin{aligned} \Vert f\Vert _{{\text {L}}^p(F(l))}^p \geqslant \frac{1}{2} \Vert f\Vert _{{\text {L}}^p(F(n_d))}^p. \end{aligned}$$

If \(l\geqslant 2\), then

$$\begin{aligned} \Vert f\Vert _{{\text {L}}^p(F(l-1))}^p < \frac{1}{2} \Vert f\Vert _{{\text {L}}^p(F(n_d))}^p \leqslant \frac{1}{2} \Vert f\Vert _{{\text {L}}^p({\mathbb {Y}})}^p, \end{aligned}$$

so applying the induction hypothesis with the last system of sets replaced with the subsystem

$$\begin{aligned} (E_{d}(i_d): i_d\in \{1,\ldots ,l-1\}), \end{aligned}$$

we get

$$\begin{aligned}&\Big \Vert \max _{\begin{array}{c} i_1,\ldots ,i_{d}\\ 1\leqslant i_d\leqslant l-1 \end{array}} \big |T\big (f\mathbb {1}_{E_1(i_1)\cap \cdots \cap E_d(i_d)}\big )\big | \Big \Vert _{{\text {L}}^q({\mathbb {X}})} \nonumber \\&\quad = \Big \Vert \max _{\begin{array}{c} i_1,\ldots ,i_{d}\\ 1\leqslant i_d\leqslant l-1 \end{array}} \big |T\big (f\mathbb {1}_{F(l-1)}\mathbb {1}_{E_1(i_1)\cap \cdots \cap E_d(i_d)}\big )\big | \Big \Vert _{{\text {L}}^q({\mathbb {X}})} \nonumber \\&\quad \leqslant \big (1-2^{1/q-1/p}\big )^{-d} \Vert T\Vert _{{\text {L}}^p({\mathbb {Y}})\rightarrow {\text {L}}^q({\mathbb {X}})} \Vert f\mathbb {1}_{F(l-1)}\Vert _{{\text {L}}^p({\mathbb {Y}})} \nonumber \\&\quad \leqslant 2^{-1/p} \big (1-2^{1/q-1/p}\big )^{-d} \Vert T\Vert _{{\text {L}}^p({\mathbb {Y}})\rightarrow {\text {L}}^q({\mathbb {X}})} \Vert f\Vert _{{\text {L}}^p({\mathbb {Y}})}. \end{aligned}$$
(2.2)

Also,

$$\begin{aligned} \Vert f\Vert _{{\text {L}}^p(F(n_d)\setminus F(l))}^p&= \Vert f\Vert _{{\text {L}}^p(F(n_d))}^p - \Vert f\Vert _{{\text {L}}^p(F(l))}^p \leqslant \frac{1}{2} \Vert f\Vert _{{\text {L}}^p(F(n_d))}^p \leqslant \frac{1}{2} \Vert f\Vert _{{\text {L}}^p({\mathbb {Y}})}^p, \end{aligned}$$

so, if \(l\leqslant n_d-1\), then applying the induction hypothesis with the last system of sets replaced with the subsystem

$$\begin{aligned} (E_{d}(i_d): i_d\in \{l+1,\ldots ,n_d\}), \end{aligned}$$

we obtain

$$\begin{aligned}&\Big \Vert \max _{\begin{array}{c} i_1,\ldots ,i_{d}\\ l+1\leqslant i_d\leqslant n_d \end{array}} \big |T\big (f\mathbb {1}_{E_1(i_1)\cap \cdots \cap E_{d-1}(i_{d-1})\cap (E_d(i_d)\setminus E_d(l))}\big )\big | \Big \Vert _{{\text {L}}^q({\mathbb {X}})} \nonumber \\&\quad = \Big \Vert \max _{\begin{array}{c} i_1,\ldots ,i_{d}\\ l+1\leqslant i_d\leqslant n_d \end{array}} \big |T\big (f\mathbb {1}_{F(n_d)\setminus F(l)}\mathbb {1}_{E_1(i_1)\cap \cdots \cap E_{d-1}(i_{d-1})\cap (E_d(i_d)\setminus E_d(l))}\big )\big | \Big \Vert _{{\text {L}}^q({\mathbb {X}})} \nonumber \\&\quad \leqslant \big (1-2^{1/q-1/p}\big )^{-d} \Vert T\Vert _{{\text {L}}^p({\mathbb {Y}})\rightarrow {\text {L}}^q({\mathbb {X}})} \Vert f\mathbb {1}_{F(n_d)\setminus F(l)}\Vert _{{\text {L}}^p({\mathbb {Y}})} \nonumber \\&\quad \leqslant 2^{-1/p} \big (1-2^{1/q-1/p}\big )^{-d} \Vert T\Vert _{{\text {L}}^p({\mathbb {Y}})\rightarrow {\text {L}}^q({\mathbb {X}})} \Vert f\Vert _{{\text {L}}^p({\mathbb {Y}})}. \end{aligned}$$
(2.3)

Finally, if \(d\geqslant 2\), then we can also apply the induction hypothesis with the same first \(d-1\) systems of sets, to conclude

$$\begin{aligned}&\Big \Vert \max _{i_1,\ldots ,i_{d-1}} \big |T\big (f\mathbb {1}_{E_1(i_1)\cap \cdots \cap E_{d-1}(i_{d-1})\cap E_d(l)}\big )\big | \Big \Vert _{{\text {L}}^q({\mathbb {X}})} \nonumber \\&\quad =\Big \Vert \max _{i_1,\ldots ,i_{d-1}} \big |T\big (f\mathbb {1}_{E_d(l)}\mathbb {1}_{E_1(i_1)\cap \cdots \cap E_{d-1}(i_{d-1})}\big )\big | \Big \Vert _{{\text {L}}^q({\mathbb {X}})} \nonumber \\&\quad \leqslant \big (1-2^{1/q-1/p}\big )^{-d+1} \Vert T\Vert _{{\text {L}}^p({\mathbb {Y}})\rightarrow {\text {L}}^q({\mathbb {X}})} \Vert f\mathbb {1}_{E_d(l)}\Vert _{{\text {L}}^p({\mathbb {Y}})} \nonumber \\&\quad \leqslant \big (1-2^{1/q-1/p}\big )^{-d+1} \Vert T\Vert _{{\text {L}}^p({\mathbb {Y}})\rightarrow {\text {L}}^q({\mathbb {X}})} \Vert f\Vert _{{\text {L}}^p({\mathbb {Y}})}. \end{aligned}$$
(2.4)

The last bound also holds in the case \(d=1\), with the maximum disappearing from the left-hand side, and it is a consequence of the mere boundedness of T.

Now denote

$$\begin{aligned} S := \big \{x\in {\mathbb {X}} :\,&(T_{\star }f)(x)=\big |\big (T\big (f\mathbb {1}_{E_1(i_1)\cap \cdots \cap E_d(i_d)}\big )\big )(x)\big | \\ {}&\text {for some } (i_1,\ldots ,i_d)\in I_1\times \cdots \times I_d \text { such that } i_d\leqslant l-1 \big \}, \end{aligned}$$

so that, by linearity of T,

$$\begin{aligned} T_{\star }f&\leqslant \, \mathbb {1}_S \max _{\begin{array}{c} i_1,\ldots ,i_d\\ 1\leqslant i_d\leqslant l-1 \end{array}} \big |T\big (f\mathbb {1}_{E_1(i_1)\cap \cdots \cap E_{d-1}(i_{d-1})\cap E_d(i_d)}\big )\big | \\&\quad + \mathbb {1}_{\mathbb {X}\setminus S} \max _{\begin{array}{c} i_1,\ldots ,i_d\\ l+1\leqslant i_d\leqslant n_d \end{array}} \big |T\big (f\mathbb {1}_{E_1(i_1)\cap \cdots \cap E_{d-1}(i_{d-1})\cap (E_d(i_d)\setminus E_d(l))}\big )\big | \\&\quad + \mathbb {1}_{\mathbb {X}\setminus S} \max _{i_1,\ldots ,i_{d-1}} \big |T\big (f\mathbb {1}_{E_1(i_1)\cap \cdots \cap E_{d-1}(i_{d-1})\cap E_d(l)}\big )\big |. \end{aligned}$$

Here, the maximum over an empty set is understood to be 0. When \(q<\infty \) we conclude

$$\begin{aligned} \Vert T_{\star }f\Vert _{{\text {L}}^q({\mathbb {X}})}&\leqslant \bigg (\Big \Vert \max _{\begin{array}{c} i_1,\ldots ,i_d\\ 1\leqslant i_d\leqslant l-1 \end{array}} \big |T\big (f\mathbb {1}_{E_1(i_1)\cap \cdots \cap E_d(i_d)}\big )\big |\Big \Vert _{{\text {L}}^q(S)}^q \\&\quad + \Big \Vert \max _{\begin{array}{c} i_1,\ldots ,i_d\\ l+1\leqslant i_d\leqslant n_d \end{array}} \big |T\big (f\mathbb {1}_{E_1(i_1)\cap \cdots \cap E_{d-1}(i_{d-1})\cap (E_d(i_d)\setminus E_d(l))}\big )\big |\Big \Vert _{{\text {L}}^q({\mathbb {X}}\setminus S)}^q\bigg )^{1/q} \\&\quad + \Big \Vert \max _{i_1,\ldots ,i_{d-1}} \big |T\big (f\mathbb {1}_{E_1(i_1)\cap \cdots \cap E_{d-1}(i_{d-1})\cap E_d(l)}\big )\big |\Big \Vert _{{\text {L}}^q({\mathbb {X}}\setminus S)}, \end{aligned}$$

while in the endpoint case \(q=\infty \) we instead have

$$\begin{aligned} \Vert T_{\star }f\Vert _{{\text {L}}^\infty ({\mathbb {X}})}&\leqslant \max \bigg \{\Big \Vert \max _{\begin{array}{c} i_1,\ldots ,i_d\\ 1\leqslant i_d\leqslant l-1 \end{array}} \big |T\big (f\mathbb {1}_{E_1(i_1)\cap \cdots \cap E_d(i_d)}\big )\big |\Big \Vert _{{\text {L}}^\infty (S)}, \\&\quad \Big \Vert \max _{\begin{array}{c} i_1,\ldots ,i_d\\ l+1\leqslant i_d\leqslant n_d \end{array}} \big |T\big (f\mathbb {1}_{E_1(i_1)\cap \cdots \cap E_{d-1}(i_{d-1})\cap (E_d(i_d)\setminus E_d(l))}\big )\big |\Big \Vert _{{\text {L}}^\infty ({\mathbb {X}}\setminus S)}\bigg \} \\&\quad + \Big \Vert \max _{i_1,\ldots ,i_{d-1}} \big |T\big (f\mathbb {1}_{E_1(i_1)\cap \cdots \cap E_{d-1}(i_{d-1})\cap E_d(l)}\big )\big |\Big \Vert _{{\text {L}}^\infty ({\mathbb {X}}\setminus S)}. \end{aligned}$$

Applying (2.2), (2.3), and (2.4) we complete the induction step. \(\square \)

Remark 1

An alternative proof of Lemma 6 can be obtained as follows. We can generalize the claim further to general sublinear operators T, i.e., operators satisfying

$$\begin{aligned} |T(\alpha f)| = |\alpha | |Tf|, \quad |T(f+g)| \leqslant |Tf| + |Tg| \end{aligned}$$

for all \(\alpha \in {\mathbb {C}}\) and all \(f,g\in {\text {L}}^p({\mathbb {Y}},{\mathcal {Y}},\nu )\). The advantage of doing this is that various maximal operators are always sublinear. Then we can write the operator \(T_\star \) as a composition of d maximal truncations, each one with respect to a single increasing system \((E_j(i):i\in I_j)\), namely

$$\begin{aligned} T_{\star }f = \sup _{i_1 \in I_1} \sup _{i_2 \in I_2} \cdots \sup _{i_d \in I_d} \Big |T\Big (\cdots \big ((f\mathbb {1}_{E_1(i_1)})\mathbb {1}_{E_2(i_2)}\big )\cdots \mathbb {1}_{E_d(i_d)}\Big )\Big |, \end{aligned}$$

so the claim is reduced merely to the one-parameter case. Finally, one can notice that the known proofs of the particular case \(d=1\), both the one by Christ and Kiselev [5, Theorem 1.1] and the one by Tao [22, Note #2], clearly remain valid for merely sublinear operators T. We leave the details to the reader.
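The factorization behind this reduction rests on the identity \(\mathbb {1}_{E_1(i_1)\cap \cdots \cap E_d(i_d)}=\mathbb {1}_{E_1(i_1)}\cdots \mathbb {1}_{E_d(i_d)}\) together with iterating the suprema one index at a time. A toy discrete sketch (with an illustrative linear functional T given by summation against a fixed kernel; all concrete choices here are hypothetical):

```python
import itertools
import random

random.seed(0)
n = 6
Y = list(itertools.product(range(n), range(n)))   # toy product space Y = Y1 x Y2
kernel = {y: random.uniform(-1.0, 1.0) for y in Y}
f = {y: random.uniform(-1.0, 1.0) for y in Y}

def T(gfun):
    # toy linear functional: summation of gfun against the fixed kernel
    return sum(gfun.get(y, 0.0) * kernel[y] for y in Y)

def cut(gfun, keep):
    # multiply gfun by the indicator of the set {y : keep(y)}
    return {y: v for y, v in gfun.items() if keep(y)}

# joint truncation over the product sets A^{i1} x B^{i2}
joint = max(
    abs(T(cut(f, lambda y: y[0] <= i1 and y[1] <= i2)))
    for i1 in range(n) for i2 in range(n)
)
# the same supremum computed as a composition of two one-parameter truncations,
# using 1_{A x B} = 1_{A x Y2} * 1_{Y1 x B}
iterated = max(
    max(abs(T(cut(cut(f, lambda y: y[0] <= i1), lambda y: y[1] <= i2)))
        for i2 in range(n))
    for i1 in range(n)
)
assert joint == iterated
```

The equality here is exact (not merely approximate), because both computations sum the very same terms in the same order.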

Now assume that the second measurable space splits as a product

$$\begin{aligned} ({\mathbb {Y}},{\mathcal {Y}}) = ({\mathbb {Y}}_1\times \cdots \times {\mathbb {Y}}_d,\, {\mathcal {Y}}_1\otimes \cdots \otimes {\mathcal {Y}}_d) \end{aligned}$$

of \(d\geqslant 1\) measurable spaces \(({\mathbb {Y}}_j,{\mathcal {Y}}_j)\). The product \(\sigma \)-algebra \({\mathcal {Y}}_1\otimes \cdots \otimes {\mathcal {Y}}_d\) is defined to be the smallest \(\sigma \)-algebra on the set \({\mathbb {Y}}_1\times \cdots \times {\mathbb {Y}}_d\) that contains all Cartesian products \(A_1\times \cdots \times A_d\) with \(A_j\in {\mathcal {Y}}_j\) for every index \(1\leqslant j\leqslant d\). Also suppose that for each \(1\leqslant j\leqslant d\) we have a countable totally ordered set \(I_j\) and an increasing system \((A_{j}^{i}:i\in I_j)\) of sets from \({\mathcal {Y}}_j\).

Corollary 7

Take exponents \(1\leqslant p<q\leqslant \infty \) and a bounded linear operator \(T:{\text {L}}^p({\mathbb {Y}},{\mathcal {Y}},\nu )\rightarrow {\text {L}}^q({\mathbb {X}},{\mathcal {X}},\mu )\). The maximal operator

$$\begin{aligned} (T_{\star }f)(x):= \sup _{(i_1,\ldots ,i_d)\in I_1\times \cdots \times I_d} \Big |T\Big (f\mathbb {1}_{A^{i_1}_{1}\times \cdots \times A^{i_d}_{d}}\Big )(x)\Big | \end{aligned}$$
(2.5)

is also bounded from \({\text {L}}^p({\mathbb {Y}},{\mathcal {Y}},\nu )\) to \({\text {L}}^q({\mathbb {X}},{\mathcal {X}},\mu )\) with the operator norm satisfying

$$\begin{aligned} \Vert T_{\star }\Vert _{{\text {L}}^p({\mathbb {Y}})\rightarrow {\text {L}}^q({\mathbb {X}})} \leqslant \big (1-2^{1/q-1/p}\big )^{-d} \Vert T\Vert _{{\text {L}}^p({\mathbb {Y}})\rightarrow {\text {L}}^q({\mathbb {X}})}. \end{aligned}$$

Proof

This result is an immediate consequence of Lemma 6, obtained by taking

$$\begin{aligned} E_j(i) = {\mathbb {Y}}_1 \times \cdots \times {\mathbb {Y}}_{j-1}\times A^{i}_{j} \times {\mathbb {Y}}_{j+1}\times \cdots \times {\mathbb {Y}}_d. \end{aligned}$$

\(\square \)

The constants blow up as q approaches p. An easy modification of the proof of Lemma 6 gives the following endpoint result with logarithmic losses when the sets \(I_j\) are finite.
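For a concrete feel of this blow-up, one can tabulate the factor \(\big (1-2^{1/q-1/p}\big )^{-d}\) from (2.1); a quick sketch:

```python
def ck_constant(p, q, d):
    # the operator-norm factor (1 - 2^{1/q - 1/p})^{-d} from (2.1)
    return (1.0 - 2.0 ** (1.0 / q - 1.0 / p)) ** (-d)

# fix p = 2 and d = 2 and let q decrease toward p: the factor grows without bound
for q in (4.0, 3.0, 2.5, 2.1, 2.01):
    print(q, ck_constant(2.0, q, 2))
```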

Corollary 8

Take an exponent \(p\in [1,\infty ]\) and a bounded linear operator \(T:{\text {L}}^p({\mathbb {Y}},{\mathcal {Y}},\nu ) \rightarrow {\text {L}}^p({\mathbb {X}},{\mathcal {X}},\mu )\). The maximal operator given by (2.5) satisfies

$$\begin{aligned} \Vert T_{\star }\Vert _{{\text {L}}^p({\mathbb {Y}})\rightarrow {\text {L}}^p({\mathbb {X}})} \leqslant (\lceil \log _2 |I_1|\rceil +1)\cdots (\lceil \log _2 |I_d|\rceil +1) \,\Vert T\Vert _{{\text {L}}^p({\mathbb {Y}})\rightarrow {\text {L}}^p({\mathbb {X}})}. \end{aligned}$$

The formulation of Corollary 8 is motivated by Tao’s [22, Note #2, Q14]. The particular case in which \(p=2\) and T is the Fourier transform is a multi-parameter version of the Rademacher–Menshov theorem. We will not need Corollary 8 later in the text; we formulated it only for comparison with a different method of Krause, Mirek, and Trojan [13, Section 3]. The authors of [13] established their two-parameter Rademacher–Menshov theorem by a certain greedy selection algorithm, which also leads to logarithmic losses.
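The logarithmic factor in Corollary 8 comes from splitting initial segments of a finite index set into dyadic blocks. The following sketch illustrates this standard Rademacher–Menshov mechanism (our illustration, not the construction of [13]): every prefix of \(\{0,\ldots ,N-1\}\) is a disjoint union of at most \(\lceil \log _2 N\rceil +1\) dyadic blocks.

```python
import math

# Greedy dyadic splitting of the prefix [0, i): repeatedly take the largest
# aligned block [m * 2^k, (m+1) * 2^k) that fits.  The number of blocks equals
# the number of binary digits 1 of i, hence is at most ceil(log2(N)) + 1.
def dyadic_blocks(i):
    blocks, start = [], 0
    while start < i:
        k = (i - start).bit_length() - 1   # largest block fitting in [start, i)
        blocks.append((start, start + (1 << k)))
        start += 1 << k
    return blocks

N = 100
worst = max(len(dyadic_blocks(i)) for i in range(1, N + 1))
print(worst, math.ceil(math.log2(N)) + 1)   # worst case stays below the bound
```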

3 Proof of Theorem 1

Denote by \({\mathcal {M}}\) the maximal operator

$$\begin{aligned} {\mathcal {M}}f:= \sup _{r_1,\ldots ,r_d\in (0,\infty )}\big |{\widehat{f}}*\chi _{r_1,\ldots ,r_d}\big |. \end{aligned}$$

In the proof of (1.5) we can assume that f is a Schwartz function, since afterwards one can simply use the density of \({\mathcal {S}}({\mathbb {R}}^d)\) in \({\text {L}}^p({\mathbb {R}}^d)\). We begin with the observation that \({\widehat{f}}*\chi _{r_1,\ldots ,r_d}\) is the Fourier transform of

Using (1.4) and the Fundamental Theorem of Calculus we expand, for any \((x_1,\ldots ,x_d)\in ({\mathbb {R}}\setminus \{0\})^d\) and \((r_1,\ldots ,r_d)\in (0,\infty )^d\),

Here \(Q(\epsilon )\) is the open coordinate “quadrant” determined by \(\epsilon =(\epsilon _1,\ldots ,\epsilon _d)\in \{-1,1\}^d\), i.e.,

$$\begin{aligned} Q(\epsilon ):= \{(x_1,\ldots ,x_d)\in {\mathbb {R}}^d :\, \mathop {{\text {sgn}}}x_j=\epsilon _j \text { for } 1\leqslant j\leqslant d \}, \end{aligned}$$

\(\#\epsilon \) denotes the number of 1’s among the coordinates of \(\epsilon \), and we also denote

$$\begin{aligned} R(\epsilon ;s_1,\ldots ,s_d):= Q(\epsilon )\cap ([-s_1,s_1]\times \cdots \times [-s_d,s_d]) \end{aligned}$$
(3.1)

for any \(s_1,\ldots ,s_d\in (0,\infty )\). Multiplying by f and taking Fourier transforms we obtain the pointwise identity

so that

Note that each of the sets (3.1) is a d-dimensional rectangle in \({\mathbb {R}}^d\), so invoking Corollary 7 with \(T={\mathcal {F}}\), which is assumed to satisfy (1.1), gives

$$\begin{aligned} \Big \Vert \sup _{r_1,\ldots ,r_d\in (0,\infty )\cap {\mathbb {Q}}}\big |{\mathcal {F}}\big (f\,\mathbb {1}_{R(\epsilon ;|t_1|/r_1,\ldots ,|t_d|/r_d)}\big )\big |\Big \Vert _{{\text {L}}^q(S,\sigma )} \lesssim _{d,\sigma ,p,q} \Vert f\Vert _{{\text {L}}^p({\mathbb {R}}^d)}. \end{aligned}$$

The last implicit constant is independent of \(t_1,\ldots ,t_d\), so integrability of , thanks to (1.4) again, establishes

$$\begin{aligned} \Vert {\mathcal {M}}f\Vert _{{\text {L}}^q(S,\sigma )} \lesssim _{d,\sigma ,\chi ,p,q} \Vert f\Vert _{{\text {L}}^p({\mathbb {R}}^d)}, \end{aligned}$$
(3.2)

which is precisely (1.5).

The proof of (1.6) is now standard. The claim is clear for \(f\in {\text {L}}^1({\mathbb {R}}^d)\). By

$$\begin{aligned} {\text {L}}^s({\mathbb {R}}^d) \subseteq {\text {L}}^1({\mathbb {R}}^d) + {\text {L}}^p({\mathbb {R}}^d) \end{aligned}$$

it is sufficient to verify it when \(f\in {\text {L}}^p({\mathbb {R}}^d)\). For any \(\varepsilon >0\) define the exceptional set

$$\begin{aligned} E_\varepsilon := \Big \{ \xi \in {\mathbb {R}}^d :\, \inf _{r\in (0,\infty )} \sup _{r_1,\ldots ,r_d\in (0,r]} \big | \big ({\widehat{f}}*\chi _{r_1,\ldots ,r_d}\big )(\xi ) - ({\mathcal {R}}f)(\xi ) \big | \geqslant \varepsilon \Big \}, \end{aligned}$$

and observe that (1.6) holds at every point outside of \(\cup _{\varepsilon \in (0,\infty )}E_\varepsilon \). It is easy to see that, for every \(g\in {\mathcal {S}}({\mathbb {R}}^d)\), by the mere continuity of \({\widehat{g}}\) we have

$$\begin{aligned} \lim _{(0,\infty )^d\ni (r_1,\ldots ,r_d)\rightarrow (0,\ldots ,0)} {\widehat{g}}*\chi _{r_1,\ldots ,r_d} = {\mathcal {R}}g \end{aligned}$$

pointwise on S and, consequently,

$$\begin{aligned} E_\varepsilon \subseteq \Big \{\xi \in S :\, {\mathcal {M}}(f-g)(\xi )\geqslant \frac{\varepsilon }{2} \Big \} \cup \Big \{\xi \in S :\, \big |{\mathcal {R}}(f-g)(\xi )\big |\geqslant \frac{\varepsilon }{2} \Big \}. \end{aligned}$$

Thus, estimates (3.2) and (1.1), together with the Markov–Chebyshev inequality, give

$$\begin{aligned} \sigma (E_\varepsilon ) \lesssim \varepsilon ^{-q} \Vert f-g\Vert _{{\text {L}}^p({\mathbb {R}}^d)}^q. \end{aligned}$$

By the density of \({\mathcal {S}}({\mathbb {R}}^d)\) in \({\text {L}}^p({\mathbb {R}}^d)\) we conclude \(\sigma (E_\varepsilon )=0\), and the nestedness of these sets also gives \(\sigma (\cup _{\varepsilon \in (0,\infty )}E_\varepsilon )=0\). Thus, (1.6) indeed holds for \(\sigma \)-almost every \(\xi \in S\).

Turning to (1.7), we define the ellipsoid maximal function of the Fourier transform as

$$\begin{aligned} \big (\widetilde{{\mathcal {M}}}f\big )(\xi ):= \sup _{r_1,\ldots ,r_d\in (0,\infty )} \frac{1}{|B_{r_1,\ldots ,r_d}|} \int _{B_{r_1,\ldots ,r_d}(\xi )} \big | {\widehat{f}}(\eta ) \big | \,{\text {d}}\eta \end{aligned}$$

and repeat a trick from [14]. It is again sufficient to verify the claim in the endpoint case \(f\in {\text {L}}^{2p/(p+1)}({\mathbb {R}}^d)\). Define

$$\begin{aligned} g(x):= \int _{{\mathbb {R}}^d} f(y) \overline{f(y-x)} \,{\text {d}}y, \end{aligned}$$

so that \(g\in {\text {L}}^p({\mathbb {R}}^d)\) and \({\widehat{g}}(\xi ) = \big |{\widehat{f}}(\xi )\big |^2\). Choose any non-negative \(\chi \in {\mathcal {S}}({\mathbb {R}}^d)\) with integral 1 that is strictly positive on the closed unit ball \(B_1(0,\ldots ,0)\). Then, by the Cauchy–Schwarz inequality,

$$\begin{aligned} \frac{1}{|B_{r_1,\ldots ,r_d}|} \int _{B_{r_1,\ldots ,r_d}(\xi )} \big | {\widehat{f}}(\eta ) \big | \,{\text {d}}\eta&\leqslant \bigg ( \frac{1}{|B_{r_1,\ldots ,r_d}|} \int _{B_{r_1,\ldots ,r_d}(\xi )} \big | {\widehat{f}}(\eta ) \big |^2 \,{\text {d}}\eta \bigg )^{1/2} \\&\lesssim _{\chi } \big ({\widehat{g}}*\chi _{r_1,\ldots ,r_d}\big )(\xi )^{1/2}, \end{aligned}$$

so the bound (3.2) applied to g gives

$$\begin{aligned} \big \Vert \widetilde{{\mathcal {M}}}f\big \Vert _{{\text {L}}^{2q}(S,\sigma )} \lesssim _{\chi } \Vert {\mathcal {M}}g\Vert _{{\text {L}}^q(S,\sigma )}^{1/2} \lesssim _{d,\sigma ,\chi ,p,q} \Vert g\Vert _{{\text {L}}^p({\mathbb {R}}^d)}^{1/2} \leqslant \Vert f\Vert _{{\text {L}}^{2p/(p+1)}({\mathbb {R}}^d)}. \end{aligned}$$

Now we can repeat exactly the same density argument as before to conclude that (1.7) holds for \(\sigma \)-almost every \(\xi \in S\). Finally, (1.8) is an obvious consequence of (1.7) and the triangle inequality.
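The autocorrelation identity \({\widehat{g}}=|{\widehat{f}}|^2\) that drives the argument above has an exact discrete analogue, which is easy to sanity-check with the FFT; this is an illustrative sketch only, with the discrete Fourier transform standing in for \({\mathcal {F}}\).

```python
import numpy as np

# Discrete analogue of g(x) = int f(y) * conj(f(y - x)) dy: the circular
# autocorrelation of a sequence, whose DFT is exactly |DFT(f)|^2.
rng = np.random.default_rng(0)
f = rng.standard_normal(64) + 1j * rng.standard_normal(64)

# g[n] = sum_y f[y] * conj(f[y - n])   (indices mod 64)
g = np.array([np.sum(f * np.conj(np.roll(f, n))) for n in range(64)])

assert np.allclose(np.fft.fft(g), np.abs(np.fft.fft(f)) ** 2)
```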

Remark 2

Note that (1.9) and (1.10) now also follow, simply by observing that the maximal function \(\widetilde{{\mathcal {M}}}\) is pointwise comparable to the rectangular maximal function,

$$\begin{aligned} \big ({\mathcal {M}}_{{\text {rect}}}f\big )(\xi )&:= \sup _{r_1,\ldots ,r_d\in (0,\infty )} \frac{1}{2^d r_1\cdots r_d} \int _{\xi +[-r_1,r_1]\times \cdots \times [-r_d,r_d]} \big | {\widehat{f}}(\eta ) \big | \,{\text {d}}\eta \\&\,= \sup _{\begin{array}{c} {\mathcal {R}}\text { is an axes-parallel rectangle}\\ {\mathcal {R}}\ni \xi \end{array}} \frac{1}{|{\mathcal {R}}|} \int _{{\mathcal {R}}} \big | {\widehat{f}}(\eta ) \big | \,{\text {d}}\eta , \end{aligned}$$

so the latter one satisfies the same bound as before. In the other direction, Ramos [17, Proposition 4] showed that, in the case of spheres \(S={\mathbb {S}}^{d-1}\) in dimensions \(d\geqslant 4\), the operator \({\mathcal {M}}_{{\text {rect}}}\) does not satisfy estimates in the full conjectural range of (1.1).

4 Proof of Theorem 4

The maximal operator appearing in part (a) is simply \(T_\star \) from (2.5), where \({\mathbb {X}}={\mathbb {R}}^d\), \({\mathbb {Y}}_j={\mathbb {R}}\), \(T={\mathcal {F}}\), \(q=p'\), \(I_j=(0,\infty )\cap {\mathbb {Q}}\), and \(A_j^R=[-R,R]\). Note that we use \(p<2\) in the condition \(p<p'=q\), so that Corollary 7 applies and yields the desired estimate from the Hausdorff–Young inequality, i.e., the well-known fact that the Fourier transform \({\mathcal {F}}\) is bounded from \({\text {L}}^p({\mathbb {R}}^d)\) to \({\text {L}}^{p'}({\mathbb {R}}^d)\). The convergence result is then proved via exactly the same density argument as the one given in the previous section.

We turn to part (b), i.e., we construct a function in \({\text {L}}^2({\mathbb {R}}^d)\) for which the limit (1.11) does not exist. This will be an adaptation of Fefferman’s argument [7] to the continuous case. It is necessary for us to construct the counterexample explicitly, instead of just disproving \({\text {L}}^2({\mathbb {R}}^d)\rightarrow {\text {L}}^{2,\infty }({\mathbb {R}}^d)\) boundedness, because Stein’s Maximal Principle does not apply in the case of non-compact groups, such as \({\mathbb {R}}^d\).

We define \(D_R(t):=\sin (2\pi Rt)/(\pi t)\). The operator \(S_{R_1,\dots , R_d}\) is defined on \({\text {L}}^2({\mathbb {R}}^d)\) as

$$\begin{aligned} S_{R_1,\dots , R_d}f := f * \big (D_{R_1}\otimes \cdots \otimes D_{R_d}\big ). \end{aligned}$$

Here \(u_1\otimes \cdots \otimes u_d\) denotes the elementary tensor made of one-dimensional functions, defined as

$$\begin{aligned} (u_1\otimes \cdots \otimes u_d)(x_1,\ldots ,x_d):= u_1(x_1) \cdots u_d(x_d). \end{aligned}$$

Observe that Young’s convolution inequality implies

$$\begin{aligned} \Vert {S_{R_1,\dots , R_d}}\Vert _{{\text {L}}^2({\mathbb {R}}^d)\rightarrow {\text {L}}^{\infty }({\mathbb {R}}^d)}\lesssim (R_1\cdots R_d)^{1/2}. \end{aligned}$$
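Indeed, with our normalization \(D_R\) is the Fourier transform of \(\mathbb {1}_{[-R,R]}\), so Plancherel’s identity gives \(\Vert D_R\Vert _{{\text {L}}^2({\mathbb {R}})}^2=2R\), and the estimate follows from the tensor structure of the kernel. A small numerical check of the kernel norm (an illustrative sketch on a truncated domain, so a quadrature error of order \(1/T\) remains):

```python
import numpy as np

# Check that ||D_R||_{L^2}^2 = 2R for D_R(t) = sin(2 pi R t) / (pi t)
# (Plancherel: D_R is the Fourier transform of the indicator of [-R, R]).
R, T, n = 3.0, 400.0, 2_000_000
t = (np.arange(n) + 0.5) * (2 * T / n) - T    # midpoint grid, never exactly 0
D = np.sin(2 * np.pi * R * t) / (np.pi * t)
norm_sq = np.sum(D ** 2) * (2 * T / n)        # midpoint rule on [-T, T]
print(norm_sq)                                # close to 2R = 6
```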

Following Fefferman’s example, we use the following definition throughout the remainder of this section. For \(\lambda \in {\mathbb {R}}\) we define

$$\begin{aligned}f_{\lambda }(x_1,x_2):=e^{2\pi {i} \lambda x_1x_2}\mathbb {1}_{[-2,2]^2}(x_1,x_2).\end{aligned}$$

The next lemma gives bounds that are crucial for the proof.

Lemma 9

  1. (a)

    There exists \(C>0\) such that for all \(x_1,x_2\in [2/3,1]\) the following holds:

    $$\begin{aligned}|{S_{\lambda x_2, \lambda x_1} f_{\lambda }(x_1,x_2)}| \geqslant C\log \lambda \end{aligned}$$

    whenever \(\lambda \) is large enough.

  2. (b)

    There exists \(C>0\) such that for all \(x_1,x_2\in [2/3,1]\) and \(\lambda '\geqslant 3 \lambda >0\) the following holds:

    $$\begin{aligned}|{S_{\lambda ' x_2, \lambda ' x_1} f_\lambda (x_1,x_2)}| \leqslant C.\end{aligned}$$

The reader should note the reverse order of subscripts in the previous lemma. Before proving the lemma, we prove that it implies part (b) of Theorem 4. We will prove that there exist a function \(f\in {\text {L}}^2({\mathbb {R}}^d)\) and a number \(\delta >0\) such that

$$\begin{aligned}{} & {} \limsup _{R_1\rightarrow \infty ,\dots , R_d\rightarrow \infty }|{S_{R_1,\dots , R_d}f(x_1,\dots , x_d)}| \nonumber \\{} & {} \quad =\infty \quad \text {for every } (x_1,\dots , x_d)\in \left[ 2/3, 1\right] ^2 \times [-\delta , \delta ]^{d-2}, \end{aligned}$$
(4.1)

so the function will be the one for which (1.12) holds.

Let \(\psi \in {\mathcal {S}}({\mathbb {R}})\) be a real-valued Schwartz function such that \(\psi (0)>0\) and such that \({\widehat{\psi }}\) is compactly supported. For the function \(F(x_1,\dots , x_d):= f(x_1,x_2)\prod _{j=3}^{d}\psi (x_j)\), because of the assumption on the support of \({\widehat{\psi }}\), we have

$$\begin{aligned}{} & {} \limsup _{R_1\rightarrow \infty ,\dots , R_d\rightarrow \infty }|{S_{R_1,\dots , R_d} F(x_1,\dots , x_d)}|\\{} & {} \quad = \limsup _{R_1\rightarrow \infty , R_2\rightarrow \infty }\Big |{S_{R_1,R_2}f(x_1,x_2) \prod _{j=3}^{d}\psi (x_j)}\Big |. \end{aligned}$$

Furthermore, since \(\psi (0)>0\), there exists some \(\delta >0\) such that \(\psi (x)>0\) for all \(x\in [-\delta , \delta ]\), so it is enough to prove (4.1) for \(d=2\).

We define the very rapidly decreasing sequence of positive real numbers \((a_k)_{k=1}^{\infty }\) recursively by \(a_1=1\) and \(a_{k+1}=2^{-k/a_k}\), and the sequence of positive real numbers \((\lambda _k)_{k=1}^{\infty }\) by \(\lambda _k=a_{k+1}^{-1}\). Since \(\sum _{k=1}^{\infty }a_k< \infty \), the function

$$\begin{aligned}f(x_1,x_2):= \sum _{k=1}^{\infty }a_k f_{\lambda _k}(x_1,x_2)\end{aligned}$$

is well-defined and belongs to \({\text {L}}^2({\mathbb {R}}^2)\).
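Since \(a_1=1\) and \(a_{k+1}=2^{-k/a_k}\), every \(a_k\) is an exact power of two, say \(a_k=2^{e_k}\) with \(e_1=0\) and \(e_{k+1}=-k\,2^{-e_k}\in {\mathbb {Z}}\), so the properties of the two sequences used below can be verified exactly in integer arithmetic; a sketch for the first few indices:

```python
from fractions import Fraction

# a_k = 2^{e_k}: e_1 = 0 and e_{k+1} = -k * 2^{-e_k}, a non-positive integer,
# so the recursion stays exact in Python's big integers.
e = [0]
for k in range(1, 5):
    e.append(-k * 2 ** (-e[-1]))            # e = [0, -1, -4, -48, -4*2**48]

a = [Fraction(1, 2 ** (-ek)) for ek in e[:4]]   # a_1, ..., a_4 exactly

# a_n * log2(lambda_n) = 2^{e_n} * (-e_{n+1}) = n exactly, so a_n log lambda_n ~ n
for n in range(1, 5):
    assert Fraction(1, 2 ** (-e[n - 1])) * (-e[n]) == n

# lambda_n * sum_{k>n} a_k stays bounded: the first summand is 1, the rest tiny
for n in (1, 2):
    lam = 2 ** (-e[n])                      # lambda_n = a_{n+1}^{-1}
    assert 1 <= lam * sum(a[n:]) < 2
```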

We claim that there exist real numbers \(C_i>0\), \(i=1,2,3\) such that the following inequalities hold for all \(x_1,x_2\in [2/3,1]\) and \(n\in {\mathbb {N}}\):

  1. (1)

    \(|{S_{\lambda _n x_2, \lambda _n x_1}f_{\lambda _n}(x_1,x_2)}|\geqslant C_1 \log \lambda _n\),

  2. (2)

    \(|{S_{\lambda _n x_2, \lambda _n x_1}f_{\lambda _k}(x_1,x_2)}|\leqslant C_2\) when \(k<n\),

  3. (3)

    \(|{S_{\lambda _n x_2, \lambda _n x_1}f_{\lambda _k}(x_1,x_2)}|\leqslant C_3\lambda _n\) when \(k>n\).

Indeed, since \(\lambda _{k+1}\geqslant 4\lambda _k\) for all \(k\in {\mathbb {N}}\), the first two inequalities follow from Lemma 9, while the third one follows from Young’s convolution inequality. Therefore, observing that the sequences satisfy \(\lambda _{n}\sum _{k>n}a_k\lesssim 1\) for all \(n\in {\mathbb {N}}\) and \(a_n\log \lambda _n \sim n\), for \(x_1,x_2\in [2/3,1]\) and n large enough it follows that

$$\begin{aligned} \begin{aligned} |{S_{\lambda _n x_2, \lambda _n x_1}f(x_1,x_2)}|&\geqslant a_n|{S_{\lambda _n x_2, \lambda _n x_1}f_{\lambda _n}(x_1,x_2)}| - \sum _{k\ne n} a_k|{S_{\lambda _n x_2, \lambda _n x_1}f_{\lambda _k}(x_1,x_2)}|\\&\geqslant C_1 a_n\log \lambda _n - C_2\sum _{k<n}a_k - C_3\lambda _n\sum _{k>n}a_k \gtrsim n. \end{aligned} \end{aligned}$$

Finally, noting that \(\lambda _nx_2,\lambda _nx_1\rightarrow \infty \) as \(n\rightarrow \infty \) finishes the proof of (4.1) in the case \(d=2\) and therefore also part (b) of the theorem.

The following technical lemma will be needed in the proof of Lemma 9.

Lemma 10

  1. (a)

    There exist \(C,\lambda _0>0\) such that

    $$\begin{aligned}\bigg |{{\text {p.v.}}\int _{-1}^{1}\int _{-1}^{1} \frac{e^{2\pi {i} \lambda x_1x_2}}{x_1x_2} \,{\text {d}}x_1 \,{\text {d}}x_2}\bigg | \geqslant C\log \lambda \quad \text {for every } \lambda \geqslant \lambda _0.\end{aligned}$$
  2. (b)

    There exists \(C>0\) such that for all \(c_1,c_2\in {\mathbb {R}}\) for which \(\max \{|{c_1}|,|{c_2}|\} \geqslant 4/3\), the following holds:

    $$\begin{aligned}\bigg |{{\text {p.v.}}\int _{-1}^{1}\int _{-1}^{1} \frac{e^{2\pi {i}\lambda (x_1x_2 + c_1x_1+ c_2 x_2)}}{x_1x_2} \,{\text {d}}x_1 \,{\text {d}}x_2 }\bigg | \leqslant C \quad \text {for every } \lambda >0.\end{aligned}$$

Proof

(a) This was proved in [15], but we repeat the short proof for completeness. Since \(\int _{0}^{\infty }\frac{\sin t}{t} \,{\text {d}}t= \pi /2\), there exists \(\lambda _1>0\) such that \(\int _{0}^{x}\frac{\sin t}{t} \,{\text {d}}t \in \left[ \pi /4, 3\pi /4\right] \) for all \(x\geqslant \lambda _1\). Now, using symmetries of the integrand and a change of variables, it follows that

$$\begin{aligned} \begin{aligned} {\text {p.v.}}\int _{-1}^{1}\int _{-1}^{1} \frac{e^{2\pi {i}\lambda x_1x_2}}{x_1x_2} \,{\text {d}}x_1\,{\text {d}}x_2&= 4{i}\int _{0}^{1}\int _{0}^{1} \frac{\sin (2\pi \lambda x_1x_2)}{x_1x_2} \,{\text {d}}x_1 \,{\text {d}}x_2\\&= 4{i}\int _{0}^{2\pi }\frac{1}{x_2}\int _{0}^{\lambda x_2}\frac{\sin t}{t} \,{\text {d}}t\,{\text {d}}x_2 \\&= 4{i}\int _{0}^{{\lambda _1}/{\lambda }}\frac{1}{x_2} \int _{0}^{\lambda x_2}\frac{\sin t}{t} \,{\text {d}}t \,{\text {d}}x_2\\&\quad + 4{i}\int _{{\lambda _1}/{\lambda }}^{2\pi } \int _{0}^{\lambda x_2}\frac{\sin t}{t} \,{\text {d}}t \,\frac{{\text {d}}x_2}{x_2}. \end{aligned} \end{aligned}$$

For the first integral observe that \(t\mapsto (\sin t)/t\) is absolutely bounded by 1, so the integral is absolutely bounded by \(\lambda _1\). For the second integral we use the fact that \(\lambda x_2\geqslant \lambda _1\), so:

$$\begin{aligned} \int _{{\lambda _1}/{\lambda }}^{2\pi } \int _{0}^{\lambda x_2}\frac{\sin t}{t} \,{\text {d}}t \,\frac{{\text {d}}x_2}{x_2} \gtrsim \int _{{\lambda _1}/{\lambda }}^{2\pi } \frac{{\text {d}}x_2}{x_2} = \log \lambda +\log (2\pi ) - \log \lambda _1. \end{aligned}$$

Finally, adding the two integrals and choosing \(\lambda _0\) large enough compared to \(\lambda _1\), we obtain the statement.
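The logarithmic growth in part (a) is easy to confirm numerically. Interchanging the order of integration (our reformulation, purely for convenience of quadrature) gives \(\int _0^{2\pi }\frac{1}{x_2}\int _0^{\lambda x_2}\frac{\sin t}{t}\,{\text {d}}t\,{\text {d}}x_2 = \int _0^{2\pi \lambda }\frac{\sin t}{t}\,\log \frac{2\pi \lambda }{t}\,{\text {d}}t\), and the increments of this quantity over a decade of \(\lambda \) approach \(\frac{\pi }{2}\log 10\approx 3.62\):

```python
import numpy as np

def F(lam, pts_per_unit=400):
    """Midpoint-rule value of int_0^{2 pi lam} (sin t / t) * log(2 pi lam / t) dt."""
    A = 2 * np.pi * lam
    n = int(A * pts_per_unit)
    t = (np.arange(n) + 0.5) * (A / n)       # midpoints avoid t = 0
    return float(np.sum(np.sin(t) / t * np.log(A / t)) * (A / n))

# increment over one decade of lambda approaches (pi / 2) * log(10) ~ 3.62
print(F(1000.0) - F(100.0))
```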

(b) Assume, without loss of generality, that \(c_1\geqslant 4/3\). Using symmetries of the integrand, it follows that

$$\begin{aligned} \begin{aligned}&{\text {p.v.}}\int _{-1}^{1}\int _{-1}^{1} \frac{e^{2\pi \lambda {i}(x_1x_2+c_1x_1+c_2x_2)}}{x_1x_2} \,{\text {d}}x_1 {\text {d}}x_2 \\&\quad = 2{i} {\text {p.v.}}\int _{-1}^{1}\int _{0}^{1} \frac{\sin (2\pi \lambda x_1(x_2+c_1))}{x_1x_2} e^{2\pi {i}\lambda c_2 x_2}\,{\text {d}}x_1{\text {d}}x_2. \end{aligned} \end{aligned}$$

If we define

$$\begin{aligned}g_{\varepsilon }(x_2):= 2\int _{\varepsilon }^{1} \frac{\sin (2\pi \lambda x_1( x_2+c_1))}{x_1} \,{\text {d}}x_1,\end{aligned}$$

from the assumption \(c_1\geqslant 4/3\), it follows that \(|{g_\varepsilon '(x_2)}| \lesssim (x_2+c_1)^{-1}\lesssim 1\) for all \(x_2\in [-1,1]\), where the implicit constant is independent of both \(\lambda \) and \(\varepsilon \). Therefore,

$$\begin{aligned} \begin{aligned}&\bigg |{\int _{([-1,1]\setminus [-\varepsilon ,\varepsilon ])^2} \frac{\sin (2\pi \lambda x_1(x_2+c_1))}{x_1} \frac{e^{2\pi {i}\lambda c_2x_2}}{x_2}\,{\text {d}}x_1{\text {d}}x_2}\bigg | \\&\quad = \bigg |{\int _{[-1,1]\setminus [-\varepsilon ,\varepsilon ]} g_{\varepsilon }(x_2)\frac{e^{2\pi {i}\lambda c_2x_2}}{x_2}\,{\text {d}}x_2}\bigg |\\&\quad \leqslant \bigg |{\int _{[-1,1]\setminus [-\varepsilon ,\varepsilon ]} \frac{g_{\varepsilon }(x_2)-g_{\varepsilon }(0)}{x_2} e^{2\pi {i}\lambda c_2x_2}\,{\text {d}}x_2}\bigg | \\&\qquad + \bigg |{g_{\varepsilon }(0)\int _{[-1,1]\setminus [-\varepsilon ,\varepsilon ]} \frac{e^{2\pi {i}\lambda c_2x_2}}{x_2}\,{\text {d}}x_2}\bigg |\\&\quad \lesssim \int _{-1}^{1} \sup _{t\in [-1,1]} \big |{g_{\varepsilon }'(t)}\big | \,{\text {d}}x_2 + \sup _{N>0}\bigg |{\int _{0}^{N} \frac{\sin t}{t} \,{\text {d}}t}\bigg |^2 \lesssim 1. \end{aligned} \end{aligned}$$

Letting \(\varepsilon \rightarrow 0\), the statement follows. \(\square \)

We proceed to the proof of Lemma 9.

Proof of Lemma 9

Observe that:

$$\begin{aligned} S_{R_1,R_2}f_{\lambda } = T_{R_1,R_2}f_{\lambda }-T_{-R_1,R_2}f_{\lambda }-T_{R_1,-R_2}f_{\lambda }+T_{-R_1,-R_2}f_{\lambda }, \end{aligned}$$
(4.2)

where

$$\begin{aligned} T_{r_1,r_2}f(x_1,x_2) = -\frac{1}{4\pi ^2} {\text {p.v.}}\int _{{\mathbb {R}}^2}\frac{e^{2\pi {i}(r_1x_1'+r_2x_2')}}{x_1'x_2'}f(x_1-x_1',x_2-x_2') \,{\text {d}}x_1'{\text {d}}x_2'. \end{aligned}$$

We prove the following two observations for part (a) of the lemma.

  1. (1)

    There exists \(C>0\) such that for \(\lambda \) large enough and \(x_1,x_2\in [0,1]\):

    $$\begin{aligned}|{T_{\lambda x_2,\lambda x_1}f_{\lambda }(x_1,x_2)}|\geqslant C\log \lambda .\end{aligned}$$
  2. (2)

    For \(\lambda >0\) and \(x_1,x_2 \in [2/3,1]\), all of the expressions

    $$\begin{aligned} |{T_{-\lambda x_2,\lambda x_1}f_{\lambda }(x_1,x_2)}|, \quad |{T_{\lambda x_2,-\lambda x_1}f_{\lambda }(x_1,x_2)}|, \quad |{T_{-\lambda x_2,-\lambda x_1}f_{\lambda }(x_1,x_2)}| \end{aligned}$$

    are bounded by a constant independent of \(\lambda \).

In order to prove observation (1), we note that, because

$$\begin{aligned} x_2x_1'+x_1x_2' + (x_1-x_1')(x_2-x_2')=x_1x_2+x_1'x_2', \end{aligned}$$

the following holds

$$\begin{aligned}|{T_{\lambda x_2,\lambda x_1}f_{\lambda }(x_1,x_2)}| = \frac{1}{4\pi ^2} \bigg |{{\text {p.v.}} \int _{[x_1-2,x_1+2]}\int _{[x_2-2,x_2+2]} \frac{e^{2\pi {i} \lambda x_1'x_2'}}{x_1'x_2'} \,{\text {d}}x_2'{\text {d}}x_1'}\bigg |. \end{aligned}$$
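The algebraic identity used here is a polynomial identity of low degree in each variable, so checking it on a small integer grid already proves it; a throwaway verification:

```python
from itertools import product

# x2*y1 + x1*y2 + (x1 - y1)*(x2 - y2) == x1*x2 + y1*y2 for all real values;
# both sides have degree <= 1 in each variable, so agreement on an integer
# grid with several values per variable forces equality as polynomials.
for x1, x2, y1, y2 in product(range(-3, 4), repeat=4):
    assert x2 * y1 + x1 * y2 + (x1 - y1) * (x2 - y2) == x1 * x2 + y1 * y2
print("identity verified on a 7^4 integer grid")
```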

We decompose \({\mathbb {R}}^2\) into four regions:

$$\begin{aligned} {[}-1,1]^2,\quad [-1,1]\times ({\mathbb {R}}\setminus [-1,1]),\quad ({\mathbb {R}}\setminus [-1,1])\times [-1,1],\quad ({\mathbb {R}}\setminus [-1,1])^2. \end{aligned}$$

By the first part of Lemma 10, there exists \(C>0\) such that the integral over the first region is at least \(C\log \lambda \) whenever \(\lambda \) is large enough. Integrals over the second and third regions are all O(1) because of the following calculation:

$$\begin{aligned} \bigg |{\int _{1}^{x_2+2}\int _{-1}^{1} \frac{\sin (2\pi \lambda x_1'x_2')}{x_1'x_2'} \,{\text {d}}x_1'{\text {d}}x_2'}\bigg |= & {} \int _{1}^{x_2+2} \frac{1}{x_2'} \bigg |{\int _{-2\pi \lambda x_2'}^{2\pi \lambda x_2'} \frac{\sin t}{t} \,{\text {d}}t}\bigg | \,{\text {d}}x_2' \\\lesssim & {} \int _{1}^{3} \frac{1}{x_2'} \,{\text {d}}x_2' \lesssim 1. \end{aligned}$$

Finally, the integral over the last region is bounded using the triangle inequality by:

$$\begin{aligned} \int _{x_1-2}^{x_1+2}\int _{x_2-2}^{x_2+2} \frac{1}{|x_1'x_2'|}\mathbb {1}_{\{|x_1'|, |x_2'|>1\}}\,{\text {d}}x_1'{\text {d}}x_2' \lesssim 1. \end{aligned}$$

Summing all the bounds proves observation (1).
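The uniform bound \(\sup _{N>0}\big |\int _0^N \frac{\sin t}{t}\,{\text {d}}t\big |<\infty \), used repeatedly in these estimates, can also be confirmed numerically: the supremum is \({\text {Si}}(\pi )\approx 1.852\), while the limit at infinity is \(\pi /2\) (a midpoint-rule sketch):

```python
import numpy as np

# Running values of Si(N) = int_0^N (sin t)/t dt on a midpoint grid over (0, 200];
# the maximum is attained near N = pi and the values settle towards pi/2.
h = 1e-4
t = (np.arange(int(200 / h)) + 0.5) * h
si = np.cumsum(np.sin(t) / t) * h
print(si.max(), si[-1])
```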

We turn to the proof of observation (2). First note that for \(\epsilon _1,\epsilon _2\in \{-1,1\}\),

$$\begin{aligned}&|{T_{\epsilon _1\lambda x_2,\epsilon _2\lambda x_1}f_{\lambda }(x_1,x_2)}|\\&\quad = \frac{1}{4\pi ^2} \bigg |{{\text {p.v.}} \int _{[x_1-2,x_1+2]}\int _{[x_2-2,x_2+2]} \frac{e^{2\pi {i}\lambda (x_1'x_2' + (\epsilon _1-1)x_1'x_2 + (\epsilon _2-1)x_2'x_1)}}{x_1'x_2'} \,{\text {d}}x_2'{\text {d}}x_1'}\bigg |. \end{aligned}$$

Assume, without loss of generality, that \(\epsilon _1=-1\). From the assumption on \(x_2\) it follows that \(|{(\epsilon _1-1)x_2}| \geqslant {4}/{3}\), so by the second part of Lemma 10 the integral over the first region (the domain of integration being decomposed into the same four regions as before) is bounded by a constant. The integral over the fourth region is bounded as in the previous observation. Integrals over the second and the third region can be bounded using the following calculation:

$$\begin{aligned} \bigg |{\int _{1}^{x_2+2}\int _{-1}^{1} \frac{\sin (2\pi \lambda x_1'(x_2'+(\epsilon _1 - 1)x_2))}{x_1'}\,{\text {d}}x_1' \frac{e^{2\pi {i}\lambda (\epsilon _2-1)x_2'x_1}}{x_2'}\,{\text {d}}x_2'}\bigg |\lesssim \int _{1}^{3}\frac{{\text {d}}x_2'}{x_2'}\lesssim 1. \end{aligned}$$

Combining observations (1) and (2) with (4.2), we conclude the proof of part (a) of the lemma.

For part (b), we observe that for \(\epsilon _1,\epsilon _2\in \{-1,1\}\) the following holds:

$$\begin{aligned}&|{T_{\epsilon _1\lambda 'x_2,\epsilon _2\lambda 'x_1}f_{\lambda }(x_1,x_2)}| \\&\quad = \frac{1}{4\pi ^2}\bigg |{{\text {p.v.}}\int _{[x_1-2,x_1+2]}\int _{[x_2-2,x_2+2]} \frac{e^{2\pi {i}\lambda ( x_1'x_2' + ({\epsilon _1\lambda '}/{\lambda }-1)x_1'x_2+ ({\epsilon _2\lambda '}/{\lambda }-1)x_2'x_1)}}{x_1'x_2'} \,{\text {d}} x_2'{\text {d}}x_1'}\bigg |. \end{aligned}$$

We then decompose the area of integration in the same four regions as before. For the first region, since \(|{(\epsilon _1\lambda '/\lambda -1)x_2}|\geqslant 4/3\), we use the second part of Lemma 10 to get the upper bound and we treat the other regions as in part (a) of the lemma. \(\square \)

Remark 3

It is obvious that the function f in the proof of part (b) is in \({\text {L}}^1({\mathbb {R}}^d)\), so the function \({\widehat{f}}\), for which the convergence (1.11) fails, is also continuous, and therefore the counterexample belongs to the class \(C({\mathbb {R}}^d)\cap {\text {L}}^2({\mathbb {R}}^d)\).

5 Proof of Corollary 5

Let

$$\begin{aligned} S:= \Big \{\Big (\xi , \frac{\phi (\xi )}{2\pi }\Big ): \xi \in {\mathbb {R}}^n\Big \} \subseteq {\mathbb {R}}^{n+1} \end{aligned}$$

be the hypersurface naturally associated with (1.13). Equip S with the measure \({\text {d}}\sigma (\xi ,\tau )={\text {d}}\xi \). For every \(g\in {\text {L}}^2(S,\sigma )\) there exists a unique \(f\in {\text {L}}^2({\mathbb {R}}^n)\) such that

$$\begin{aligned} g\Big (\xi , \frac{\phi (\xi )}{2\pi }\Big ) = {\widehat{f}}(\xi ) \end{aligned}$$
(5.1)

for a.e. \(\xi \in {\mathbb {R}}^n\). By the assumption (1.14) and Plancherel’s identity we then know that \({\mathcal {E}}\) given by the formula

$$\begin{aligned} ({\mathcal {E}}g)(x,t):= (e^{{i} t \phi (D)} f)(x) \end{aligned}$$

extends to a bounded linear operator \({\mathcal {E}}:{\text {L}}^2(S,\sigma )\rightarrow {\text {L}}^s({\mathbb {R}}^{n+1})\). When \(f\in {\mathcal {S}}({\mathbb {R}}^{n})\), we can write

$$\begin{aligned} ({\mathcal {E}}g)(x,t) = \int _{{\mathbb {R}}^n} e^{{i}t \phi (\xi ) + 2\pi {i}x\cdot \xi } g\Big (\xi , \frac{\phi (\xi )}{2\pi }\Big ) \,{\text {d}}\xi = \int _S e^{2\pi {i}(x,t)\cdot (\xi ,\tau )} g(\xi ,\tau ) \,{\text {d}}\sigma (\xi ,\tau ) \end{aligned}$$

and, taking another Schwartz function \(h\in {\mathcal {S}}({\mathbb {R}}^{n+1})\),

$$\begin{aligned} \int _{{\mathbb {R}}^{n+1}} h(x,t) \,\overline{({\mathcal {E}}g)(x,t)} \,{\text {d}}x \,{\text {d}}t = \int _{S} {\widehat{h}}(\xi ,\tau ) \,\overline{g(\xi ,\tau )} \,{\text {d}}\sigma (\xi ,\tau ). \end{aligned}$$

By duality we now see that the a priori restriction estimate (1.1) holds with \(d=n+1\), \(p=s'\), \(q=2\). In fact, the so-called Fourier extension operator \({\mathcal {E}}\) is precisely the adjoint of the Fourier restriction operator \({\mathcal {R}}:{\text {L}}^{s'}({\mathbb {R}}^{n+1})\rightarrow {\text {L}}^2(S,\sigma )\).

Note that \(p=s'<2=q\). Now Theorem 1 applies, so that the maximal estimate (1.5) gives

$$\begin{aligned} \Big \Vert \sup _{r_1,\ldots ,r_{n+1}\in (0,\infty )}\big |{\widehat{h}}*\chi _{r_1,\ldots ,r_{n+1}}\big |\Big \Vert _{{\text {L}}^2(S,\sigma )} \lesssim _{n,\phi ,\chi ,s} \Vert h\Vert _{{\text {L}}^{s'}({\mathbb {R}}^{n+1})} \end{aligned}$$
(5.2)

for any given Schwartz function \(\chi \in {\mathcal {S}}({\mathbb {R}}^{n+1})\). If we extend the definition of dilates as

$$\begin{aligned} \chi _{r_1,\ldots ,r_d}(x_1,\ldots ,x_d):= \frac{1}{|r_1\cdots r_d|} \chi \Big (\frac{x_1}{r_1},\ldots ,\frac{x_d}{r_d}\Big ) \end{aligned}$$

for \(r_1,\ldots ,r_d\in {\mathbb {R}}\setminus \{0\}\), then (5.2) implies

$$\begin{aligned} \Big \Vert \sup _{r_1,\ldots ,r_{n+1}\in {\mathbb {R}}\setminus \{0\}}\big |{\widehat{h}}*\chi _{r_1,\ldots ,r_{n+1}}\big |\Big \Vert _{{\text {L}}^2(S,\sigma )} \lesssim _{n,\phi ,\chi ,s} \Vert h\Vert _{{\text {L}}^{s'}({\mathbb {R}}^{n+1})}, \end{aligned}$$
(5.3)

by considering \(2^{n+1}\) quadrants of \({\mathbb {R}}^{n+1}\), flipping \(\chi \) as necessary, and increasing the implicit constant by the factor \(2^{n+1}\). Linearizing and dualizing (5.3) we obtain

$$\begin{aligned} \bigg | \int _S \big (\,{\widehat{h}}*\chi _{r_1(\xi ),\ldots ,r_{n+1}(\xi )}\big )(\xi ,\tau ) \,\overline{g(\xi ,\tau )} \,{\text {d}}\sigma (\xi ,\tau ) \bigg | \lesssim _{n,\phi ,\chi ,s} \Vert g\Vert _{{\text {L}}^2(S,\sigma )} \Vert h\Vert _{{\text {L}}^{s'}({\mathbb {R}}^{n+1})} \end{aligned}$$

for any choice of measurable functions \(r_1,\ldots ,r_{n+1}:{\mathbb {R}}^n\rightarrow {\mathbb {R}}\setminus \{0\}\). If we further substitute (5.1) and choose \(\chi \) such that , then we can rewrite the last bilinear estimate as

$$\begin{aligned} \bigg | \int _{{\mathbb {R}}^{n+1}} h(x,t) \,\overline{(T_{\psi ,r_1,\ldots ,r_{n+1}}f)(x,t)} \,{\text {d}}x \,{\text {d}}t \bigg | \lesssim _{n,\phi ,\psi ,s} \Vert f\Vert _{{\text {L}}^2({\mathbb {R}}^n)} \Vert h\Vert _{{\text {L}}^{s'}({\mathbb {R}}^{n+1})}, \end{aligned}$$

which is just the dualized formulation of the desired bound (1.15). The case of general measurable functions \(r_1,\ldots ,r_{n+1}:{\mathbb {R}}^n\rightarrow {\mathbb {R}}\) now easily follows in the limit, by approximating each \(r_j\) pointwise by real measurable functions with no zeros.