1 Introduction

Let us consider \(\mathbb {R}^n\) equipped with the dilations given by

$$\begin{aligned} \delta _{\lambda }(x) = \delta ^\alpha _\lambda (x)= (\lambda ^{\alpha _1} x_1, \dots , \lambda ^{\alpha _n} x_n), \end{aligned}$$
(1.1)

where \(\alpha =(\alpha _1,\dots ,\alpha _n)\) with \(\alpha _i\ge 1\) for \(i=1,\dots ,n\). We write \(|\alpha |=\sum _{i=1}^n \alpha _i\) and fix

$$\begin{aligned} \rho _\alpha ({x}) = \max \left\{ |x_i|^{\frac{1}{\alpha _i}}\,:\,i=1,\dots ,n\right\} . \end{aligned}$$
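Note that \(\rho _\alpha \) is homogeneous of degree one with respect to the dilations (1.1), i.e. \(\rho _\alpha (\delta _\lambda (x))=\lambda \, \rho _\alpha ({x})\). A quick numerical sanity check (the choice \(\alpha =(1,2,3)\) is purely illustrative):

```python
import random

alpha = (1, 2, 3)  # illustrative anisotropy exponents, alpha_i >= 1

def dilate(lam, x):
    # delta_lambda(x) = (lam^{alpha_1} x_1, ..., lam^{alpha_n} x_n)
    return tuple(lam**a * xi for a, xi in zip(alpha, x))

def rho(x):
    # rho_alpha(x) = max_i |x_i|^{1/alpha_i}
    return max(abs(xi)**(1.0/a) for a, xi in zip(alpha, x))

random.seed(0)
for _ in range(100):
    x = tuple(random.uniform(-5, 5) for _ in alpha)
    lam = random.uniform(0.1, 10)
    # homogeneity: rho_alpha(delta_lambda(x)) = lambda * rho_alpha(x)
    assert abs(rho(dilate(lam, x)) - lam * rho(x)) < 1e-9 * max(1.0, lam * rho(x))
print("rho_alpha is 1-homogeneous for the dilations delta_lambda")
```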

For an integer \(\nu \ge 0\), we say that a function m on \(\mathbb {R}^n\) is in the class \(\mathscr {M}_\alpha ^\nu \) if

  1. (a)

    m is bounded and contained in \(C^{\nu }(\mathbb {R}^n\setminus \{0\})\), and

  2. (b)

    \(m(\delta _{\lambda }(\xi ))=m(\xi )\) for all \(\xi \not =0\) and \(\lambda >0\).

Let us denote

$$\begin{aligned} \Vert m\Vert _{\mathscr {M}_\alpha ^\nu } = \sup _{|\beta | \le \nu } \sup _{\rho _\alpha ({\xi })=1} |\partial ^{\beta } m(\xi )|. \end{aligned}$$
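To give a concrete member of such a class (a hypothetical example, not taken from the text): for \(\alpha =(1,2)\) the function \(m(\xi )=\xi _1^2/(\xi _1^2+|\xi _2|)\) is bounded, smooth away from the origin, and satisfies the invariance (b). A numerical check of the dilation invariance:

```python
import random

def m(xi1, xi2):
    # m(xi) = xi_1^2 / (xi_1^2 + |xi_2|): bounded, smooth away from 0,
    # and invariant under the anisotropic dilations with alpha = (1, 2)
    return xi1**2 / (xi1**2 + abs(xi2))

random.seed(1)
for _ in range(100):
    xi1 = random.uniform(0.1, 4) * random.choice([-1, 1])
    xi2 = random.uniform(0.1, 4) * random.choice([-1, 1])
    lam = random.uniform(0.1, 10)
    # property (b): m(delta_lambda(xi)) = m(xi)
    assert abs(m(lam * xi1, lam**2 * xi2) - m(xi1, xi2)) < 1e-12
print("dilation invariance verified")
```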

Define the Carleson operator associated with the multiplier m as

$$\begin{aligned} \mathscr {C}_m f(x) = \sup _{N\in \mathbb {R}^n} \left| \int _{\mathbb {R}^n} \widehat{f}(\xi ) e^{ix\xi } m(\xi -N) d\xi \right| . \end{aligned}$$

Replacing \(\alpha \) by \(c \alpha \) for a positive scalar c does not modify the classes \(\mathscr {M}_\alpha ^\nu \). Thus we may normalize \(\alpha _1=1\). For technical reasons we also assume that the \(\alpha _i\) are positive integers.

Then we can state our main result as follows.

Theorem 1.1

There exists a positive integer \(\nu _0\) depending only on \(\alpha \) and a constant \(C\in (0,\infty )\) depending only on \(\alpha \) such that for all \(m\in \mathscr {M}_\alpha ^{\nu _0}\) we have

$$\begin{aligned} \Vert \mathscr {C}_m f\Vert _{2,\infty } \le C \Vert m\Vert _{\mathscr {M}_\alpha ^{\nu _0}} \Vert f\Vert _2. \end{aligned}$$
(1.2)

The proof of this theorem is based on the time-frequency techniques of Lacey and Thiele [9]. In the one-dimensional case \(n=1, \alpha =1\) we recover the weak (2, 2) bound for Carleson’s operator, which immediately implies Carleson’s theorem on almost everywhere convergence of Fourier series [1] (up to a standard transference argument, see [7]). In the isotropic case \(\alpha =(1,\dots ,1)\) the theorem follows from a result of Sjölin [13]. Pramanik and Terwilleger [12] study weak (2, 2) bounds for the isotropic case in \(\mathbb {R}^n\) using the method of Lacey and Thiele. This was extended to strong \((p,p)\) bounds for \(1<p<\infty \) by Grafakos, Tao and Terwilleger in [5]. We speculate that Theorem 1.1 could also be extended to strong \((p,p)\) bounds for \(1<p<\infty \) using the methods from [5]. However, we do not pursue this here in order to keep the exposition simple.

Our present proof shows that (1.2) holds for all \(\nu _0\ge 3|\alpha |+2\). It would be interesting to know what the minimum regularity requirement is for (1.2) to hold. The same question is also open in the isotropic case. It seems plausible that (1.2) should at least hold for all \(\nu _0\ge |\alpha |+1\), because curiously, the only place in the current proof that requires more than that is a tail estimate in the single tree lemma (see (7.5)). All the main terms can be bounded using only \(\nu _0\ge |\alpha |+1\).

The method of Lacey and Thiele involves several ingredients. The first step is a reduction to a discrete dyadic model operator that involves summation over certain regions in phase space which are called tiles. This is detailed in Sect. 3. In this step we encounter a complication caused by the absence of rotation invariance in the anisotropic case. We resolve this using an anisotropic cone decomposition (see Lemma 3.3). The next step is a combinatorial procedure whose purpose is to organize the tiles into certain collections (called trees), each of which is associated with a component of the operator that behaves more like a classical singular integral operator (see Lemma 4.4). The combinatorial part of the argument requires only little modification compared to the original procedure in [9] (see Sects. 4–6). The core component and most difficult part of the proof is the single tree estimate (see Sect. 7). Due to an extra dependence on the linearizing function, the estimate is more technical than the corresponding estimate in [9]. It is complicated further by the presence of anisotropic dilations.

In Sect. 2 we discuss a related open problem on Carleson operators along monomial curves, which served as a main motivation for this work. We demonstrate how Theorem 1.1 can be applied to a certain family of rougher multipliers that can be seen as a toy model for the Carleson operators along monomial curves. We also discuss the particular case of the parabolic Carleson operator (2.4), which exhibits some additional symmetries and is related to Lie’s quadratic Carleson operator [10]. Some partial progress on Carleson operators along monomial curves was obtained in [6]. Also see the related work [11] on Carleson operators along paraboloids in dimensions \(n\ge 3\).

2 Carleson Operators Along Monomial Curves

In this section we set \(\alpha =(1,d)\) for an integer \(d\ge 2\). Let us consider the multiplier of the Hilbert transform along the curve \((t,t^d)\) in the form

$$\begin{aligned} m_{d}(\xi ,\eta ) = p.v.\int _\mathbb {R}e^{i\xi t-i\eta t^d} \frac{dt}{t},\,\,(\xi ,\eta )\in \mathbb {R}^2. \end{aligned}$$
(2.1)

It is currently an open problem to decide whether \(\mathscr {C}_{m_d}\) satisfies any \(L^p\) bounds. The multiplier \(m_d\) satisfies the anisotropic dilation symmetry

$$\begin{aligned} m_d(\lambda \xi , \lambda ^d \eta ) = m_d(\xi ,\eta ) \end{aligned}$$

for \(\lambda >0\) and \((\xi ,\eta )\not =0\). However, Theorem 1.1 does not apply because \(m_d\) is too rough to be in the class \(\mathscr {M}_\alpha ^{\nu }\) for any positive integer \(\nu \).

Next we discuss a family of toy model operators. In the following discussion we focus on the intersection of the quadrant \(\{\xi \ge 0, \eta \ge 0\}\) with the region \(\eta ^{\frac{1}{d}} \le 2\xi \). The other quadrants can be treated similarly (though depending on the parity of d the phase might not have a critical point in each quadrant; this is an inconsequential subtlety that we will ignore). Our restriction to the region \(\eta ^{\frac{1}{d}} \le 2\xi \) is natural because stationary phase considerations show that \(m_d\) is smooth away from the axis \(\eta =0\). For \(\xi >0\) and \(\eta >0\) we define

$$\begin{aligned} m_{d,1}(\xi ,\eta ) =\left( \eta ^{\frac{1}{d}}\xi ^{-1}\right) ^{\frac{d'}{2}} e^{i \left( \eta ^{\frac{1}{d}}\xi ^{-1}\right) ^{-d'}} \psi (\eta ^{\frac{1}{d}}\xi ^{-1}), \end{aligned}$$
(2.2)

where \(\frac{1}{d}+\frac{1}{d'}=1\) and \(\psi \) is a smooth cutoff function supported in \([-2, 2]\) and equal to one on a slightly smaller interval. We extend \(m_{d,1}\) continuously by setting it equal to zero on the remainder of \(\mathbb {R}^2\).

From a standard computation using the stationary phase principle (see [14, Ch. VIII.1, Prop. 3]) we can see that, up to an inconsequential constant factor, this term constitutes the main contribution to the oscillatory integral in (2.1). The remainder term from stationary phase is smoother in the variables \(\xi \) and \(\eta \) and is therefore simpler to handle. We will ignore it for the purpose of this discussion.

From the definition we see directly that \(m_{d,1}\) (and therefore also \(m_d\)) is only Hölder continuous of class \(C^{\frac{1}{2(d-1)}}\) along the axis \(\eta =0\) while it is infinitely differentiable away from that axis.
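The Hölder exponent can be read off from (2.2): since \(\frac{1}{d}+\frac{1}{d'}=1\), i.e. \(d'=\frac{d}{d-1}\), the amplitude factor satisfies

$$\begin{aligned} \left( \eta ^{\frac{1}{d}}\xi ^{-1}\right) ^{\frac{d'}{2}} = \xi ^{-\frac{d'}{2}}\,\eta ^{\frac{d'}{2d}} = \xi ^{-\frac{d'}{2}}\,\eta ^{\frac{1}{2(d-1)}}, \end{aligned}$$

so for fixed \(\xi >0\) the multiplier vanishes to order exactly \(\eta ^{\frac{1}{2(d-1)}}\) as \(\eta \rightarrow 0^+\) (the oscillating factor has modulus one).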

Let us from here on denote

$$\begin{aligned} \zeta =\zeta (\xi ,\eta )=\eta ^{\frac{1}{d}}\xi ^{-1} \end{aligned}$$

for \(\xi>0, \eta >0\). Since we do not know how to handle \(m_{d,1}\) we introduce a family of modified, less oscillatory multipliers which is defined on the quadrant \(\{\xi>0, \eta >0\}\) by

$$\begin{aligned} m_{d,\delta }(\xi ,\eta ) =\zeta ^{\frac{d'}{2}} e^{i \zeta ^{-d'\delta }} \psi (\zeta ), \end{aligned}$$
(2.3)

where \(0\le \delta \le 1\) is a parameter (and we again extend \(m_{d,\delta }\) to the rest of \(\mathbb {R}^2\) by zero). The multiplier \(m_{d,\delta }\) still fails to be in \(\mathscr {M}_\alpha ^{\nu }\) for every positive integer \(\nu \) and \(\delta >0\). However, we can nevertheless apply Theorem 1.1 to bound \(\mathscr {C}_{m_{d,\delta }}\) for small enough \(\delta \). For this purpose we assume that \(\psi \) takes the form

$$\begin{aligned} \psi = \sum _{j\le 0} \varphi _j, \end{aligned}$$

where \(\varphi _j(x)=\varphi (2^{-j} x)\) and \(\varphi \) is a smooth bump function supported in \([1/2, 2]\) and satisfying \(\sum _{j\in \mathbb {Z}} \varphi _j(x) = 1\) for all \(x>0\). Then we have the following consequence of Theorem 1.1.

Corollary 2.1

There exists \(\delta _0>0\) such that for all \(0\le \delta < \delta _0\) we have

$$\begin{aligned} \Vert \mathscr {C}_{m_{d,\delta }} f\Vert _{2,\infty } \lesssim \Vert f\Vert _2. \end{aligned}$$

Proof

Let us write

$$\begin{aligned} m_{d,\delta ,j}(\xi ,\eta )&= \zeta ^{\frac{d'}{2}} e^{i \zeta ^{-d'\delta }} \varphi _j(\zeta )\quad \text {(for}\;\xi>0, \eta >0),\\ T_j f(x,y)&= \int _{\mathbb {R}^2} \widehat{f}(\xi ,\eta ) m_{d,\delta ,j}(\xi ,\eta ) e^{ix\xi +iy\eta } d(\xi ,\eta ). \end{aligned}$$

By the change of variables \(\eta \mapsto 2^{jd} \eta \), we see that

$$\begin{aligned} T_j f(x,y)&= 2^{j\frac{d'}{2}} 2^{jd} \int _{\mathbb {R}^2} \widehat{f}(\xi ,2^{jd}\eta ) \widetilde{m}_{d,\delta ,j}(\xi ,\eta ) e^{ix\xi +i2^{jd}y\eta } d(\xi ,\eta ) \\&= 2^{j\frac{d'}{2}} \mathrm {D}_{2^{jd}} \widetilde{T}_j \mathrm {D}_{2^{-jd}} f(x,y), \end{aligned}$$

where \(\mathrm {D}_\lambda f(x,y) = f(x,\lambda y)\) and

$$\begin{aligned} \widetilde{m}_{d,\delta ,j} (\xi ,\eta )&= \zeta ^{\frac{d'}{2}} e^{i 2^{-jd'\delta } \zeta ^{-d'\delta }} \varphi (\zeta )\quad \text { (for }\xi>0, \eta >0),\\ \widetilde{T}_j f(x,y)&= \int _{\mathbb {R}^2} \widehat{f}(\xi ,\eta ) \widetilde{m}_{d,\delta ,j}(\xi ,\eta ) e^{ix\xi +iy\eta } d(\xi ,\eta ). \end{aligned}$$

We have

$$\begin{aligned} \Vert \widetilde{m}_{d,\delta ,j}\Vert _{\mathscr {M}_\alpha ^\nu } \lesssim 2^{-jd'\delta \nu } \end{aligned}$$

for every integer \(\nu \ge 0\) (where the implied constant depends on \(\nu \), d, \(\delta \) and \(\varphi \)). Using Theorem 1.1 we therefore obtain

$$\begin{aligned} \Vert \mathscr {C}_{m_{d,\delta }}\Vert _{L^2\rightarrow L^{2,\infty }} \lesssim \sum _{j\le 0} 2^{j\frac{d'}{2}} \Vert \mathscr {C}_{\widetilde{m}_{d,\delta ,j}} \Vert _{L^2\rightarrow L^{2,\infty }} \lesssim \sum _{j\le 0} 2^{jd'(\frac{1}{2}-\delta \nu _0)}. \end{aligned}$$

Thus, setting \(\delta _0 = \frac{1}{2} \nu _0^{-1}\) yields the claim. \(\square \)
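The final step is elementary: the series \(\sum _{j\le 0} 2^{jd'(\frac{1}{2}-\delta \nu _0)}\) is geometric and converges precisely when \(\delta <\frac{1}{2}\nu _0^{-1}\). An illustrative numerical check (the values \(d=2\) and \(\nu _0=8\) are hypothetical placeholders):

```python
d = 2                  # hypothetical curve exponent
dprime = d / (d - 1)   # conjugate exponent: 1/d + 1/d' = 1
nu0 = 8                # hypothetical value of nu_0 from Theorem 1.1
delta0 = 0.5 / nu0

def partial_sum(delta, jmin):
    # partial sums of sum_{jmin <= j <= 0} 2^{j d' (1/2 - delta nu_0)}
    return sum(2.0 ** (j * dprime * (0.5 - delta * nu0)) for j in range(jmin, 1))

# for delta < delta0 the ratio is < 1 and the geometric series converges ...
assert abs(partial_sum(0.9 * delta0, -400) - partial_sum(0.9 * delta0, -800)) < 1e-9
# ... while at delta = delta0 every summand equals 1 and the sum diverges
assert partial_sum(delta0, -400) == 401.0
print("convergence holds exactly for delta < 1/(2 nu_0)")
```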

An improvement of the bound on \(\nu _0\) in Theorem 1.1 would give an improvement of \(\delta _0\). However, even if we could show Theorem 1.1 for all \(\nu _0>|\alpha |/2\), we would still have \(\delta _0<\frac{1}{|\alpha |}<1\). Therefore additional insight is likely to be required to bound the operator \(\mathscr {C}_{m_{d,1}}\) (and thus \(\mathscr {C}_{m_d}\)).

For the case of the parabola, \(d=2\), there are some additional obstructions in bounding \(\mathscr {C}_{m_2}\). Let us write the parabolic Carleson operator as

$$\begin{aligned} \mathscr {C}^{\mathrm {par}} f(x,y) = \sup _{N\in \mathbb {R}^2} \left| p.v.\int _\mathbb {R}f(x-t, y-t^2) e^{iN_1 t+ i N_2 t^2} \frac{dt}{t}\right| . \end{aligned}$$
(2.4)

Apart from the linear modulation symmetries given by

$$\begin{aligned} \mathscr {C}^{\mathrm {par}} f = \mathscr {C}^{\mathrm {par}} \mathrm {M}_{N} f \end{aligned}$$

for \(N\in \mathbb {R}^2\), there are additional modulation symmetries. For a polynomial in two variables, \(P=P(x,y)\), we write the corresponding polynomial modulation as

$$\begin{aligned} \mathrm {M}_{P} f(x,y) = e^{iP(x,y)} f(x,y). \end{aligned}$$

Then we have that

$$\begin{aligned} \mathscr {C}^{\mathrm {par}} \mathrm {M}_{N x^2} f&= \mathscr {C}^{\mathrm {par}} f,\nonumber \\ \mathscr {C}^{\mathrm {par}} \mathrm {M}_{N x(y+x^2)} f&= \mathscr {C}^{\mathrm {par}} f,\nonumber \\ \mathscr {C}^{\mathrm {par}} \mathrm {M}_{N (y+x^2)^2} f&= \mathscr {C}^{\mathrm {par}} f \end{aligned}$$
(2.5)

hold for all \(N\in \mathbb {R}\). The quadratic modulation symmetry (2.5) suggests a connection to Lie’s quadratic Carleson operator [10]. Indeed, even a certain partial \(L^2\) bound for \(\mathscr {C}^{\mathrm {par}}\) would immediately imply an \(L^2\) bound for the quadratic Carleson operator (see [6]). These are all the polynomial modulation symmetries of the operator \(\mathscr {C}^{\mathrm {par}}\) (up to linear combination). To see that, introduce the change of variables \(\tau (x,y)=(x,y+x^2)\) and observe that

$$\begin{aligned} \mathscr {C}^{\mathrm {par}}f=\mathscr {C}^{\mathrm {sh}} (f\circ \tau ^{-1})\circ \tau , \end{aligned}$$

where

$$\begin{aligned} \mathscr {C}^{\mathrm {sh}}f(x,y)=\sup _{N\in \mathbb {R}^2} \left| p.v.\int _\mathbb {R}f(x-t,y-2xt) e^{iN_1 t + iN_2 t^2} \frac{dt}{t} \right| . \end{aligned}$$

It is easy to check that we have

$$\begin{aligned} \mathscr {C}^{\mathrm {sh}} \mathrm {M}_P f = \mathscr {C}^{\mathrm {sh}}f \end{aligned}$$

for a polynomial P if and only if P has degree at most two. This shows that the list of polynomial modulation symmetries that we gave for \(\mathscr {C}^{\mathrm {par}}\) is complete. Also, since \(\tau \) is measure-preserving, \(L^p\) bounds for \(\mathscr {C}^{\mathrm {par}}\) and \(\mathscr {C}^{\mathrm {sh}}\) are equivalent.
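The computation behind this claim can be checked mechanically: under the shear, a monomial \(x^p y^q\) becomes \((x-t)^p(y-2xt)^q\), which has degree at most two in t exactly when \(p+q\le 2\); a phase of degree at most two in t is absorbed by the supremum over N, while any cubic monomial produces a genuine \(t^3\) term. A sketch using third finite differences (which vanish precisely for polynomials of degree at most two):

```python
def shear_eval(p, q, x, y, t):
    # the monomial X^p Y^q evaluated at (X, Y) = (x - t, y - 2*x*t)
    return (x - t) ** p * (y - 2 * x * t) ** q

def third_difference(p, q, x=2, y=3):
    # third finite difference in t of t -> (x-t)^p (y-2xt)^q;
    # it vanishes iff this polynomial in t has degree at most two
    f = [shear_eval(p, q, x, y, t) for t in range(4)]
    return f[3] - 3 * f[2] + 3 * f[1] - f[0]

# monomials of total degree <= 2: the t-degree stays <= 2,
# so the phase can be absorbed into the supremum over N
for p, q in [(0, 0), (1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]:
    assert third_difference(p, q) == 0
# every cubic monomial acquires a genuine t^3 term
for p, q in [(3, 0), (2, 1), (1, 2), (0, 3)]:
    assert third_difference(p, q) != 0
print("degree at most two is exactly the invariant class")
```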

3 Reduction to a Model Operator

Before we begin we need to introduce some more notation and definitions. Denote

$$\begin{aligned} \mathrm {dist}_\alpha (A,B) = \inf _{x\in A, y\in B} \rho _\alpha ({x-y}) \end{aligned}$$

and \(\mathrm {dist}_\alpha (A,x)=\mathrm {dist}_\alpha (A,\{x\})\). For \(a,b\in \mathbb {R}^n\) we write

$$\begin{aligned} {[}a,b] = \prod _{i=1}^n [a_i, b_i] \end{aligned}$$

and similarly (a, b), [a, b). We will refer to all such sets as rectangles. For a rectangle \(I\subset \mathbb {R}^n\) we define c(I) to be its center. By an anisotropic cube we mean a rectangle [a, b] such that \(b_i-a_i=\lambda ^{\alpha _i}\) holds for all \(i=1,\dots ,n\) and some \(\lambda >0\). We define the collection of anisotropic dyadic cubes by

$$\begin{aligned} \mathcal {D}^{\alpha } = \left\{ [\delta _{2^k}(\ell ), \delta _{2^k}(\ell +1) )\,:\,\ell \in \mathbb {Z}^n,\,k\in \mathbb {Z}\right\} . \end{aligned}$$

Any two anisotropic dyadic cubes are either disjoint or contained in one another. Moreover, for every \(I\in \mathcal {D}^\alpha \) there exists a unique dyadic cube \({I}^+\in \mathcal {D}^\alpha \) such that \(|{I}^+|=2^{|\alpha |}|I|\) and \(I\subset {I}^+\). We call \({I}^+\) the parent of I and say that I is a child of \({I}^+\).
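These structural properties are elementary to verify by direct computation; the following sketch does so for the illustrative choice \(n=2\), \(\alpha =(1,2)\), where every parent has \(2^{|\alpha |}=8\) children:

```python
import itertools
import random

alpha = (1, 2)  # illustrative anisotropy; |alpha| = 3

def cube(k, l):
    # the anisotropic dyadic cube [delta_{2^k}(l), delta_{2^k}(l+1)):
    # a list of half-open intervals, side i having length 2^{k*alpha_i}
    return [(2**(k*a) * li, 2**(k*a) * (li + 1)) for a, li in zip(alpha, l)]

def contains(C, D):
    return all(c0 <= d0 and d1 <= c1 for (c0, c1), (d0, d1) in zip(C, D))

def disjoint(C, D):
    return any(c1 <= d0 or d1 <= c0 for (c0, c1), (d0, d1) in zip(C, D))

random.seed(2)
cubes = [cube(random.randint(-3, 3), (random.randint(-8, 8), random.randint(-8, 8)))
         for _ in range(200)]
# any two anisotropic dyadic cubes are nested or disjoint
for C, D in itertools.combinations(cubes, 2):
    assert contains(C, D) or contains(D, C) or disjoint(C, D)

# the parent of [delta_{2^k}(l), delta_{2^k}(l+1)) is the scale-(k+1)
# cube with index l_i // 2^{alpha_i}; it has 2^{|alpha|} = 8 children
k, l = 0, (5, -3)
parent = cube(k + 1, tuple(li // 2**a for a, li in zip(alpha, l)))
assert contains(parent, cube(k, l))
children = []
for i in range(-20, 20):
    for j in range(-20, 20):
        if contains(parent, cube(k, (i, j))):
            children.append((i, j))
assert len(children) == 8
print("nested-or-disjoint and 8 children per parent verified")
```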

Definition 3.1

A tile P is a rectangle in \(\mathbb {R}^{n}\times \mathbb {R}^n\) of the form

$$\begin{aligned} P = I_P \times \omega _P, \end{aligned}$$

where \(I_P,\omega _P\in \mathcal {D}^{\alpha }\) and \(|I_P|\cdot |\omega _P|=1\).

The set of tiles is denoted by \(\overline{\mathcal {P}}\). Given a tile P we denote its scale by \(k(P) = \frac{1}{|\alpha |}\log _2 |I_P|\), so that \(|I_P|=2^{k(P)|\alpha |}\). For \(r\in \{0,1\}^n\) and a tile P with \(\omega _P = {[}\delta _{2^{-k(P)}}(\ell ), \delta _{2^{-k(P)}}(\ell +1)]\) we define the semi-tile P(r) by

$$\begin{aligned} P(r) = I_P \times \omega _{P(r)},\; \text {where}\; \omega _{P(r)} = \Big [\delta _{2^{-k(P)}}\Big (\ell +\frac{1}{2} r\Big ), \delta _{2^{-k(P)}}\Big (\ell +\frac{1}{2} (r+1)\Big )\Big ]. \end{aligned}$$

The model operator is built up using a large family of wave packets adapted to tiles. It is convenient to generate this family by letting the symmetry group of our operator act on a single bump function. For this purpose, let \(\phi \) be a Schwartz function on \(\mathbb {R}^n\) such that \(0\le \widehat{\phi }\le 1\) with \(\widehat{\phi }\) being supported in \([-\frac{b_0}{2}, \frac{b_0}{2}]^n\) and equal to 1 on \([-\frac{b_1}{2}, \frac{b_1}{2}]^n\), where \(0<b_1<b_0\ll 1\) are some fixed, small numbers whose ratio is not too large (it becomes clear what precisely is required in Sect. 7). For example, we may set \(b_0=\frac{1}{10}\), \(b_1=\frac{9}{100}\). We denote translation, modulation and dilation of a function f by

$$\begin{aligned} \mathrm {T}_y f(x)&= f(x-y),\quad \quad (y\in \mathbb {R}^n)\\ \mathrm {M}_\xi f(x)&= e^{ix\xi } f(x),\quad \quad (\xi \in \mathbb {R}^n)\\ \mathrm {D}_\lambda ^p f(x)&= \lambda ^{-\frac{|\alpha |}{p}} f\left( \delta _{\lambda ^{-1}}(x)\right) ,\quad \quad (\lambda ,p>0), \end{aligned}$$

where \(|\alpha |=\sum _{i=1}^n \alpha _i\).
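The normalization in \(\mathrm {D}_\lambda ^p\) is chosen precisely so that it is an isometry of \(L^p\). A numerical sanity check of the case \(p=2\), \(\alpha =(1,2)\), using a Riemann sum and a Gaussian (all concrete values here are illustrative):

```python
import math

alpha = (1, 2)  # illustrative anisotropy, |alpha| = 3

def f(x, y):
    # a Gaussian test function
    return math.exp(-x*x - y*y)

def D2(lam, g):
    # D^2_lambda g(x) = lam^{-|alpha|/2} g(delta_{1/lam}(x))
    s = lam ** (-sum(alpha) / 2.0)
    return lambda x, y: s * g(x / lam**alpha[0], y / lam**alpha[1])

def l2_norm_sq(g, L=20.0, h=0.1):
    # Riemann sum of |g|^2 over [-L, L]^2
    n = int(2 * L / h)
    return sum(g(-L + i*h, -L + j*h)**2 for i in range(n) for j in range(n)) * h * h

n1 = l2_norm_sq(f)            # ||f||_2^2 = pi/2 for this Gaussian
n2 = l2_norm_sq(D2(2.0, f))   # should be identical
assert abs(n1 - math.pi / 2) < 1e-6
assert abs(n1 - n2) < 1e-6
print("D^2_lambda preserves the L^2 norm")
```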

Given a tile P and \(N\in \mathbb {R}^n\) we define the wave packets \(\phi _P, \psi ^N_P\) on \(\mathbb {R}^n\) by

$$\begin{aligned} \phi _P(x)&= \mathrm {M}_{c(\omega _{P(0)})} \mathrm {T}_{c(I_P)} \mathrm {D}^2_{2^{k(P)}} \phi (x) \end{aligned}$$
(3.1)
$$\begin{aligned} \widehat{\psi ^N_P}(\xi )&= \mathrm {T}_{N}m(\xi )\cdot \widehat{\phi _P}(\xi ) \end{aligned}$$
(3.2)

We think of \(\phi _P\) as being essentially time-frequency supported in the semi-tile P(0). More precisely, we have that \(\widehat{\phi _P}\) is compactly supported in (a small cube centrally contained in) \(\omega _{P(0)}\) and \(|\phi _P|\) decays rapidly outside of \(I_P\).

For \(N\in \mathbb {R}^n\) and \(r\not =0\) we introduce the dyadic model sum operator

$$\begin{aligned} A^{r,m}_N f(x)=\sum _{P\in \mathcal {\overline{P}}} \langle f,\phi _P\rangle \psi ^N_P(x) \mathbf {1}_{\omega _{P(r)}}(N). \end{aligned}$$
(3.3)

This reduces to the model sum of Lacey and Thiele [9] in the case \(n=\alpha =1\) and to that of Pramanik and Terwilleger [12] in the isotropic case \(\alpha =(1,\dots ,1)\).

Theorem 3.2

For every large enough integer \(\nu _0\) there exists \(C>0\) depending only on \(\nu _0\), \(\alpha \) and the choice of \(\phi \) such that for all multipliers \(m\in \mathscr {M}_\alpha ^{\nu _0}\) we have

$$\begin{aligned} \Vert \sup _{N\in \mathbb {R}^n} |A^{r,m}_{N} f| \Vert _{2,\infty } \le C \Vert m\Vert _{\mathscr {M}_\alpha ^{\nu _0}} \Vert f\Vert _2. \end{aligned}$$
(3.4)

The proof of the theorem is contained in Sects. 4–7. We conclude this section by showing that Theorem 3.2 implies Theorem 1.1. For this purpose we employ the averaging procedure of Lacey and Thiele [9] combined with an anisotropic cone decomposition of the multiplier m. By an anisotropic cone we mean a subset \(\Theta \subsetneq \mathbb {R}^n\) of the form

$$\begin{aligned} \Theta =\{\delta _{t}(\xi )\,:\,t>0,\,\xi \in Q\} \end{aligned}$$

for some cube \(Q\subset \mathbb {R}^n\). Let us denote \(\mathcal {B}_s=\{x\,:\,\rho _\alpha ({x})\le s\}\). Let

$$\begin{aligned} \mathbf {A}^{r,m} f(x) = \lim _{R\rightarrow \infty } \frac{1}{R^{2|\alpha |}} \int _{\mathcal {B}_R} \int _{\mathcal {B}_R} \int _0^1 \mathrm {M}_{-\eta }\mathrm {T}_{-y}\mathrm {D}^2_{2^{-s}} A^{r,m}_{2^{-s}\eta } \mathrm {D}^2_{2^s} \mathrm {T}_y \mathrm {M}_\eta f(x) ds dy d\eta . \end{aligned}$$
(3.5)

Lemma 3.3

For every \(r\in \{0,1\}^n\) and every test function f, the function \(\mathbf {A}^{r,m} f(x)\) is well-defined and also a test function. We have

$$\begin{aligned} \widehat{\mathbf {A}^{r,m} f}(\xi ) = \theta _r(\xi ) m(\xi ) \widehat{f}(\xi ) \end{aligned}$$

for some smooth function \(\theta _r\) that is independent of m. Moreover, there exists a constant \(\varepsilon _0>0\) and an anisotropic cone \(\Theta _r\) such that

$$\begin{aligned} \theta _r(\xi ) > \varepsilon _0\quad \text {for all}\;\xi \in \Theta _r. \end{aligned}$$

and

$$\begin{aligned} \left\{ \xi \,:\,\rho _\alpha ({\xi })=1\right\} \cap (-\infty ,\varepsilon _0]^n \subset \bigcup _{r\in \{0,1\}^n\setminus \{0\}} \Theta _r. \end{aligned}$$
(3.6)

Proof

By expanding definitions we see that

$$\begin{aligned} \left( \mathrm {M}_{-\eta }\mathrm {T}_{-y}\mathrm {D}^2_{2^{-s}} A^{r,m}_{2^{-s}\eta } \mathrm {D}^2_{2^s} \mathrm {T}_y \mathrm {M}_\eta f\right) ^\wedge (\xi ) \end{aligned}$$

is equal to a universal constant times

$$\begin{aligned}&m(\xi ) \sum _{P\in \overline{\mathcal {P}}} \left\langle \widehat{f}, \mathrm {T}_{-\eta +\delta _{2^s}(c(\omega _{P(0)}))} \mathrm {M}_{y-\delta _{2^{-s}}(c(I_P))} \mathrm {D}^2_{2^{s-k(P)}} \widehat{\phi } \right\rangle \\&\quad \times \mathrm {T}_{-\eta +\delta _{2^s}(c(\omega _{P(0)}))} \mathrm {M}_{y-\delta _{2^{-s}}(c(I_P))} \mathrm {D}^2_{2^{s-k(P)}} \widehat{\phi }(\xi ) \mathbf {1}_{\omega _{P(r)}}(\delta _{2^{-s}}(\eta )), \end{aligned}$$

where we have used that \(m(\delta _{2^{-s}}(\xi ))=m(\xi )\). The previous display equals

$$\begin{aligned}&m(\xi )\sum _{k\in \mathbb {Z}} \sum _{\ell \in \mathbb {Z}^n} \sum _{u\in \mathbb {Z}^n} 2^{-|\alpha |(s-k)} \int _{\mathbb {R}^n} \widehat{f}(\zeta ) e^{i(y-\delta _{2^{-s+k}}(u+\frac{1}{2}))(\xi -\zeta )}\\&\quad \times \overline{\widehat{\phi }} \Big (\delta _{2^{-s+k}}(\zeta +\eta )-\Big (\ell +\frac{1}{4}\Big )\Big ) d\zeta \\&\quad \times \widehat{\phi } \Big (\delta _{2^{-s+k}}(\xi +\eta )-\Big (\ell +\frac{1}{4}\Big )\Big ) \mathbf {1}_{\omega _{P(r)}}(\delta _{2^{-s}}(\eta )). \end{aligned}$$

Applying the Poisson summation formula to the summation in u and using the Fourier support information of the function \(\phi \) we see that the previous display equals (up to a universal constant)

$$\begin{aligned} m(\xi )\widehat{f}(\xi ) \sum _{k\in \mathbb {Z}} \sum _{\ell \in \mathbb {Z}^n} |\widehat{\phi }|^2 \Big (\delta _{2^{-s+k}}(\xi +\eta )-\Big (\ell +\frac{1}{4}\Big )\Big ) \mathbf {1}_{\omega _{P(r)}}(\delta _{2^{-s}}(\eta )). \end{aligned}$$

Observe that the expression no longer depends on the variable y. It remains to compute the function \(\theta _r(\xi )=c\cdot \lim _{R\rightarrow \infty } I_R(\xi )\), where c is a universal constant and

$$\begin{aligned} I_R(\xi ) = \frac{1}{R^{|\alpha |}} \int _{\mathcal {B}_R} \int _0^1 \sum _{k\in \mathbb {Z}} \sum _{\ell \in \mathbb {Z}^n} |\widehat{\phi }|^2 \Big (\delta _{2^{-s+k}}(\xi +\eta )-\Big (\ell +\frac{1}{4}\Big )\Big ) \mathbf {1}_{\omega _{P(r)}}(\delta _{2^{-s}}(\eta )) ds d\eta . \end{aligned}$$

Note the formula

$$\begin{aligned} \int _0^1 \sum _{k\in \mathbb {Z}} F(2^{k-s}) ds = \frac{1}{\log 2} \int _0^\infty F(t) \frac{dt}{t}, \end{aligned}$$

which follows from a change of variables \(2^{k-s}\rightarrow t\). Using this we have

$$\begin{aligned} I_R(\xi ) = \frac{c}{R^{|\alpha |}} \int _{\mathcal {B}_R} \int _0^\infty \sum _{\ell \in \mathbb {Z}^n} |\widehat{\phi }|^2 \Big (\delta _{t}(\xi +\eta )-\Big (\ell +\frac{1}{4}\Big )\Big ) \mathbf {1}_{Q_r}(\delta _{t}(\eta )-\ell ) \frac{dt}{t} d\eta , \end{aligned}$$

where \(Q_r=\Big [\frac{1}{2} r,\frac{1}{2} (r+1)\Big ]=\prod _{i=1}^n\Big [\frac{1}{2} r_i, \frac{1}{2} (r_i+1)\Big ]\) and \(c=(\log 2)^{-1}\) (c may change from line to line in this proof). To simplify our expression further we perform the change of variables

$$\begin{aligned} \delta _{t}(\xi +\eta )-\ell \rightarrow \zeta \end{aligned}$$

in the integration in \(\eta \). This yields

$$\begin{aligned} I_R(\xi ) = c\int _{\mathbb {R}^n} \int _0^\infty \chi (\zeta ) \mathbf {1}_{Q_r}(\zeta -\delta _{t}(\xi )) \left( \sum _{\ell \in \mathbb {Z}^n} \frac{\mathbf {1}_{\rho _\alpha ({\zeta +\ell -\delta _{t}(\xi )})\le tR}}{(tR)^{|\alpha |}} \right) \frac{dt}{t} d\zeta \end{aligned}$$
(3.7)

where we have set

$$\begin{aligned} \chi (\zeta ) = |\widehat{\phi }|^2 \Big ( \zeta - \frac{1}{4}\Big ). \end{aligned}$$

Observe that the integrand in (3.7) is supported in a compact subset of \(\mathbb {R}^n\times (0,\infty )\) (which depends on \(\xi \)). By counting the \(\ell \) for which the summand is non-zero we see that for every fixed \(\zeta , \xi \in \mathbb {R}^n\) and \(t>0\) the sum

$$\begin{aligned} \sum _{\ell \in \mathbb {Z}^n} \frac{\mathbf {1}_{\rho _\alpha ({\zeta +\ell -\delta _{t}(\xi )})\le tR}}{(tR)^{|\alpha |}} \end{aligned}$$

converges to a universal constant as \(R\rightarrow \infty \). Thus, from Lebesgue’s dominated convergence theorem we conclude that

$$\begin{aligned} \theta _r(\xi ) = c \int _{\mathbb {R}^n} \int _0^\infty \chi (\zeta ) \mathbf {1}_{Q_r}(\zeta -\delta _{t}(\xi ))\frac{dt}{t} d\zeta . \end{aligned}$$
(3.8)

Evidently we have \(\theta _r(\delta _{t}(\xi ))=\theta _r(\xi )\) for every \(t>0\) and \(\xi \in \mathbb {R}^n\). From our choice of \(\phi \) we get that \(\chi \) is supported on \(Q^{(0)}\) and equal to one on \(Q^{(1)}\), where

$$\begin{aligned} Q^{(j)} = \Big [\frac{1}{4} - \frac{b_j}{2},\frac{1}{4} + \frac{b_j}{2}\Big ] \end{aligned}$$

for \(j=0,1\). Let us set

$$\begin{aligned} \Theta _r = \{ \delta _{t}(\xi )\,:\, \xi \in Q^{(1)}-Q_r \}. \end{aligned}$$

Then we can read off from (3.8) that \(\theta _r\) is greater than some positive constant on \(\Theta _r\). Note that

$$\begin{aligned} Q^{(j)}-Q_r = \Big [-\frac{1}{2}r - \Big (\frac{1}{4} + \frac{b_j}{2}\Big ), -\frac{1}{2}r + \Big (\frac{1}{4} + \frac{b_j}{2}\Big )\Big ]. \end{aligned}$$

Looking at the anisotropic cone generated by each of the regions \(Q^{(1)}-Q_r\) we see that (3.6) is satisfied for sufficiently small \(\varepsilon _0\). \(\square \)
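The logarithmic change-of-variables identity \(\int _0^1 \sum _{k\in \mathbb {Z}} F(2^{k-s})\, ds = \frac{1}{\log 2}\int _0^\infty F(t)\frac{dt}{t}\) used in the proof can also be confirmed numerically; for the illustrative choice \(F(t)=e^{-(\log t)^2}\) the right-hand side equals \(\sqrt{\pi }/\log 2\):

```python
import math

def F(t):
    # illustrative test function: F(t) = exp(-(log t)^2)
    return math.exp(-math.log(t) ** 2)

# left-hand side: int_0^1 sum_k F(2^{k-s}) ds, midpoint rule in s,
# sum in k truncated where the terms are far below machine precision
n = 2000
lhs = 0.0
for i in range(n):
    s = (i + 0.5) / n
    lhs += sum(F(2.0 ** (k - s)) for k in range(-40, 41)) / n

# right-hand side: (1/log 2) int_0^infty F(t) dt/t; substituting t = e^u
# turns it into a Gaussian integral, giving sqrt(pi)/log 2
rhs = math.sqrt(math.pi) / math.log(2)
assert abs(lhs - rhs) < 1e-9
print("change-of-variables identity verified")
```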

In the isotropic case \(\alpha =(1,\dots ,1)\) one can assume without loss of generality that the multiplier m is supported in an arbitrarily chosen cone. Due to the lack of rotation invariance, this reduction is no longer available in the anisotropic setting.

Proof of Theorem 1.1

Let \(m\in \mathscr {M}_\alpha ^{\nu _0}\). Without loss of generality we may assume that m is supported in the “quadrant” \((-\infty ,0]^n\). By (3.6) we can choose smooth functions \((\varrho _r)_r\) such that \(\varrho _r\) is supported in \(\Theta _r\) and

$$\begin{aligned} \sum _{r\in \{0,1\}^n\setminus \{0\}} \varrho _r(\xi ) = 1 \end{aligned}$$

for \(\xi \in (-\infty ,0]^n\). By the triangle inequality and Lemma 3.3, we have

$$\begin{aligned} \Vert \mathscr {C}_m f\Vert _{2,\infty } \le \sum _{r\in \{0,1\}^n\setminus \{0\}} \Vert \sup _{N\in \mathbb {R}^n} |\mathbf {A}^{r,\theta _r^{-1}\varrho _r m} \mathrm {M}_{N} f| \Vert _{2,\infty }. \end{aligned}$$

Here \(\theta _r^{-1}\) refers to the function \(\xi \mapsto (\theta _r(\xi ))^{-1}\), which is bounded on \(\Theta _r\). By (3.5) and Minkowski’s integral inequality, the previous is no greater than

$$\begin{aligned} \sum _{r\in \{0,1\}^n\setminus \{0\}} \limsup _{R\rightarrow \infty } \frac{1}{R^{2|\alpha |}} \int _{\mathcal {B}_R}\int _{\mathcal {B}_R} \int _0^1 \Vert \sup _{N\in \mathbb {R}^n} |A_N^{r,\theta _r^{-1}\varrho _r m} \mathrm {D}^2_{2^s} \mathrm {T}_y \mathrm {M}_\eta f | \Vert _{2,\infty } ds dy d\eta , \end{aligned}$$

which by Theorem 3.2 is bounded by

$$\begin{aligned} C \sum _{r\in \{0,1\}^n\setminus \{0\}} \Vert \theta _r^{-1}\varrho _r m\Vert _{\mathscr {M}_\alpha ^{\nu _0}} \Vert f\Vert _2\lesssim \Vert m\Vert _{\mathscr {M}_\alpha ^{\nu _0}} \Vert f\Vert _2. \end{aligned}$$

\(\square \)

4 Boundedness of the Model Operator

In this section we describe the proof of Theorem 3.2. We follow [9]. First, we perform some preliminary reductions. Given a measurable function \(N:\mathbb {R}^n\rightarrow \mathbb {R}^n\) we define

$$\begin{aligned} Tf(x) = A^{r,m}_{N(x)} f(x). \end{aligned}$$

Note that the estimate (3.4) is equivalent to showing

$$\begin{aligned} \Vert T f\Vert _{2,\infty } \le C \Vert m\Vert _{\mathscr {M}_\alpha ^{\nu _0}} \Vert f\Vert _2 \end{aligned}$$

with C not depending on the choice of the measurable function N. By duality, it is equivalent to show

$$\begin{aligned} |\langle Tf, \mathbf {1}_E \rangle | \lesssim \Vert m\Vert _{\mathscr {M}_\alpha ^{\nu _0}} |E|^{\frac{1}{2}} \Vert f\Vert _2, \end{aligned}$$

where E is an arbitrary measurable set. By scaling, we may assume without loss of generality that \(\Vert f\Vert _2=1\) and \(|E|\le 1\). Thus, by the triangle inequality, it suffices to show that

$$\begin{aligned} \sum _{P\in \mathcal {P}} |\langle f,\phi _P\rangle \langle \mathbf {1}_{E\cap N^{-1}(\omega _{P(r)})}, \psi ^{N(\cdot )}_P\rangle | \lesssim \Vert m\Vert _{\mathscr {M}_\alpha ^{\nu _0}}, \end{aligned}$$
(4.1)

for all finite sets of tiles \(\mathcal {P}\subset \overline{\mathcal {P}}\), with the implied constant being independent of \(f,E,N,\mathcal {P}\). Throughout this and the following sections we fix \(r\in \{0,1\}^n\setminus \{0\}\). Before we continue we need to introduce certain collections of tiles called trees. There is a partial order on tiles defined by

$$\begin{aligned} P\le P'\quad \text {if}\quad I_P\subset I_{P'}\quad \text {and}\quad c(\omega _{P'})\in \omega _P. \end{aligned}$$

Observe that two tiles are comparable with respect to \(\le \) if and only if they have a non-empty intersection.
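This observation is easy to verify by brute force in the case \(n=1\), \(\alpha =1\), where a tile is a product of dyadic intervals \(I\times \omega \) with \(|I|\cdot |\omega |=1\) (an illustrative sketch):

```python
import itertools
import random

def interval(k, l):
    # the dyadic interval [2^k l, 2^k (l+1))
    return (2.0 ** k * l, 2.0 ** k * (l + 1))

def tile(k, i, j):
    # I_P at scale k and omega_P at scale -k, so |I_P| |omega_P| = 1
    return (interval(k, i), interval(-k, j))

def subset(A, B):
    return B[0] <= A[0] and A[1] <= B[1]

def overlap(A, B):
    return max(A[0], B[0]) < min(A[1], B[1])

def leq(P, Q):
    # P <= Q iff I_P subset I_Q and the center of omega_Q lies in omega_P
    (I, om), (J, tau) = P, Q
    c = (tau[0] + tau[1]) / 2
    return subset(I, J) and om[0] <= c < om[1]

random.seed(3)
tiles = [tile(random.randint(-3, 3), random.randint(-8, 8), random.randint(-8, 8))
         for _ in range(300)]
for P, Q in itertools.combinations(tiles, 2):
    intersect = overlap(P[0], Q[0]) and overlap(P[1], Q[1])
    assert intersect == (leq(P, Q) or leq(Q, P))
print("two tiles are comparable if and only if they intersect")
```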

Definition 4.1

A finite collection \(\mathbf {T}\subset \overline{\mathcal {P}}\) of tiles is called a tree if there exists \(P\in \mathbf {T}\) such that \(P'\le P\) for every \(P'\in \mathbf {T}\). In that case, P is uniquely determined and referred to as the top of the tree \(\mathbf {T}\). We denote the top of a tree \(\mathbf {T}\) by \(P_\mathbf {T}=I_\mathbf {T}\times \omega _\mathbf {T}\) and write \(k_{\mathbf {T}}=\frac{1}{|\alpha |}\log _2 |I_\mathbf {T}|\).

A tree \(\mathbf {T}\) is called a 1-tree if \(c(\omega _\mathbf {T})\not \in \omega _{P(r)}\) for all \(P\in \mathbf {T}\) and it is called a 2-tree if \(c(\omega _\mathbf {T})\in \omega _{P(r)}\) for all \(P\in \mathbf {T}\). These names are due to historical reasons (see [9]).

The notion of a tree was first introduced in this context by Fefferman [4]. For a tile \(P\in \overline{\mathcal {P}}\) we write

$$\begin{aligned} E_{P}=E\cap N^{-1}(\omega _{P})\;\text {and}\; E_{P(r)}=E\cap N^{-1}(\omega _{P(r)}). \end{aligned}$$

The mass of a single tile P is defined as

$$\begin{aligned} \mathcal {M}(P)=\sup _{P^\prime \ge P} \int _{E_{P^\prime }} w^{\nu _1}_{P^\prime }(x)dx, \end{aligned}$$
(4.2)

where \(\nu _1\) is a fixed large positive number depending only on \(|\alpha |\) that is to be determined later and

$$\begin{aligned} w^{\nu }_P(x) = \mathrm {T}_{c(I_P)} \mathrm {D}^1_{2^{k(P)}} w^{\nu }(x), \end{aligned}$$

where the weight \(w^{\nu }\) takes the form

$$\begin{aligned} w^{\nu }(x)=(1+\rho _\alpha ({x}))^{-\nu }. \end{aligned}$$
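In particular, \(w^{\nu }\) is integrable precisely when \(\nu >|\alpha |\): the anisotropic ball \(\mathcal {B}_s\) is the box \(\prod _{i=1}^n[-s^{\alpha _i},s^{\alpha _i}]\) of measure \(2^n s^{|\alpha |}\), so splitting \(\mathbb {R}^n\) into the shells \(\{2^j\le \rho _\alpha ({x})<2^{j+1}\}\) gives

$$\begin{aligned} \int _{\mathbb {R}^n} (1+\rho _\alpha ({x}))^{-\nu }\,dx \lesssim 1+\sum _{j\ge 0} 2^{(j+1)|\alpha |}\, 2^{-j\nu } < \infty \quad \text {for}\;\nu >|\alpha |. \end{aligned}$$

Since \(\int w_P = \int w^{\nu _1}\) by the normalization built into \(\mathrm {D}^1\), this guarantees in particular that the mass of every tile is finite.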

For convenience we also write \(w_P=w^{\nu _1}_P\). For a collection of tiles \(\mathcal {P}\subset \mathcal {\overline{P}}\) we define their mass as

$$\begin{aligned} \mathcal {M}(\mathcal {P})=\sup _{P\in \mathcal {P}} \mathcal {M}(P). \end{aligned}$$
(4.3)

The energy of a collection of tiles \(\mathcal {P}\) is defined as

$$\begin{aligned} \mathcal {E}(\mathcal {P})=\sup _{\mathbf {T}\subset \mathcal {P} \,\,2\mathrm {-tree}} \left( \frac{1}{|I_{\mathbf {T}}|} \sum _{P\in \mathbf {T}} |\langle f,\phi _P\rangle |^2\right) ^{1/2}. \end{aligned}$$
(4.4)

These quantities and the following lemmas originate in [9].

Lemma 4.2

(Mass lemma) If \(\nu _1>|\alpha |+1\), then there exists \(C>0\) depending only on \(\alpha \) such that for every finite set of tiles \(\mathcal {P}\subset \overline{\mathcal {P}}\) there is a decomposition \(\mathcal {P}=\mathcal {P}_{\mathrm {light}}\cup \mathcal {P}_{\mathrm {heavy}}\) such that

$$\begin{aligned} \mathcal {M}(\mathcal {P}_{\mathrm {light}})\le 2^{-2}\,\mathcal {M}(\mathcal {P}) \end{aligned}$$
(4.5)

and \(\mathcal {P}_{\mathrm {heavy}}\) is a union of a set \(\mathcal {T}\) of trees such that

$$\begin{aligned} \sum _{\mathbf {T}\in \mathcal {T}} |I_{\mathbf {T}}| \le \frac{C}{\mathcal {M}(\mathcal {P})}. \end{aligned}$$
(4.6)

Lemma 4.3

(Energy lemma) There exists \(C>0\) depending only on \(\alpha \) such that for every finite set of tiles \(\mathcal {P}\subset \overline{\mathcal {P}}\) there is a decomposition \(\mathcal {P}=\mathcal {P}_{\mathrm {low}}\cup \mathcal {P}_{\mathrm {high}}\) such that

$$\begin{aligned} \mathcal {E}(\mathcal {P}_{\mathrm {low}})\le 2^{-1}\,\mathcal {E}(\mathcal {P}) \end{aligned}$$
(4.7)

and \(\mathcal {P}_{\mathrm {high}}\) is a union of a set \(\mathcal {T}\) of trees such that

$$\begin{aligned} \sum _{\mathbf {T}\in \mathcal {T}} |I_\mathbf {T}| \le \frac{C}{\mathcal {E}(\mathcal {P})^2}. \end{aligned}$$
(4.8)

Lemma 4.4

(Tree estimate) There exists \(C>0\) depending only on \(\alpha \) such that if \(m\in \mathscr {M}_\alpha ^{\nu _0}\), then the following inequality holds for every tree \(\mathbf {T}\):

$$\begin{aligned} \sum _{P\in \mathbf {T}} |\langle f,\phi _P\rangle \langle \psi _P^{N(\cdot )}, \mathbf {1}_{E_{P(r)}} \rangle | \le C \Vert m\Vert _{\mathscr {M}_\alpha ^{\nu _0}} |I_\mathbf {T}|\mathcal {E}(\mathbf {T})\mathcal {M}(\mathbf {T}). \end{aligned}$$
(4.9)

The proofs of these lemmas are contained in Sects. 5, 6, and 7, respectively. By iterated application of these lemmas we obtain a proof of (4.1). This argument is literally the same as in [9], but we include it here for the convenience of the reader. Let \(\mathcal {P}\) be a finite collection of tiles. We will decompose \(\mathcal {P}\) into disjoint sets \((\mathcal {P}_\ell )_{\ell \in \mathcal {N}}\) (where \(\mathcal {N}\) is some finite set of integers) such that each \(\mathcal {P}_\ell \) satisfies

$$\begin{aligned} \mathcal {M}(\mathcal {P}_\ell ) \le 2^{2\ell }\quad \text {and}\quad \mathcal {E}(\mathcal {P}_\ell )\le 2^\ell \end{aligned}$$
(4.10)

and is equal to the union of a set of trees \(\mathcal {T}_\ell \) such that

$$\begin{aligned} \sum _{\mathbf {T}\in \mathcal {T}_\ell } |I_\mathbf {T}| \le C 2^{-2\ell }. \end{aligned}$$
(4.11)

This is achieved by the following procedure:

  1. (1)

    Initialize \(\mathcal {P}^{\mathrm {stock}}:=\mathcal {P}\) and choose an initial \(\ell \) that is large enough such that

    $$\begin{aligned} \mathcal {M}(\mathcal {P}^{\mathrm {stock}}) \le 2^{2\ell }\quad \text {and}\quad \mathcal {E}(\mathcal {P}^{\mathrm {stock}})\le 2^\ell . \end{aligned}$$
    (4.12)
  2. (2)

    If \(\mathcal {M}(\mathcal {P}^{\mathrm {stock}})>2^{2(\ell -1)}\), then apply Lemma 4.2 to decompose \(\mathcal {P}^{\mathrm {stock}}\) into \(\mathcal {P}_{\mathrm {light}}\) and \(\mathcal {P}_{\mathrm {heavy}}\). We add \(\mathcal {P}_{\mathrm {heavy}}\) to \(\mathcal {P}_\ell \) and update \(\mathcal {P}^{\mathrm {stock}}:=\mathcal {P}_{\mathrm {light}}\) (thus, we now have \(\mathcal {M}(\mathcal {P}^{\mathrm {stock}})\le 2^{2(\ell -1)}\)).

  3. (3)

    If \(\mathcal {E}(\mathcal {P}^{\mathrm {stock}})>2^{\ell -1}\), then apply Lemma 4.3 to decompose \(\mathcal {P}^{\mathrm {stock}}\) into \(\mathcal {P}_{\mathrm {low}}\) and \(\mathcal {P}_{\mathrm {high}}\). We add \(\mathcal {P}_{\mathrm {high}}\) to \(\mathcal {P}_\ell \) and update \(\mathcal {P}^{\mathrm {stock}}:=\mathcal {P}_{\mathrm {low}}\) (thus, we now have \(\mathcal {E}(\mathcal {P}^{\mathrm {stock}})\le 2^{\ell -1}\)).

  4. (4)

    If \(\mathcal {P}^{\mathrm {stock}}\) is not empty, then replace \(\ell \) by \(\ell -1\) and go to Step (2).

Then we can finish the proof of (4.1) by using (4.10), (4.11), (4.9) and keeping in mind that we always have \(\mathcal {M}(\mathcal {P})\le \Vert w^{\nu _1}\Vert _1\):

$$\begin{aligned}&\sum _{P\in \mathcal {P}} |\langle f,\phi _P\rangle \langle \mathbf {1}_{E\cap N^{-1}(\omega _{P(r)})}, \psi ^{N(\cdot )}_P\rangle | = \sum _{\ell \in \mathcal {N}} \sum _{\mathbf {T}\in \mathcal {T}_\ell } \sum _{P\in \mathbf {T}} |\langle f,\phi _P\rangle \langle \mathbf {1}_{E\cap N^{-1}(\omega _{P(r)})}, \psi ^{N(\cdot )}_P\rangle |\\&\quad \lesssim \Vert m\Vert _{\mathscr {M}_\alpha ^{\nu _0}} \sum _{\ell \in \mathcal {N}} 2^\ell \mathrm {min}(1,2^{2\ell }) \sum _{\mathbf {T}\in \mathcal {T}_\ell } |I_\mathbf {T}| \lesssim \Vert m\Vert _{\mathscr {M}_\alpha ^{\nu _0}}\sum _{\ell \in \mathbb {Z}} 2^{-\ell } \mathrm {min}(1,2^{2\ell })\lesssim \Vert m\Vert _{\mathscr {M}_\alpha ^{\nu _0}}. \end{aligned}$$

To conclude this section we collect several standard auxiliary estimates for \(m,K,\phi _P,\psi _P^N\) which are used during the remainder of the proof. First, from the definition of \(\mathscr {M}_\alpha ^{\nu }\) we have the symbol estimate

$$\begin{aligned} |\partial _i^\nu m(\xi )| \le \Vert m\Vert _{\mathscr {M}_\alpha ^{\nu }} \rho _\alpha ({\xi })^{-\nu \alpha _i} \end{aligned}$$
(4.13)

for every integer \(0\le \nu \le \nu _0\) and \(i=1,\dots ,n\). If we let K denote the corresponding kernel (that is, \(\widehat{K}=m\)), we have

$$\begin{aligned} |K(x)| \lesssim \Vert m\Vert _{\mathscr {M}_\alpha ^{\lfloor \frac{|\alpha |}{2}\rfloor +1}} \rho _\alpha ({x})^{-|\alpha |} \end{aligned}$$
(4.14)

for \(x\not =0\). This is a consequence of the anisotropic Hörmander–Mikhlin theorem (see [3]). For every integer \(\nu \ge 0\) and \(N\not \in \omega _{P(0)}\) we have

$$\begin{aligned} |\psi _P^N(x)| \lesssim \Vert m\Vert _{\mathscr {M}_\alpha ^{\nu }} |I_P|^{1/2} w_P^{\nu }(x), \end{aligned}$$
(4.15)

where the implicit constant depends only on \(\nu , \alpha \) and the choice of \(\phi \). We defer the proof of this estimate to Sect. 8.

The next estimates concern the interaction of two wave packets associated with distinct tiles. Let \(P,P'~\in ~\overline{\mathcal {P}}\) be tiles. The idea is that if \(P,P'\) are disjoint (or equivalently, incomparable with respect to \(\le \)) then their associated wave packets are almost orthogonal, i.e. \(\langle \phi _P, \phi _{P'}\rangle \) is negligibly small. Indeed, if \(\omega _{P}\) and \(\omega _{P'}\) are disjoint, then we even have \(\langle \phi _P, \phi _{P'}\rangle =0\). However, owing to the Heisenberg uncertainty principle, in the case that only \(I_P\) and \(I_{P'}\) are disjoint we need to deal with tails. The precise estimate we need is as follows. Assume that \(|I_P|\ge |I_{P'}|\). Then for every integer \(\nu \ge 0\) we have that

$$\begin{aligned} |\langle \phi _P,\phi _{P'}\rangle |\lesssim |I_P|^{-\frac{1}{2}} |I_{P'}|^{\frac{1}{2}} (1+2^{-k(P)}\rho _\alpha ({c(I_P)-c(I_{P'})}))^{-\nu }, \end{aligned}$$
(4.16)

where the implicit constant depends only on \(\nu \) and \(\phi \). See [15, Lemma 2.1] for the version of this estimate for one-dimensional wave packets. Similarly, we have

$$\begin{aligned} |\langle \psi ^N_P,\psi ^N_{P'}\rangle |\lesssim \Vert m\Vert ^2_{\mathscr {M}_\alpha ^{\nu }} |I_P|^{-\frac{1}{2}} |I_{P'}|^{\frac{1}{2}} (1+2^{-k(P)}\rho _\alpha ({c(I_P)-c(I_{P'})}))^{-\nu }, \end{aligned}$$
(4.17)

for every integer \(\nu \ge 0\) provided that \(N\not \in \omega _{P(0)}\cup \omega _{P'(0)}\). We prove (4.16) and (4.17) in Sect. 8.

5 Proof of the Mass Lemma

In this section we prove Lemma 4.2. The proof is in essence the same as in [9, Prop. 3.1]. Let \(\mathcal {P}\) be a finite set of tiles and set \(\mu =\mathcal {M}(\mathcal {P})\). We define the set of heavy tiles by

$$\begin{aligned} \mathcal {P}_{\text {heavy}}=\left\{ P\in \mathcal {P} \,:\,\mathcal {M}(P)>\frac{\mu }{4}\right\} \end{aligned}$$

and accordingly \(\mathcal {P}_{\text {light}}=\mathcal {P} \backslash \mathcal {P}_{\text {heavy}}\). Then (4.5) is automatically satisfied. It remains to show (4.6). By the definition of mass (4.2) we know that for every \(P\in \mathcal {P}_{\text {heavy}}\) there exists a \(P^\prime =P^\prime (P)\in \overline{\mathcal {P}}\) with \(P^\prime \ge P\) such that

$$\begin{aligned} \int _{E_{P^\prime }} w_{P^\prime }(x) dx>\frac{\mu }{4}. \end{aligned}$$
(5.1)

Note that \(P^\prime \) need not be in \(\mathcal {P}\). Let \(\mathcal {P}^\prime \) be the maximal elements in

$$\begin{aligned} \{P^\prime (P)\,:\,P\in \mathcal {P}_{\text {heavy}}\} \end{aligned}$$

with respect to the partial order \(\le \) of tiles. Then \(\mathcal {P}_{\text {heavy}}\) is a union of trees with tops in \(\mathcal {P}^\prime \). Therefore it suffices to show

$$\begin{aligned} \sum _{P\in \mathcal {P}^\prime } |I_{P}| \le \frac{C}{\mu }. \end{aligned}$$
(5.2)

First we rewrite (5.1) as

$$\begin{aligned} \sum _{j=0}^\infty \int _{\begin{array}{c} E_{P}\cap (\delta _{2^j}(I_{P})\backslash \delta _{2^{j-1}}(I_{P})) \end{array}} w_{P}(x) dx>C \mu \sum _{j=0}^\infty 2^{-j}, \end{aligned}$$
(5.3)

where we adopt the temporary convention that \(\delta _{2^{-1}}(I_{P}) =\emptyset \) and for \(j\ge 0\),

$$\begin{aligned} \delta _{2^j}(I_P) = \prod _{i=1}^n \Big [c(I_P)_i-2^{(k(P)+j)\alpha _i-1}, c(I_P)_i+2^{(k(P)+j)\alpha _i-1}\Big ). \end{aligned}$$

Thus, for every \(P\in \mathcal {P}^\prime \) there exists a \(j\ge 0\) such that

$$\begin{aligned} \int _{\begin{array}{c} E_{P}\cap (\delta _{2^j}(I_{P})\backslash \delta _{2^{j-1}}(I_{P})) \end{array}} \frac{dx}{\left( 1+2^{-k(P)}\rho _\alpha ({x-c(I_{P})})\right) ^{\nu _1}} >C |I_{P}| \mu 2^{-j}. \end{aligned}$$
(5.4)

Note that for \(x\in \delta _{2^j}(I_{P})\backslash \delta _{2^{j-1}}(I_{P})\) we have

$$\begin{aligned} 1+2^{-k(P)}\rho _\alpha ({x-c(I_{P})})\ge C2^j. \end{aligned}$$

Using this we obtain from (5.4),

$$\begin{aligned} |I_{P}|<C\mu ^{-1} |E_{P}\cap \delta _{2^j}(I_{P})| 2^{-(\nu _1-1)j}. \end{aligned}$$
(5.5)

Summarizing, we have shown that for every \(P\in \mathcal {P}'\) there exists \(j\ge 0\) such that (5.5) holds. This leads us to define for every \(j\ge 0\), a set of tiles \(\mathcal {P}_j\) by

$$\begin{aligned} \mathcal {P}_j=\{P\in \mathcal {P}^\prime \,:\,|I_{P}|<C\mu ^{-1} |E_{P}\cap \delta _{2^j}(I_{P})| 2^{-j(\nu _1-1)}\}. \end{aligned}$$

The estimate (5.2) will follow by summing over j if we can show that

$$\begin{aligned} \sum _{P\in \mathcal {P}_j} |I_P| \le C 2^{-j} \mu ^{-1} \end{aligned}$$
(5.6)

for all \(j\ge 0\). To show (5.6) we use a covering argument reminiscent of Vitali’s covering lemma. Fix \(j\ge 0\). For every tile \(P=I_P\times \omega _P\) we have an enlarged tile \(\delta _{2^j}(I_{P}) \times \omega _P\) (this is not a tile anymore). We inductively choose \(P_i\in \mathcal {P}_j\) such that \(|I_{P_i}|\) is maximal among the \(P\in \mathcal {P}_j\backslash \{P_0,\dots ,P_{i-1}\}\) and the enlarged tile of \(P_i\) is disjoint from the enlarged tiles of \(P_0,\dots ,P_{i-1}\). Since \(\mathcal {P}_j\) is finite, this process terminates after finitely many steps, so that we have selected a subset \(\mathcal {P}^\prime _j=\{P_0,P_1,\dots \}\subset \mathcal {P}_j\) of tiles whose enlarged tiles are pairwise disjoint. By construction, for every \(P\in \mathcal {P}_j\) there exists a unique \(P^\prime \in \mathcal {P}^\prime _j\) such that \(|I_P|\le |I_{P^\prime }|\) and the enlarged tiles of P and \(P^\prime \) intersect. We call P associated with \(P^\prime \).
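The greedy selection just described follows the familiar Vitali pattern: always pick a largest remaining object whose enlargement is disjoint from everything already chosen. A minimal one-dimensional sketch (our own, with plain intervals standing in for the enlarged tiles):

```python
# Vitali-type greedy selection (1-D illustration): repeatedly choose the
# longest remaining interval disjoint from all previously chosen ones.

def vitali_select(intervals):
    """intervals: iterable of (left, right); returns the selected subfamily."""
    chosen = []
    # Sort by length, largest first (ties keep input order).
    for iv in sorted(intervals, key=lambda iv: iv[1] - iv[0], reverse=True):
        disjoint = all(iv[1] <= c[0] or c[1] <= iv[0] for c in chosen)
        if disjoint:
            chosen.append(iv)
    return chosen
```

Every discarded interval meets a chosen interval that is at least as long, which mirrors the property used above to control the tiles associated with each \(P'\).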

Now the claim is that if two tiles \(P,Q\in \mathcal {P}_j\) are associated with the same \(P^\prime \in \mathcal {P}^\prime _j\), then \(I_P\) and \(I_{Q}\) are disjoint. To see this note that \(\omega _P\) intersects \(\omega _{P^\prime }\) by definition. Thus, since \(|I_P|\le |I_{P^\prime }|\), we have \(\omega _{P^\prime }\subset \omega _P\). The same holds for Q. Therefore we have \(\omega _{P^\prime }\subset \omega _P\cap \omega _{Q}\). But \(P,Q\in \mathcal {P}_j\subset \mathcal {P}'\) are disjoint tiles, so we must have \(I_P\cap I_{Q}=\emptyset \). Moreover, all tiles P associated with \(P^\prime \) satisfy \(I_P\subset \delta _{2^{j+2}}(I_{P'})\). Therefore we get

$$\begin{aligned} \sum _{P\in \mathcal {P}_j} |I_P|&= \sum _{P^\prime \in \mathcal {P}^\prime _j} \sum _{\begin{array}{c} P\in \mathcal {P}_j\\ \text {assoc. with}\,P^\prime \end{array}} |I_P|=\sum _{P^\prime \in \mathcal {P}_j^\prime } \Bigg | \bigcup _{\begin{array}{c} P\in \mathcal {P}_j\\ \text {assoc. with}\,P^\prime \end{array}} I_P\Bigg | \\&\le \sum _{P^\prime \in \mathcal {P}^\prime _j} 2^{(j+2)|\alpha |} |I_{P^\prime }|\le C \mu ^{-1} 2^{-j(\nu _1-|\alpha |-1)} \\&\quad \times \sum _{P^\prime \in \mathcal {P}^\prime _j} |E\cap N^{-1}(\omega _{P^\prime })\cap \delta _{2^j}(I_{P'})|\\&\le C 2^{-j} \mu ^{-1}, \end{aligned}$$

using that \(\nu _1>|\alpha |+1\). The penultimate inequality is a consequence of (5.5) and the last inequality follows, because the sets \(N^{-1}(\omega _{P^\prime })\cap \delta _{2^j}(I_{P'})\) are disjoint and \(|E|\le 1\).

6 Proof of the Energy Lemma

In this section we prove Lemma 4.3. We adapt the argument of Lacey and Thiele [9, Prop. 3.2]. The tree selection algorithm of Lacey and Thiele relies on the natural ordering of real numbers. In our situation this can be replaced by any functional on \(\mathbb {R}^n\) that separates \(\omega _{P(0)}\) from \(\omega _{P(r)}\) for every tile \(P\in \overline{\mathcal {P}}\) (this was already observed in [12]). Let \(i_0\) be such that \(r_{i_0}=1\) (such an \(i_0\) exists because \(r\not =0\)). Let us introduce the projection to the \(i_0\)th coordinate: \(\pi _0:\mathbb {R}^n\rightarrow \mathbb {R}\), \(x\mapsto x_{i_0}\). Then we have that

$$\begin{aligned} \pi _0(\xi )<\pi _0(\eta ) \end{aligned}$$
(6.1)

holds for every \(\xi \in \omega _{P(0)}, \eta \in \omega _{P(r)}, P\in \overline{\mathcal {P}}\).

Let \(\varepsilon =\mathcal {E}(\mathcal {P})\). For a 2-tree \(\mathbf {T}_2\) we define

$$\begin{aligned} \Delta (\mathbf {T}_2) = \left( \frac{1}{|I_{\mathbf {T}_2}|} \sum _{P\in \mathbf {T}_2} |\langle f,\phi _P\rangle |^2\right) ^{1/2}. \end{aligned}$$

We will now describe an algorithm to choose the desired collection of trees \(\mathcal {T}\) and also an auxiliary collection of 2-trees \(\mathcal {T}_2\):

  1. (1)

    Initialize \(\mathcal {T}:=\mathcal {T}_2:=\emptyset \) and \(\mathcal {P}^{\mathrm {stock}}:=\mathcal {P}\).

  2. (2)

    Choose a 2-tree \(\mathbf {T}_2\subset \mathcal {P}^{\mathrm {stock}}\) such that

    1. (a)

      \(\Delta (\mathbf {T}_2)\ge \varepsilon /2\), and

    2. (b)

      \(\pi _0(c(\omega _{\mathbf {T}_2}))\) is minimal among all the 2-trees in \(\mathcal {P}^{\mathrm {stock}}\) satisfying (a).

    If no such \(\mathbf {T}_2\) exists, then terminate.

  3. (3)

    Let \(\mathbf {T}\) be the maximal tree in \(\mathcal {P}^{\mathrm {stock}}\) with top \(P_{\mathbf {T}_2}\) (with respect to set inclusion).

  4. (4)

    Add \(\mathbf {T}\) to \(\mathcal {T}\) and \(\mathbf {T}_2\) to \(\mathcal {T}_2\). Also, remove all the elements of \(\mathbf {T}\) from \(\mathcal {P}^{\mathrm {stock}}\). Then continue again with Step (2).
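In schematic form, the selection loop above looks as follows. This is our own simplification, not part of the proof: each candidate 2-tree is given as a triple of its \(\pi _0\)-center, its \(\Delta \)-value (frozen here, whereas in the algorithm it is computed from the current stock), and its set of tiles, and `maximal_tree` stands in for Step (3).

```python
# Schematic tree-selection loop for the energy lemma.  Candidates are
# triples (pi0_center, delta, tiles); `maximal_tree(t2, stock)` should
# return the maximal tree in `stock` with the same top as t2 (Step (3));
# the trivial stand-in used below just returns t2 itself.

def select_trees(candidates, eps, maximal_tree):
    stock = set().union(*(t for _, _, t in candidates))
    trees, trees2 = [], []
    while True:
        # Step (2): 2-trees with Delta >= eps/2 still contained in the stock.
        live = [(p, d, t) for (p, d, t) in candidates
                if d >= eps / 2 and t <= stock]
        if not live:
            break  # Step (2): terminate
        p0, d, t2 = min(live, key=lambda c: c[0])  # minimal pi0-center
        T = maximal_tree(t2, stock)
        trees.append(T)      # Step (4)
        trees2.append(t2)
        stock -= T
    return trees, trees2, stock  # leftover stock is P_low
```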

Since \(\mathcal {P}\) is finite it is clear that the algorithm terminates after finitely many steps. Also note that for every \(\mathbf {T}\in \mathcal {T}\) there exists a unique \(\mathbf {T}_2\in \mathcal {T}_2\) with \(\mathbf {T}_2\subset \mathbf {T}\), and vice versa. After the algorithm terminates we set \(\mathcal {P}_{\mathrm {low}}=\mathcal {P}^{\mathrm {stock}}\) and \(\mathcal {P}_{\mathrm {high}}\) to be the union of the trees in \(\mathcal {T}\). Then, (4.7) is automatically satisfied and it only remains to show

$$\begin{aligned} \sum _{\mathbf {T}_2\in \mathcal {T}_2} |I_{\mathbf {T}_2}| \lesssim \varepsilon ^{-2}. \end{aligned}$$
(6.2)

Before we do that we establish a geometric property of the selected trees that will be crucial in the following.

Lemma 6.1

Let \(\mathbf {T}_2\not =\mathbf {T}'_2\in \mathcal {T}_2\) and \(P\in \mathbf {T}_2, P'\in \mathbf {T}'_2\). If \(\omega _P\subset \omega _{P'(0)}\), then \(I_{P'}\cap I_{\mathbf {T}_2}=\emptyset \).

Proof

Note that \(c(\omega _{\mathbf {T}_2})\in \omega _P\subset \omega _{P'(0)}\) while \(c(\omega _{\mathbf {T_2'}})\in \omega _{P'(r)}\). By (6.1) and condition (b) in Step (2) we therefore conclude that \(\mathbf {T}_2\) was chosen before \(\mathbf {T}_2'\) during the above algorithm. Let \(\mathbf {T}\) be the tree in \(\mathcal {T}\) such that \(\mathbf {T}_2\subset \mathbf {T}\). Thus, if \(I_{P'}\) was not disjoint from \(I_{\mathbf {T}_2}=I_\mathbf {T}\), then it would be contained in \(I_\mathbf {T}\) and therefore \(P'\le P_\mathbf {T}\) which means it would have been included into \(\mathbf {T}\) during Step (3). That is a contradiction.\(\square \)

The sum in (6.2) equals

$$\begin{aligned} \sum _{\mathbf {T}_2\in \mathcal {T}_2} \Delta (\mathbf {T}_2)^{-2} \sum _{P\in \mathbf {T}_2} |\langle f,\phi _P\rangle |^2 \le 4 \varepsilon ^{-2} \sum _{P\in \bigcup \mathcal {T}_2} |\langle f,\phi _P\rangle |^2, \end{aligned}$$

where \(\bigcup \mathcal {T}_2=\bigcup _{\mathbf {T}_2\in \mathcal {T}_2} \mathbf {T}_2\). Let us write

$$\begin{aligned} \sum _{P\in \bigcup \mathcal {T}_2} |\langle f,\phi _P\rangle |^2=\Bigg \langle \sum _{P\in \bigcup \mathcal {T}_2}\langle f,\phi _P\rangle \phi _P, f\Bigg \rangle \end{aligned}$$
(6.3)

and use the Cauchy–Schwarz inequality to estimate this by

$$\begin{aligned} \Bigg \Vert \sum _{P\in \bigcup \mathcal {T}_2}\langle f,\phi _P\rangle \phi _P\Bigg \Vert _2, \end{aligned}$$
(6.4)

where we used that \(\Vert f\Vert _2=1\). So far we have shown that

$$\begin{aligned} \varepsilon ^2 \sum _{\mathbf {T}_2\in \mathcal {T}_2} |I_{\mathbf {T}_2}| \lesssim \Bigg \Vert \sum _{P\in \bigcup \mathcal {T}_2}\langle f,\phi _P\rangle \phi _P\Bigg \Vert _2. \end{aligned}$$
(6.5)

Thus if we can show that

$$\begin{aligned} \Bigg \Vert \sum _{P\in \bigcup \mathcal {T}_2}\langle f,\phi _P\rangle \phi _P\Bigg \Vert _2^2 \lesssim \varepsilon ^2 \sum _{\mathbf {T}_2\in \mathcal {T}_2} |I_{\mathbf {T}_2}|, \end{aligned}$$
(6.6)

then (6.2) follows. Expanding the \(L^2\) norm in (6.6) we get that the left hand side is bounded by

$$\begin{aligned} \sum _{\begin{array}{c} P,P'\in \bigcup \mathcal {T}_2,\\ \omega _P=\omega _{P'} \end{array}} |\langle f,\phi _P\rangle \langle f,\phi _{P'}\rangle \langle \phi _P,\phi _{P'}\rangle | + 2 \sum _{\begin{array}{c} P,P'\in \bigcup \mathcal {T}_2,\\ \omega _P\subset \omega _{P'(0)} \end{array}} |\langle f,\phi _P\rangle \langle f,\phi _{P'}\rangle \langle \phi _P,\phi _{P'}\rangle |. \end{aligned}$$
(6.7)

Here we have used that \(\langle \phi _P,\phi _{P'}\rangle =0\) if \(\omega _{P(0)}\cap \omega _{P'(0)}=\emptyset \) and therefore either \(\omega _P=\omega _{P'}\), \(\omega _P\subset \omega _{P'(0)}\), or \(\omega _{P'}\subset \omega _{P(0)}\) (the last two cases are symmetric). We treat both sums in this term separately. Estimating the smaller one of \(|\langle f,\phi _P\rangle |\) and \(|\langle f,\phi _{P'}\rangle |\) by the larger one, we obtain that the first sum in (6.7) is

$$\begin{aligned} \lesssim \sum _{P\in \bigcup \mathcal {T}_2} |\langle f,\phi _P\rangle |^2 \sum _{\begin{array}{c} P'\in \bigcup \mathcal {T}_2,\\ \omega _P=\omega _{P'} \end{array}} |\langle \phi _P,\phi _{P'}\rangle |. \end{aligned}$$

Using (4.16) we estimate this by

$$\begin{aligned} \sum _{P\in \bigcup \mathcal {T}_2} |\langle f,\phi _P\rangle |^2 \sum _{\begin{array}{c} P'\in \bigcup \mathcal {T}_2,\\ \omega _P=\omega _{P'} \end{array}} (1+2^{-k(P)}\rho _\alpha ({c(I_P)-c(I_{P'})}))^{-\nu }. \end{aligned}$$
(6.8)

Notice that \(I_P\cap I_{P'}=\emptyset \) for \(P\not =P'\) in the inner sum. This implies

$$\begin{aligned} \sum _{\begin{array}{c} P'\in \bigcup \mathcal {T}_2,\\ \omega _P=\omega _{P'} \end{array}} (1+2^{-k(P)}\rho _\alpha ({c(I_P)-c(I_{P'})}))^{-\nu }\lesssim \int _{\mathbb {R}^n} (1+\rho _\alpha ({x}))^{-\nu } dx \lesssim 1, \end{aligned}$$

provided that \(\nu >|\alpha |\). Therefore (6.8) is

$$\begin{aligned} \lesssim \sum _{\mathbf {T}_2\in \mathcal {T}_2} \sum _{P\in \mathbf {T}_2} |\langle f,\phi _P\rangle |^2 \le \varepsilon ^2 \sum _{\mathbf {T}_2\in \mathcal {T}_2} |I_{\mathbf {T}_2}|, \end{aligned}$$
(6.9)

as desired. It remains to estimate the second sum in (6.7). To that end it suffices to show that

$$\begin{aligned} \sum _{P\in \mathbf {T}_2} \sum _{P'\in \mathcal {S}_P} |\langle f,\phi _P\rangle \langle f,\phi _{P'}\rangle \langle \phi _P,\phi _{P'}\rangle | \lesssim \varepsilon ^2 |I_{\mathbf {T}_2}|, \end{aligned}$$
(6.10)

for every \(\mathbf {T}_2\in \mathcal {T}_2\), where

$$\begin{aligned} \mathcal {S}_P=\Big \{P'\in \bigcup \mathcal {T}_2\,:\, \omega _P\subset \omega _{P'(0)}\Big \}. \end{aligned}$$

Here we follow the argument given in [8]. Observe that if \(P\in \mathbf {T}_2\), then \(\mathcal {S}_P\cap \mathbf {T}_2=\emptyset \). Interpreting the singleton \(\{P\}\) as a 2-tree we obtain

$$\begin{aligned} |\langle f,\phi _P\rangle |\le \varepsilon |I_P|^{1/2} \end{aligned}$$
(6.11)

for all \(P\in \mathcal {P}\). Combining this with (4.16) we can estimate the left hand side of (6.10) by

$$\begin{aligned} \varepsilon ^2 \sum _{P\in \mathbf {T}_2} \sum _{P'\in \mathcal {S}_P} |I_{P'}| (1+2^{-k(P)}\rho _\alpha ({c(I_P)-c(I_{P'})}))^{-\nu }. \end{aligned}$$
(6.12)

Indeed, Lemma 6.1 implies that \(I_{\mathbf {T}_2}\cap I_{P'}=\emptyset \) for every \(P\in \mathbf {T}_2, P'\in \mathcal {S}_P\). Moreover, it also implies that for \(P'\not =P''\in \mathcal {S}_P\) we have \(I_{P'}\cap I_{P''}=\emptyset \). These facts facilitate the following estimate:

$$\begin{aligned}&\sum _{P\in \mathbf {T}_2} \sum _{P'\in \mathcal {S}_P} |I_{P'}| (1+2^{-k(P)}\rho _\alpha ({c(I_P)-c(I_{P'})}))^{-\nu } \\&\quad \lesssim \sum _{P\in \mathbf {T}_2} \sum _{P'\in \mathcal {S}_P} \int _{I_{P'}} (1+2^{-k(P)}\rho _\alpha ({c(I_P)-x}))^{-\nu } dx\\&\quad \lesssim \sum _{P\in \mathbf {T}_2} \int _{(I_{\mathbf {T}_2})^c} (1+2^{-k(P)}\rho _\alpha ({c(I_P)-x}))^{-\nu } dx. \end{aligned}$$

Since \(\mathbf {T}_2\) is a tree, the last quantity can be estimated by

$$\begin{aligned} \sum _{k\le k_{\mathbf {T}_2}} \sum _{u\in Q_k\cap (\mathbb {Z}^n+\frac{1}{2})} \int _{(I_{\mathbf {T}_2})^c} (1+\rho _\alpha ({u-\delta _{2^{-k}}(x)}))^{-\nu } dx, \end{aligned}$$

where \(Q_k\in \mathcal {D}^\alpha \) is an anisotropic dyadic rectangle of scale \(k_{\mathbf {T}_2}-k\) that is given by a rescaling of \(I_{\mathbf {T}_2}\). The previous display is no greater than a constant times

$$\begin{aligned} \sum _{k\le k_{\mathbf {T}_2}} 2^{k|\alpha |} \left( \sum _{u\in Q_k\cap (\mathbb {Z}^n+\frac{1}{2})} (1+\mathrm {dist}_\alpha ((Q_k)^c, u))^{-|\alpha |-\gamma }\right) \left( \int _{\mathbb {R}^n} (1+\rho _\alpha ({x}))^{-(\nu -|\alpha |-\gamma )} dx\right) , \end{aligned}$$
(6.13)

where \(\nu >2|\alpha |\) and \(\gamma \) is a fixed and sufficiently small positive constant. The integral over x in the previous display is bounded by a constant depending on \(\nu -|\alpha |-\gamma >|\alpha |\). To estimate the sum over u we note that for every u in the indicated range there exists a lattice point \(v\in \partial Q_k\cap \mathbb {Z}^n\) such that \(\mathrm {dist}_\alpha ((Q_k)^c,u)\ge \frac{1}{2} \rho _\alpha ({v-u})\). Thus we may bound the sum over u by

$$\begin{aligned} \sum _{v\in \partial Q_k\cap \mathbb {Z}^n} \sum _{u\in \mathbb {Z}^n+\frac{1}{2}} (1+\rho _\alpha ({v-u}))^{-|\alpha |-\gamma }\lesssim |\partial Q_k\cap \mathbb {Z}^n|\lesssim 2^{(k_{\mathbf {T}_2}-k)|\alpha |_\infty }. \end{aligned}$$

Thus, (6.13) is bounded by a constant times

$$\begin{aligned} 2^{k_{\mathbf {T}_2}|\alpha |_\infty } \sum _{k\le k_{\mathbf {T}_2}} 2^{k(|\alpha |-|\alpha |_\infty )} \lesssim 2^{k_{\mathbf {T}_2}|\alpha |}=|I_{\mathbf {T}_2}|. \end{aligned}$$

This proves (6.10).

7 Proof of the Tree Estimate

In this section we prove Lemma 4.4. This is the core of the proof. For a rectangle \(I=\prod _{i=1}^n I_i\in \mathcal {D}^\alpha \) we denote by \(\widetilde{I}\) the enlarged rectangle defined by

$$\begin{aligned} \widetilde{I} = \prod _{i=1}^n (2^{\alpha _i+1}-1) I_i. \end{aligned}$$

Here \(\lambda I_i\) is the interval of length \(\lambda |I_i|\) with the same center as \(I_i\). Let \(\mathcal {J}\) be the partition of \(\mathbb {R}^n\) that is given by the collection of maximal anisotropic dyadic rectangles \(J\in \mathcal {D}^\alpha \) such that \(\widetilde{J}\) does not contain any \(I_P\) with \(P\in \mathbf {T}\) (maximal with respect to inclusion). Set \(\varepsilon =\mathcal {E}(\mathbf {T})\) and \(\mu =\mathcal {M}(\mathbf {T})\). Choose phase factors \((\epsilon _P)_P\) of modulus 1 such that

$$\begin{aligned}&\sum _{P\in \mathbf {T}} |\langle f,\phi _P\rangle \langle \psi _P^{N(\cdot )}, \mathbf {1}_{E_{P(r)}} \rangle |\\&\quad =\int _{\mathbb {R}^n}\sum _{P\in \mathbf {T}} \epsilon _P \langle f,\phi _P\rangle \psi _P^{N(x)}(x) \mathbf {1}_{E_{P(r)}}(x) dx\\&\quad \le \left\| \sum _{P\in \mathbf {T}} \epsilon _P \langle f,\phi _P\rangle \psi _P^{N(\cdot )} \mathbf {1}_{E_{P(r)}}\right\| _{1} \le \mathcal {K}_1 + \mathcal {K}_2, \end{aligned}$$

where

$$\begin{aligned} \mathcal {K}_1= & {} \sum _{J\in \mathcal {J}} \sum _{P\in \mathbf {T}, |I_P|\le |{J}^+|}\Vert \langle f,\phi _P\rangle \psi _P^{N(\cdot )} \mathbf {1}_{E_{P(r)}} \Vert _{L^1(J)},\\ \mathcal {K}_2= & {} \sum _{J\in \mathcal {J}} \left\| \sum _{P\in \mathbf {T}, |I_P|>|{J}^+|} \epsilon _P\langle f,\phi _P\rangle \psi _P^{N(\cdot )} \mathbf {1}_{E_{P(r)}} \right\| _{L^1(J)}. \end{aligned}$$

We first estimate \(\mathcal {K}_1\). This is the easy part, since in the sum defining \(\mathcal {K}_1\) we have that \(I_P\) is disjoint from \(\widetilde{J}\). Again, interpreting the singleton \(\{P\}\) as a 2-tree we see that (6.11) holds for all \(P\in \mathbf {T}\). This gives

$$\begin{aligned} \mathcal {K}_1 \le \varepsilon \sum _{J\in \mathcal {J}} \sum _{\begin{array}{c} P\in \mathbf {T}\\ |I_P|\le |{J}^+| \end{array}} 2^{|\alpha |k(P)/2} \Vert \psi _P^{N(\cdot )} \mathbf {1}_{E_{P(r)}} \Vert _{L^1(J)}. \end{aligned}$$

Using (4.15) the previous display is seen to be no larger than a constant times

$$\begin{aligned}&\Vert m\Vert _{\mathscr {M}_\alpha ^{\nu _0}} \varepsilon \sum _{J\in \mathcal {J}} \sum _{\begin{array}{c} P\in \mathbf {T}\\ |I_P|\le |{J}^+| \end{array}} \int _{J\cap E_{P(r)}} w^{\nu _0}(2^{-k(P)} (x-c(I_P))) dx\nonumber \\&\quad \le \Vert m\Vert _{\mathscr {M}_\alpha ^{\nu _0}} \varepsilon \mu \sum _{J\in \mathcal {J}} \sum _{\begin{array}{c} P\in \mathbf {T}\\ |I_P|\le |{J}^+| \end{array}} 2^{|\alpha |k(P)} \sup _{x\in J} w^{2|\alpha |+\frac{2}{3}}(2^{-k(P)}(x-c(I_P))), \end{aligned}$$
(7.1)

where we have set \(\nu _1=|\alpha |+\frac{4}{3}\). Since \(I_P\) is disjoint from \(\widetilde{J}\) we can estimate (7.1) as

$$\begin{aligned} \lesssim \Vert m\Vert _{\mathscr {M}_\alpha ^{\nu _0}} \varepsilon \mu \sum _{J\in \mathcal {J}} \sum _{\begin{array}{c} k\in \mathbb {Z},\\ 2^{k|\alpha |}\le |{J}^+| \end{array}}2^{|\alpha |k} \sum _{\begin{array}{c} P\in \mathbf {T},\\ k(P)=k \end{array}} w^{2|\alpha |+\frac{2}{3}}(2^{-k}\mathrm {dist}_\alpha (J,I_P)). \end{aligned}$$
(7.2)

Before we proceed, we claim that for every \(\nu >|\alpha |\), \(k\in \mathbb {Z}\) and fixed \(J\in \mathcal {J}\) with \(2^{k|\alpha |}\le |{J}^+|\) we have

$$\begin{aligned} \sum _{\begin{array}{c} P\in \mathbf {T},\\ k(P)=k \end{array}} w^{\nu }(2^{-k}\mathrm {dist}_\alpha (J,I_P)) \lesssim 1, \end{aligned}$$
(7.3)

where the implicit constant blows up as \(\nu \) approaches \(|\alpha |\). To verify the claim, let us assume for simpler notation that J is centered at the origin. Then by disjointness of \(I_P\) and \(\widetilde{J}\) we have

$$\begin{aligned} \mathrm {dist}_\alpha (J,I_P)\gtrsim \mathrm {dist}_\alpha (0,I_P)\gtrsim 2^k \rho _\alpha ({m}), \end{aligned}$$

where \(m=(m_1,\dots ,m_n)\in \mathbb {Z}^n\) is such that \(I_P=\prod _{i=1}^n [2^{k\alpha _i} m_i, 2^{k\alpha _i} (m_i+1))\). Thus the sum in (7.3) is

$$\begin{aligned} \lesssim \sum _{m\in \mathbb {Z}^n} (1+\rho _\alpha ({m}))^{-\nu }, \end{aligned}$$

which implies the claim.
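The claim rests on the convergence of \(\sum _{m\in \mathbb {Z}^n}(1+\rho _\alpha (m))^{-\nu }\) for \(\nu >|\alpha |\). A quick numeric sanity check of this (our own, with \(n=2\), \(\alpha =(1,2)\), so \(|\alpha |=3\), and \(\nu =6\)):

```python
# Partial sums of sum_{m in Z^n} (1 + rho_alpha(m))^(-nu) over boxes
# |m_i| <= R; for nu > |alpha| these stabilize as R grows.
import itertools

def lattice_partial_sum(alpha, nu, R):
    total = 0.0
    for m in itertools.product(range(-R, R + 1), repeat=len(alpha)):
        rho = max(abs(mi) ** (1.0 / ai) for mi, ai in zip(m, alpha))
        total += (1.0 + rho) ** (-nu)
    return total
```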

Estimate (7.2) by

$$\begin{aligned} \lesssim \Vert m\Vert _{\mathscr {M}_\alpha ^{\nu _0}} \varepsilon \mu \sum _{J\in \mathcal {J}}w^{|\alpha |+\frac{1}{3}}(2^{-k_\mathbf {T}} \mathrm {dist}_\alpha (J,I_\mathbf {T})) \sum _{\begin{array}{c} k\in \mathbb {Z},\\ 2^{k|\alpha |}\le |{J}^+| \end{array}}2^{|\alpha |k}. \end{aligned}$$
(7.4)

Here \(k_\mathbf {T}\) is the scale of \(I_\mathbf {T}\) and we have used (7.3) and

$$\begin{aligned} 2^{-k} \mathrm {dist}_\alpha (J,I_P)\ge 2^{-k_\mathbf {T}} \mathrm {dist}_\alpha (J,I_\mathbf {T}). \end{aligned}$$

Summing the geometric series, (7.4) is

$$\begin{aligned} \lesssim \Vert m\Vert _{\mathscr {M}_\alpha ^{\nu _0}} \varepsilon \mu \sum _{J\in \mathcal {J}} w^{|\alpha |+\frac{1}{3}}(2^{-k_\mathbf {T}} \mathrm {dist}_\alpha (J,I_\mathbf {T})) |J|. \end{aligned}$$

The sum in that expression can be estimated as follows:

$$\begin{aligned} \sum _{J\in \mathcal {J}} w^{|\alpha |+\frac{1}{3}}(2^{-k_\mathbf {T}} \mathrm {dist}_\alpha (J,I_\mathbf {T})) |J| \lesssim \sum _{J\in \mathcal {J}} \int _J (1+2^{-k_\mathbf {T}}\rho _\alpha ({x-c(I_\mathbf {T})}))^{-(|\alpha |+\frac{1}{3})} dx. \end{aligned}$$

By disjointness of the J we can bound this by

$$\begin{aligned} \int _{\mathbb {R}^n} (1+2^{-k_\mathbf {T}} \rho _\alpha ({x-c(I_\mathbf {T})}))^{-(|\alpha |+\frac{1}{3})} dx = |I_\mathbf {T}| \int _{\mathbb {R}^n} (1+\rho _\alpha ({x}))^{-(|\alpha |+\frac{1}{3})} dx\lesssim |I_\mathbf {T}|. \end{aligned}$$

To summarize, we showed that

$$\begin{aligned} \mathcal {K}_1\lesssim \Vert m\Vert _{\mathscr {M}_\alpha ^{\nu _0}} \varepsilon \mu |I_\mathbf {T}|, \end{aligned}$$
(7.5)

using that \(\nu _0\ge 3|\alpha |+2\).

Let us proceed to estimating \(\mathcal {K}_2\). This is more difficult. We may assume that the sum runs only over those J for which there is a \(P\in \mathbf {T}\) such that \(|I_P|>|{J}^+|\). Then \(|I_\mathbf {T}|>|{J}^+|\) and \(J\subset \widetilde{I_\mathbf {T}}\). From now on let such a J be fixed. Define

$$\begin{aligned} G_{J}=J\cap \bigcup _{P\in \mathbf {T}, |I_{P}|>|{J}^+|} E_{P(r)}. \end{aligned}$$
(7.6)

Before proceeding we prove the following.

Lemma 7.1

There exists a constant \(C>0\) independent of J such that

$$\begin{aligned} |G_{J}|\le C\mu |J|. \end{aligned}$$
(7.7)

Proof

By definition of J, there exists \(P_0\in \mathbf {T}\) such that \(I_{P_0}\) is contained in \(\widetilde{{J}^+}\). We claim that there exists a tile \(P_0\le P'\le P_\mathbf {T}\) such that \(|I_{P'}|=|J^{++}|\). Indeed, note \(|I_{P_0}|\le |J^{++}|\). If there is equality, we simply take \(P'=P_0\). Otherwise we take \(I_{P'}\in \mathcal {D}^\alpha \) to be the unique dyadic ancestor of \(I_{P_0}\) such that \(|I_{P'}|=|J^{++}|\) and choose \(\omega _{P'}\) accordingly such that it contains \(c(\omega _\mathbf {T})\). Now we have

$$\begin{aligned} |\omega _P|=|I_P|^{-1}\le |{J}^{++}|^{-1}= |I_{P'}|^{-1} = |\omega _{P'}| \end{aligned}$$

for every tile \(P\in \mathbf {T}\) with \(|I_P|>|J^+|\). This implies \(\omega _P\subset \omega _{P'}\) and thus

$$\begin{aligned} G_J \subset J\cap E_{P'}. \end{aligned}$$

As a consequence,

$$\begin{aligned} |G_J| \le \int _{E_{P'}} \mathbf {1}_J(x) dx \lesssim |I_{P'}| \int _{E_{P'}} w_{P'}(x) dx\lesssim \mu |J|. \end{aligned}$$

\(\square \)

Let us define

$$\begin{aligned} F_{J}=\sum _{\begin{array}{c} P\in \mathbf {T},\\ |I_{P}|>|{J}^+| \end{array}} \epsilon _{P} \langle f,\phi _{P}\rangle \psi ^{N(\cdot )}_{P} \mathbf {1}_{E_{P(r)}}. \end{aligned}$$
(7.8)

Since every tree can be written as the union of a 1-tree and a 2-tree, we may treat each of these cases separately.

7.1 The Case of 1-Trees

Assume that \(\mathbf {T}\) is a 1-tree. This is the easier case. The reason is that for every \(P,P'\in \mathbf {T}\), \(\omega _P\not =\omega _{P'}\) we have that \(\omega _{P(r)}\) and \(\omega _{P'(r)}\) are disjoint and thus we have good orthogonality of the summands in (7.8). Using (6.11) and (4.15) we see that

$$\begin{aligned} |F_J(x)| \lesssim \Vert m\Vert _{\mathscr {M}_\alpha ^{\nu _0}} \varepsilon \sum _{\begin{array}{c} P\in \mathbf {T},\\ |I_{P}|>|{J}^+| \end{array}} (1+2^{-k(P)}\rho _\alpha ({x-c(I_P)}))^{-\nu _0} \mathbf {1}_{E_{P(r)}}(x). \end{aligned}$$

Using disjointness of the \(E_{P(r)}\) this can be estimated by

$$\begin{aligned} \Vert m\Vert _{\mathscr {M}_\alpha ^{\nu _0}} \varepsilon \cdot \sup _{k\in \mathbb {Z}} \sum _{m\in \mathbb {Z}^n+\frac{1}{2}} (1+2^{-k}\rho _\alpha ({x-\delta _{2^k}(m)}))^{-\nu _0}. \end{aligned}$$

By an index shift we see that

$$\begin{aligned} \sum _{m\in \mathbb {Z}^n+\frac{1}{2}} (1+2^{-k}\rho _\alpha ({x-\delta _{2^k}(m)}))^{-\nu _0} = \sum _{m\in \mathbb {Z}^n+\frac{1}{2}} (1+\rho _\alpha ({m+\gamma }))^{-\nu _0}, \end{aligned}$$

where \(\gamma \in [0,1]^n\) depends on k and x. The last sum is \(\lesssim 1\) independently of \(\gamma \). Thus we proved the pointwise estimate

$$\begin{aligned} |F_{J}(x)|\lesssim \Vert m\Vert _{\mathscr {M}_\alpha ^{\nu _0}} \varepsilon . \end{aligned}$$
(7.9)

Combining this with the support estimate (7.7) we obtain

$$\begin{aligned} \Vert F_{J}\Vert _{L^{1}(J)}\lesssim \Vert m\Vert _{\mathscr {M}_\alpha ^{\nu _0}} \varepsilon \mu |J|. \end{aligned}$$
(7.10)

Summing over the pairwise disjoint \(J\subset \widetilde{I_\mathbf {T}}\) we obtain

$$\begin{aligned} \mathcal {K}_2 \lesssim \Vert m\Vert _{\mathscr {M}_\alpha ^{\nu _0}} \varepsilon \mu |I_\mathbf {T}| \end{aligned}$$

as desired. Note that we only needed \(\nu _0>|\alpha |\) to obtain this estimate.
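For completeness, here is a sketch of why the lattice sum used above is \(\lesssim 1\) uniformly in \(\gamma \); it rests only on \(\nu _0>|\alpha |\) and the elementary fact that \(|\{x\,:\,\rho _\alpha ({x})\le r\}|=2^n r^{|\alpha |}\).

```latex
\sum_{m\in\mathbb{Z}^n+\frac{1}{2}} (1+\rho_\alpha(m+\gamma))^{-\nu_0}
\lesssim \int_{\mathbb{R}^n} (1+\rho_\alpha(x))^{-\nu_0}\, dx
= 2^n |\alpha| \int_0^\infty (1+r)^{-\nu_0}\, r^{|\alpha|-1}\, dr < \infty .
```

The first step uses the subadditivity of \(\rho _\alpha \) (each \(|t|^{1/\alpha _i}\) is subadditive), so that \(1+\rho _\alpha ({m+\gamma })\) is comparable to \(1+\rho _\alpha ({x})\) for x in the unit cube centered at \(m+\gamma \); the integral converges precisely because \(\nu _0>|\alpha |\), which is why this exponent suffices for the 1-tree case.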

7.2 The Case of 2-Trees

Here we assume that \(\mathbf {T}\) is a 2-tree. The additional x-dependence present in the wave packets \(\psi ^{N(x)}_P\) makes this part more difficult than the corresponding argument in [9]. This problem arises already in the isotropic case [12]. The goal is again to obtain a pointwise estimate for \(F_J\). In the following we fix \(x\in J \) such that \(F_{J}(x)\not =0\). Observe that the \(\omega _{P(r)}\), \(P\in \mathbf {T}\), are nested. Let us denote the smallest (resp. largest) \(\omega _{P(r)}\) (resp. \(\omega _P\)) such that \(x\in N^{-1}(\omega _{P(r)})\cap E\) by \(\omega _-\) (resp. \(\omega _+\)). Let \(k_+\in \mathbb {Z}\) be such that \(|\omega _+|=2^{-k_+|\alpha |}\) and \(k_-\in \mathbb {Z}\) such that \(|\omega _-|=2^{-k_-|\alpha |-n}\) (note from the definition that \(\omega _-\not \in \mathcal {D}^\alpha \) if \(\alpha \not =(1,\dots ,1)\)). Then the nestedness property implies

$$\begin{aligned} F_J(x) = \sum _{\begin{array}{c} P\in \mathbf {T},\\ k_+ \le k(P)\le k_- \end{array}}\epsilon _P \langle f,\phi _P\rangle \psi ^{N(x)}_P(x). \end{aligned}$$

Define

$$\begin{aligned} h_x = \mathrm {M}_{c(\omega _+)}\mathrm {D}^1_{2^{k_+}}\phi _+ -\mathrm {M}_{c(\omega _-)} \mathrm {D}^1_{ 2^{k_-}}\phi _-, \end{aligned}$$

where \(\phi _+(x) = b_1^{-n} \phi (b_1^{-1} x)\) and \(\phi _-\) is a Schwartz function satisfying \(0\le \widehat{\phi _-}\le 1\) such that \(\widehat{\phi _-}\) is supported on \([-\frac{b_2}{2},\frac{b_2}{2}]\) and equals one on \([-\frac{b_3}{2},\frac{b_3}{2}]\), where \(b_{j+2}=\frac{1}{2}+b_j\) for \(j=0,1\). From the definition we see that \(\widehat{h_x}\) is supported on \(b_0 b_1^{-1}\omega _+ \cap (2b_3 \omega _-)^c\) and equals one on \(\omega _+\cap (2b_2\omega _-)^c\). In particular, \(\widehat{h_x}(\xi )\) equals one if \(\xi \in \mathrm {supp}\,\widehat{\phi _P}\) and \(k_+\le k(P)\le k_-\), and vanishes on \(\mathrm {supp}\,\widehat{\phi _P}\) if k(P) lies outside this range. For technical reasons that will become clear below we need the support of \(\widehat{h_x}\) to keep a certain distance from \(\omega _-\). We obtain

$$\begin{aligned} F_J(x) = \sum _{P\in \mathbf {T}} \epsilon _P \langle f,\phi _P\rangle (\psi ^{N(x)}_P*h_x)(x). \end{aligned}$$

Fix \(\xi _0\in \omega _{\mathbf {T}}\). We decompose

$$\begin{aligned} F_{J}(x)&=\sum _{P\in \mathbf {T}} \epsilon _P\langle f,\phi _P\rangle (\psi _P^{\xi _0}*h_x)(x) + \sum _{P\in \mathbf {T}} \epsilon _P\langle f,\phi _P\rangle ((\psi _P^{N(x)}-\psi _P^{\xi _0})*h_x)(x) \end{aligned}$$
(7.11)
$$\begin{aligned}&= G * \mathrm {M}_{\xi _0} K * h_x (x) + G*(\mathrm {M}_{N(x)} K-\mathrm {M}_{\xi _0} K)*h_x(x), \end{aligned}$$
(7.12)

where

$$\begin{aligned} G=\sum _{P\in \mathbf {T}}\epsilon _P\langle f,\phi _P\rangle \phi _P. \end{aligned}$$
(7.13)

Before proceeding with the proof we record the following simple variant of a standard fact about maximal functions (see [2]).

Lemma 7.2

Let \(\lambda >0\) and let w be an integrable function on \(\mathbb {R}^n\) which is constant on \(\{\rho _\alpha ({y})\le \lambda \}\) and radial and decreasing with respect to \(\rho _\alpha \), i.e.

$$\begin{aligned} w(x)\le w(y) \end{aligned}$$

if \(\rho _\alpha ({x})\ge \rho _\alpha ({y})\), with equality if \(\rho _\alpha ({x})=\rho _\alpha ({y})\). Let \(x\in \mathbb {R}^n\) and \(J\subset \mathbb {R}^n\) be such that \(J\subset \{y\,:\,\rho _\alpha ({x-y})\le \lambda \}\). Then we have

$$\begin{aligned} |F*w|(x) \le \Vert w\Vert _1 \sup _{J\subset I} \frac{1}{|I|} \int _I |F(y)| dy, \end{aligned}$$

where the supremum is taken over all anisotropic cubes \(I\subset \mathbb {R}^n\) containing J.

Proof

First we assume that w is a step function. That is,

$$\begin{aligned} w(y) = \sum _{j=1}^\infty c_j \mathbf {1}_{\rho _\alpha ({y})\le r_j} \end{aligned}$$

with \(c_j\ge 0\) and \(\lambda \le r_1<r_2<\cdots \). Then we have

$$\begin{aligned} |F* w (x)| \le \sum _j r_j^{|\alpha |} c_j \frac{1}{r_j^{|\alpha |}}\int _{\rho _\alpha ({x-y})\le r_j} |F(y)| dy \le \Vert w\Vert _1 \sup _{J\subset I} \frac{1}{|I|} \int _I |F(y)| dy. \end{aligned}$$

The general case follows by approximation of w by step functions and an application of Lebesgue’s dominated convergence theorem. \(\square \)

Since

$$\begin{aligned} |h_x(y)|\lesssim 2^{-k_+|\alpha |} |\phi _+|(\delta _{2^{-k_+}}(y))+2^{-k_-|\alpha |} |\phi _-|(\delta _{2^{-k_-}}(y)) \end{aligned}$$
(7.14)

and \(x\in J\) with \(|J|\le 2^{k_+|\alpha |}\le 2^{k_-|\alpha |}\), we have from Lemma 7.2 that

$$\begin{aligned} |G*\mathrm {M}_{\xi _0}K*h_x(x)|\lesssim \sup _{J\subset I}\frac{1}{|I|}\int _I |G*\mathrm {M}_{\xi _0}K(y)| dy. \end{aligned}$$
(7.15)

Let us assume for the moment that we also have the estimate

$$\begin{aligned} |G*(\mathrm {M}_{N(x)} K-\mathrm {M}_{\xi _0} K)*h_x(x)| \lesssim \Vert m\Vert _{\mathscr {M}_\alpha ^{\nu _0}}\sup _{J\subset I} \frac{1}{|I|}\int _I |G(y)| dy. \end{aligned}$$
(7.16)

We will first show how to finish the proof from here. At the end of the section we will then show that (7.16) indeed holds.

From (7.11), (7.15), (7.16) and Lemma 7.1 we see that

$$\begin{aligned}&\sum _{\begin{array}{c} J\in \mathcal {J},\\ J\subset \widetilde{I_\mathbf {T}} \end{array}} \Vert F_J\Vert _{L^1(J)} \\&\quad \lesssim \mu \sum _{\begin{array}{c} J\in \mathcal {J},\\ J\subset \widetilde{I_\mathbf {T}} \end{array}} |J| \left( \sup _{J\subset I}\frac{1}{|I|}\int _I |G*\mathrm {M}_{\xi _0}K(y)| dy + \Vert m\Vert _{\mathscr {M}_\alpha ^{\nu _0}}\sup _{J\subset I} \frac{1}{|I|}\int _I |G(y)| dy\right) . \end{aligned}$$

By disjointness of the \(J\in \mathcal {J}\) this is no greater than

$$\begin{aligned} \mu \left( \left\| \mathcal {M}(G*\mathrm {M}_{\xi _0}K)\right\| _{L^1 (\widetilde{I_\mathbf {T}})} +\Vert m\Vert _{\mathscr {M}_\alpha ^{\nu _0}}\left\| \mathcal {M}(G) \right\| _{L^1(\widetilde{I_\mathbf {T}})}\right) , \end{aligned}$$
(7.17)

where \(\mathcal {M}\) denotes the maximal function defined by

$$\begin{aligned} \mathcal {M} F(y) = \sup _{y\in I} \frac{1}{|I|} \int _I |F|, \end{aligned}$$

where the supremum runs over all anisotropic cubes \(I\subset \mathbb {R}^n\). Clearly, \(\mathcal {M}\) is a bounded operator \(L^2(\mathbb {R}^n)\rightarrow L^2(\mathbb {R}^n)\), because it is bounded pointwise by a composition of one-dimensional Hardy–Littlewood maximal functions applied in each component.
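A sketch of the pointwise domination just mentioned: writing an anisotropic cube containing y as \(I=I_1\times \cdots \times I_n\) with intervals \(I_i\ni y_i\), Fubini gives

```latex
\frac{1}{|I|}\int_I |F|
= \frac{1}{|I_1|}\int_{I_1}\!\cdots\frac{1}{|I_n|}\int_{I_n} |F(t_1,\dots ,t_n)|\, dt_n\cdots dt_1
\le \big(\mathrm{M}^{(1)}\circ \cdots \circ \mathrm{M}^{(n)} F\big)(y),
```

where \(\mathrm {M}^{(i)}\) denotes the one-dimensional Hardy–Littlewood maximal operator acting in the \(i\hbox {th}\) variable (the notation \(\mathrm {M}^{(i)}\) is ad hoc here). Taking the supremum over I and iterating the one-dimensional \(L^2\) bound n times yields \(\Vert \mathcal {M}F\Vert _2\lesssim \Vert F\Vert _2\).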

Applying the Cauchy-Schwarz inequality and the \(L^2\) boundedness of \(\mathcal {M}\) we see that (7.17) is

$$\begin{aligned} \lesssim \mu |I_\mathbf {T}|^{\frac{1}{2}} \left( \left\| G*\mathrm {M}_{\xi _0}K\right\| _2 + \Vert m\Vert _{\mathscr {M}_\alpha ^{\nu _0}}\left\| G\right\| _2\right) . \end{aligned}$$

By repeating the arguments that lead to the proof of (6.6), using (4.16) or (4.17), respectively, we obtain that

$$\begin{aligned} \left\| G*\mathrm {M}_{\xi _0}K\right\| _2 + \Vert m\Vert _{\mathscr {M}_\alpha ^{\nu _0}}\left\| G\right\| _2\lesssim \Vert m\Vert _{\mathscr {M}_\alpha ^{\nu _0}}\varepsilon |I_\mathbf {T}|^{\frac{1}{2}}. \end{aligned}$$

This concludes the proof. It remains to prove (7.16). Let us write

$$\begin{aligned} R(y)=(\mathrm {M}_{N(x)}K-\mathrm {M}_{\xi _0}K)*h_x(y). \end{aligned}$$

We will give two different estimates for R. The first one is only effective if \(\rho _\alpha ({y})\) is large and the second one if \(\rho _\alpha ({y})\) is small. Let us start with the first estimate. By Fourier inversion, we can write R(y) (up to a constant) as

$$\begin{aligned} \int _{\mathbb {R}^n} (m(\xi -N(x))-m(\xi -\xi _0)) \widehat{h_x}(\xi ) e^{i\xi y} d\xi . \end{aligned}$$
(7.18)

Fix y and let i be such that \(\rho _\alpha ({y})=|y_i|^{1/\alpha _i}\). Then we integrate by parts in the \(i\hbox {th}\) component to see that (7.18) is bounded by

$$\begin{aligned} \lesssim \rho _\alpha ({y})^{-\nu '\alpha _i} \int _{\mathbb {R}^n}\Big | \partial ^{\nu '}_{\xi _i} \Big [ (m(\xi -N(x))-m(\xi -\xi _0)) \widehat{h_x}(\xi ) \Big ] \Big | d\xi \end{aligned}$$
(7.19)

for integer \(\nu '\ge 0\), where we have used that \(\rho _\alpha ({y})\ge 2^{k_-}\) to estimate \(|\delta _{2^{-k_-}}(y)|\ge 2^{-k_-}\rho _\alpha ({y})\).
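The last inequality is elementary: since i was chosen so that \(\rho _\alpha ({y})=|y_i|^{1/\alpha _i}\), we have

```latex
|\delta_{2^{-k_-}}(y)| \;\ge\; 2^{-k_-\alpha_i}|y_i|
= \big(2^{-k_-}\rho_\alpha(y)\big)^{\alpha_i}
\;\ge\; 2^{-k_-}\rho_\alpha(y),
```

where the final step uses \(2^{-k_-}\rho _\alpha ({y})\ge 1\) together with \(\alpha _i\ge 1\).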

Let \(\ell \le \nu _0\) be a non-negative integer. Using (4.13) we obtain

$$\begin{aligned} \Big |\partial ^\ell _{\xi _i}\Big [m(\xi -N(x))-m(\xi -\xi _0)\Big ]\Big | \le \Vert m\Vert _{\mathscr {M}_\alpha ^\ell } ( \rho _\alpha ({\xi -N(x)})^{-\ell \alpha _i} + \rho _\alpha ({\xi -\xi _0})^{-\ell \alpha _i}). \end{aligned}$$
(7.20)

Recall that \(\xi _0\) and N(x) are contained in \(\omega _-\) and the integrand of (7.19) is supported on \(b_0 b_1^{-1}\omega _+ \cap (2b_3 \omega _-)^c\). Also, there exist \(\omega _1,\dots ,\omega _M\in \mathcal {D}^\alpha \) such that

$$\begin{aligned} \omega _-\subsetneq \omega _1\subsetneq \cdots \subsetneq \omega _M=\omega _+ \end{aligned}$$

and \(|\omega _{j}|=2^{-k_j|\alpha |}\) with \(k_1=k_-\) and \(k_{j+1}=k_j-1\). If \(\xi \in (2b_3\omega _-)^c\) we have

$$\begin{aligned} \min (\rho _\alpha ({\xi -N(x)}),\rho _\alpha ({\xi -\xi _0})) \gtrsim 2^{-k_-}. \end{aligned}$$
(7.21)

On the other hand, if \(\xi \in (b_0b_1^{-1}\omega _j)\cap \omega _{j-1}^c\) for \(j=2,\dots ,M\), then

$$\begin{aligned} \min (\rho _\alpha ({\xi -N(x)}),\rho _\alpha ({\xi -\xi _0}))\gtrsim 2^{-k_j}. \end{aligned}$$
(7.22)

Combining (7.20) with (7.21) and (7.22), we get

$$\begin{aligned} |\partial ^\ell _{\xi _i}\Big [m(\xi -N(x))-m(\xi -\xi _0)\Big ]| \lesssim \Vert m\Vert _{\mathscr {M}_\alpha ^\ell } \sum _{j=1}^M 2^{k_j \ell \alpha _i} \mathbf {1}_{b_0b_1^{-1}\omega _j}(\xi ). \end{aligned}$$
(7.23)

We also have

$$\begin{aligned} |\partial ^\ell _{\xi _i}\widehat{h_x}(\xi )| \lesssim 2^{k_+\ell \alpha _i}\mathbf {1}_{b_0b_1^{-1}\omega _+}(\xi ) +2^{k_-\ell \alpha _i} \mathbf {1}_{2b_3\omega _-}(\xi ). \end{aligned}$$
(7.24)

Thus, from (7.23) and (7.24), for all \(i=1,\dots ,n\) and \(0\le \ell \le \nu '\) we obtain

$$\begin{aligned} \int _{\mathbb {R}^n} \Big | \partial ^{\ell }_{\xi _i} \Big [ m(\xi -N(x))-m(\xi -\xi _0) \Big ] \partial ^{\nu '-\ell }_{\xi _i} \widehat{h_x}(\xi ) \Big |d\xi \lesssim \Vert m\Vert _{\mathscr {M}_\alpha ^{\nu '}} 2^{k_-(\nu '\alpha _i-|\alpha |)}, \end{aligned}$$

provided that \(\nu '\alpha _i\ge |\alpha |\). Setting \(\nu '=\lceil \frac{\nu _0}{\alpha _i}\rceil \) (so that \(\nu '\le \nu _0\) and \(\nu '\alpha _i\ge \nu _0\ge |\alpha |\)), we have shown that

$$\begin{aligned} |R(y)| \lesssim \Vert m\Vert _{\mathscr {M}_\alpha ^{\nu _0}} 2^{-k_-|\alpha |} (2^{-k_-}\rho _\alpha ({y}))^{-\nu _0}. \end{aligned}$$
(7.25)
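In more detail, the bound (7.25) follows by inserting the bound just obtained into (7.19), using \(\Vert m\Vert _{\mathscr {M}_\alpha ^{\nu '}}\le \Vert m\Vert _{\mathscr {M}_\alpha ^{\nu _0}}\):

```latex
|R(y)| \lesssim \Vert m\Vert_{\mathscr{M}_\alpha^{\nu_0}}\, \rho_\alpha(y)^{-\nu'\alpha_i}\, 2^{k_-(\nu'\alpha_i-|\alpha|)}
= \Vert m\Vert_{\mathscr{M}_\alpha^{\nu_0}}\, 2^{-k_-|\alpha|}\, \big(2^{-k_-}\rho_\alpha(y)\big)^{-\nu'\alpha_i}
\le \Vert m\Vert_{\mathscr{M}_\alpha^{\nu_0}}\, 2^{-k_-|\alpha|}\, \big(2^{-k_-}\rho_\alpha(y)\big)^{-\nu_0},
```

where the final inequality uses \(\nu '\alpha _i\ge \nu _0\) together with \(2^{-k_-}\rho _\alpha ({y})\ge 1\).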

It remains to find a good estimate for R(y) when \(\rho _\alpha ({y})\) is small. Let us estimate

$$\begin{aligned} |R(y)|\le R_+(y) + R_-(y), \end{aligned}$$

where

$$\begin{aligned} R_{\pm } = |(\mathrm {M}_{N(x)}K-\mathrm {M}_{\xi _0}K)* \mathrm {D}^1_{2^{k_\pm }}\phi _{\pm }|. \end{aligned}$$

The first claim is that if \(\rho _\alpha ({y})\le 2^{k_\pm +1}\), then

$$\begin{aligned} R_{\pm }(y) \lesssim \Vert m\Vert _{\mathscr {M}_\alpha ^{\nu _0}} 2^{-k_\pm |\alpha |}. \end{aligned}$$
(7.26)

(Here and throughout the proof of this claim ± always stands for a fixed choice of sign, either \(+\) or −.) To see this, we first estimate \(R_\pm (y)\) by

$$\begin{aligned} 2^{-k_\pm |\alpha |} \int _{\mathbb {R}^n} \left| (e^{i(N(x)-\xi _0)z}-1) K(z) \phi _\pm (\delta _{2^{-k_\pm }}(y-z))\right| dz \lesssim 2^{-k_\pm |\alpha |}(\mathbf {I} + \mathbf {II}), \end{aligned}$$

where

$$\begin{aligned} \mathbf {I}= & {} \int _{\rho _\alpha ({z})\le 2^{k_\pm +2}} \left| (e^{i(N(x)-\xi _0)z}-1) K(z) \phi _\pm (\delta _{2^{-k_\pm }}(y-z))\right| dz,\quad \text {and}\\ \mathbf {II}= & {} \sum _{j=2}^\infty \int _{2^{k_\pm +j}\le \rho _\alpha ({z})\le 2^{k_\pm +j+1}} \left| K(z) \phi _\pm (\delta _{2^{-k_\pm }}(y-z))\right| dz. \end{aligned}$$

We first estimate \(\mathbf {I}\). Changing variables \(z\mapsto \delta _{2^{k_\pm +2}}(z)\) we see that

$$\begin{aligned} \mathbf {I} \lesssim \int _{\rho _\alpha ({z})\le 1} \left| (e^{i\delta _{2^{k_\pm +2}}(N(x)-\xi _0)z}-1) K(z)\right| dz. \end{aligned}$$

Using (4.14), the previous display is

$$\begin{aligned} \lesssim \Vert m\Vert _{\mathscr {M}_\alpha ^{\nu _0}} \left| \delta _{2^{k_\pm +2}}(N(x)-\xi _0)\right| \int _{\rho _\alpha ({z})\le 1} |z| \rho _\alpha ({z})^{-|\alpha |} dz. \end{aligned}$$

Using \(\rho _\alpha ({\delta _{2^{k_\pm +2}}(N(x)-\xi _0)})=2^{k_\pm +2} \rho _\alpha ({N(x)-\xi _0})\lesssim 1\) we can bound this further as

$$\begin{aligned} \lesssim \Vert m\Vert _{\mathscr {M}_\alpha ^{\nu _0}}\int _{\rho _\alpha ({z})\le 1} \rho _\alpha ({z})^{1-|\alpha |} dz \lesssim \Vert m\Vert _{\mathscr {M}_\alpha ^{\nu _0}}. \end{aligned}$$

This proves that \(\mathbf {I}\lesssim \Vert m\Vert _{\mathscr {M}_\alpha ^{\nu _0}}\). It remains to treat \(\mathbf {II}\). Here we make use of the fact that \(\rho _\alpha ({y-z})\ge 2^{k_\pm +j-1}\) in the integrand of \(\mathbf {II}\), because of our assumption \(\rho _\alpha ({y})\le 2^{k_\pm +1}\). Using the decay of \(\phi _\pm \) we obtain

$$\begin{aligned} \mathbf {II} \lesssim \Vert m\Vert _{\mathscr {M}_\alpha ^{\nu _0}}\sum _{j=2}^\infty 2^{-j} \int _{2^{k_\pm +j}\le \rho _\alpha ({z})\le 2^{k_\pm +j+1}} \rho _\alpha ({z})^{-|\alpha |} dz\lesssim \Vert m\Vert _{\mathscr {M}_\alpha ^{\nu _0}}. \end{aligned}$$

Thus we have proved (7.26). The only further ingredient needed to verify (7.16) is a good estimate for \(R_+(y)\) when \(2^{k_++1}\le \rho _\alpha ({y})\le 2^{k_-}\). This requires a slightly more careful decomposition. Let us write

$$\begin{aligned} Q_\ell = [\delta _{2^{k_+}}(\ell ),\delta _{2^{k_+}}(\ell +1))=\prod _{i=1}^n {[}2^{k_+\alpha _i} \ell _i, 2^{k_+\alpha _i} (\ell _i+1)) \end{aligned}$$

for \(\ell \in \mathbb {Z}^n\). Assume that \(y\in Q_\ell \) with \(1\le |\ell |_\infty < 2^{k_--k_+}\). We have

$$\begin{aligned} R_+(y) \le 2^{-k_+|\alpha |} \sum _{s\in \mathbb {Z}^n} \int _{Q_s} |(e^{i(N(x)-\xi _0)z}-1) K(z) \phi _+(\delta _{2^{-k_+}}(y-z))| dz. \end{aligned}$$
(7.27)

Moreover, the same estimates that were used to prove (7.26) yield

$$\begin{aligned}&\int _{Q_s} \left| (e^{i(N(x)-\xi _0)z}-1) K(z) \phi _+(\delta _{2^{-k_+}}(y-z))\right| dz\\&\quad \lesssim \Vert m\Vert _{\mathscr {M}_\alpha ^{\nu _0}} 2^{k_+-k_-} (1+\rho _\alpha ({s-\ell }))^{-|\alpha |-1} (1+\rho _\alpha ({s}))^{1-|\alpha |}. \end{aligned}$$

Plugging this inequality into (7.27) we obtain

$$\begin{aligned} R_+(y)&\lesssim \Vert m\Vert _{\mathscr {M}_\alpha ^{\nu _0}} 2^{-k_-} 2^{k_+(1-|\alpha |)} \sum _{s\in \mathbb {Z}^n} (1+\rho _\alpha ({s-\ell }))^{-|\alpha |-1} (1+\rho _\alpha ({s}))^{1-|\alpha |}\\&\lesssim \Vert m\Vert _{\mathscr {M}_\alpha ^{\nu _0}} 2^{-k_-} (2^{k_+} \rho _\alpha ({\ell }))^{1-|\alpha |}, \end{aligned}$$

where the last inequality holds provided \(\nu _0\) is large enough. Therefore,

$$\begin{aligned} R_+(y) \lesssim \Vert m\Vert _{\mathscr {M}_\alpha ^{\nu _0}} 2^{-k_-} \rho _\alpha ({y})^{1-|\alpha |}. \end{aligned}$$
(7.28)
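The passage to (7.28) uses that \(2^{k_+}\rho _\alpha ({\ell })\) is comparable to \(\rho _\alpha ({y})\) for \(y\in Q_\ell \) with \(|\ell |_\infty \ge 1\); here is a sketch of the upper bound:

```latex
\rho_\alpha(y)=\max_i |y_i|^{1/\alpha_i}
\le \max_i \big(2^{k_+\alpha_i}(|\ell_i|+1)\big)^{1/\alpha_i}
\le 2^{k_+}\max_i \big(2(|\ell_i|\vee 1)\big)^{1/\alpha_i}
\le 2^{k_++1}\max\big(\rho_\alpha(\ell),1\big)
= 2^{k_++1}\rho_\alpha(\ell),
```

since \(\rho _\alpha ({\ell })\ge 1\) when \(|\ell |_\infty \ge 1\). For the matching lower bound \(\rho _\alpha ({y})\gtrsim 2^{k_+}\rho _\alpha ({\ell })\), note that since \(\ell \) has integer entries, either \(\rho _\alpha ({\ell })=1\), in which case the standing constraint \(\rho _\alpha ({y})\ge 2^{k_++1}\) suffices, or \(\rho _\alpha ({\ell })=|\ell _i|^{1/\alpha _i}\) with \(|\ell _i|\ge 2\), in which case \(|y_i|\ge 2^{k_+\alpha _i}(|\ell _i|-1)\ge 2^{k_+\alpha _i}|\ell _i|/2\).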

Finally, summarizing (7.25), (7.26) and (7.28) we have shown that

$$\begin{aligned} |R(y)| \lesssim \Vert m\Vert _{\mathscr {M}_\alpha ^{\nu _0}} (w_0(y) + w_+(y) + w_-(y) + w_1(y)), \end{aligned}$$

where

$$\begin{aligned} w_0(y)= & {} 2^{-k_-|\alpha |} (2^{-k_-}\rho _\alpha ({y}))^{-|\alpha |-1} \mathbf {1}_{\rho _\alpha ({y})\ge 2^{k_-}},\\ w_\pm (y)= & {} 2^{-k_\pm |\alpha |}\mathbf {1}_{\rho _\alpha ({y})\le 2^{k_\pm +1}},\\ w_1(y)= & {} 2^{-k_-} \rho _\alpha ({y})^{1-|\alpha |}\mathbf {1}_{2^{k_++1}\le \rho _\alpha ({y})\le 2^{k_-}}. \end{aligned}$$

Each of these functions is integrable, with \(L^1(\mathbb {R}^n)\) norm bounded independently of \(k_-\) and \(k_+\); each is radial and decreasing with respect to \(\rho _\alpha \) in the sense of Lemma 7.2, and constant on \(\{\rho _\alpha ({y})\le 2^{k_+}\}\) or on \(\{\rho _\alpha ({y})\le 2^{k_-}\}\). Thus, applying Lemma 7.2 to each of these functions yields (7.16). Note that to prove (7.16) we only required that \(\nu _0\ge |\alpha |+1\).
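As a consistency check, the uniformity of the \(L^1\) norms can be verified directly from \(|\{\rho _\alpha \le r\}|=2^n r^{|\alpha |}\); for instance,

```latex
\Vert w_\pm\Vert_1 = 2^{-k_\pm|\alpha|}\,\big|\{\rho_\alpha(y)\le 2^{k_\pm+1}\}\big| = 2^{n+|\alpha|},
\qquad
\Vert w_1\Vert_1 \approx 2^{-k_-}\int_{2^{k_+}}^{2^{k_-}} r^{1-|\alpha|}\, r^{|\alpha|-1}\, dr \le 1,
```

and \(\Vert w_0\Vert _1\lesssim 1\) similarly, since the tail exponent \(|\alpha |+1\) exceeds \(|\alpha |\).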

8 Proofs of Auxiliary Estimates

In this section we prove (4.15), (4.16) and (4.17).

Proof of (4.15). Expanding definitions and using Fourier inversion we see that, up to a universal constant, \(|\psi ^N_P(x)|\) is equal to

$$\begin{aligned} 2^{k(P)|\alpha |/2} \left| \int _{\mathbb {R}^n} e^{i\xi (x-c(I_P))} m(\xi -N) \widehat{\phi }(\delta _{2^{k(P)}}(\xi -c(\omega _{P(0)}))) d\xi \right| . \end{aligned}$$

Via a change of variables \(\delta _{2^{k(P)}}(\xi -c(\omega _{P(0)}))\rightarrow \zeta \) and using that \(m(\xi )=m(\delta _{2^{k(P)}}(\xi ))\) this becomes

$$\begin{aligned} 2^{-k(P)|\alpha |/2} \left| \int _{\mathbb {R}^n} e^{i\zeta \delta _{2^{-k(P)}}(x-c(I_P))} m(\zeta +\delta _{2^{k(P)}}(c(\omega _{P(0)})-N)) \widehat{\phi }(\zeta ) d\zeta \right| . \end{aligned}$$
(8.1)

Let us fix x and P and take i to be such that \(\rho _\alpha ({x-c(I_P)})=|x_i-c(I_P)_i|^{1/\alpha _i}\). From repeated integration by parts we see that (8.1) is bounded by

$$\begin{aligned} 2^{-k(P)|\alpha |/2} (2^{-k(P)}\rho _\alpha ({x-c(I_P)}))^{-\nu ' \alpha _i} \int _{\mathbb {R}^n} \left| \partial ^{\nu '}_{\zeta _i} ( m(\zeta +\delta _{2^{k(P)}}(c(\omega _{P(0)})-N)) \widehat{\phi }(\zeta ) )\right| d\zeta , \end{aligned}$$

for every integer \(\nu '\ge 0\). We set \(\nu '=\lceil \nu /\alpha _i\rceil \le \nu \). Since \(N\not \in \omega _{P(0)}\) we have \(|\zeta +\delta _{2^{k(P)}}(c(\omega _{P(0)})-N)|\gtrsim 1\). Therefore,

$$\begin{aligned} \int _{\mathbb {R}^n} \left| \partial ^{\nu '}_{\zeta _i} ( m(\zeta +\delta _{2^{k(P)}}(c(\omega _{P(0)})-N)) \widehat{\phi }(\zeta ) )\right| d\zeta \lesssim \Vert m\Vert _{\mathscr {M}_\alpha ^{\nu '}}\le \Vert m\Vert _{\mathscr {M}_\alpha ^{\nu }}. \end{aligned}$$

This concludes the proof of (4.15) in the case that \(\rho _\alpha ({x-c(I_P)})\ge 1\). In the case \(\rho _\alpha ({x-c(I_P)})\le 1\) we simply use the triangle inequality on (8.1). \(\square \)

Proof of (4.16) and (4.17). If \(c(I_P)=c(I_{P'})\), the estimates are trivial. Thus we may assume \(c(I_P)\not =c(I_{P'})\). We have

$$\begin{aligned} |\langle \phi _P, \phi _{P'}\rangle | \le |I_P|^{-\frac{1}{2}} |I_{P'}|^{-\frac{1}{2}} \int _{\mathbb {R}^n} |\phi (\delta _{2^{-k(P)}}(x-c(I_P)))\phi (\delta _{2^{-k(P')}}(x-c(I_{P'}))) | dx. \end{aligned}$$
(8.2)

Since

$$\begin{aligned} \rho _\alpha ({c(I_P)-c(I_{P'})}) \le \rho _\alpha ({x-c(I_P)}) + \rho _\alpha ({x-c(I_{P'})}), \end{aligned}$$

at least one of \(\rho _\alpha ({x-c(I_P)}), \rho _\alpha ({x-c(I_{P'})})\) is \(\ge \frac{1}{2} \rho _\alpha ({c(I_P)-c(I_{P'})})\). Thus, splitting the integral over x accordingly and using rapid decay of \(\phi \), the right hand side of (8.2) is no greater than a constant times

$$\begin{aligned}&|I_P|^{-\frac{1}{2}} |I_{P'}|^{\frac{1}{2}} (1+2^{-k(P)}\rho _\alpha ({c(I_P)-c(I_{P'})}))^{-\nu }\\&\quad + |I_P|^{\frac{1}{2}} |I_{P'}|^{-\frac{1}{2}} (1+2^{-k(P')}\rho _\alpha ({c(I_P)-c(I_{P'})}))^{-\nu }. \end{aligned}$$

Recalling that we assumed \(|I_P|\ge |I_{P'}|\) we see that the previous display is bounded by a constant times

$$\begin{aligned} |I_P|^{-\frac{1}{2}} |I_{P'}|^{\frac{1}{2}} (1+2^{-k(P)}\rho _\alpha ({c(I_P)-c(I_{P'})}))^{-\nu }. \end{aligned}$$

This proves (4.16). The estimate (4.17) can be proven in the same way, by using the decay estimate (4.15). \(\square \)