1 Introduction

It was long conjectured that classical inequalities for singular integrals T on weighted spaces \(L^2(w)\) with a Muckenhoupt \(A_2\) weight w should take the sharp form

$$\begin{aligned} \Vert Tf\Vert _{L^2(w)}\le c_T[w]_{A_2}\Vert f\Vert _{L^2(w)}. \end{aligned}$$

This \(A_2\) conjecture was first verified by one of us [7] by introducing a dyadic representation of T, an expansion in terms of simpler discrete model operators (called dyadic/Haar shifts). Earlier versions of the \(A_2\) conjecture for special operators such as the martingale transform, the Beurling–Ahlfors transform, the Hilbert transform and the Riesz transform were due to Wittwer [29], Petermichl and Volberg [25], and Petermichl [23, 24], respectively. Since then, simpler proofs of the \(A_2\) theorem as in Lerner [12], Lacey [11], and Lerner–Ombrosi [13] replaced the dyadic representation by sparse domination, but the original dyadic representation theorem continues to have an independent interest and other applications.

One such application is the extension of the linear dyadic representation to bi-parameter (also known as product-space, or Journé-type, after [10]) singular integrals in [17], which has defined the new standard framework for the study of these operators. The multi-parameter extension of this is due to Y. Ou [21] and a bi-linear version is due to Li–Martikainen–Ou–Vuorinen [14]. In the bi-parameter context the representation theorem has proven to be extremely useful e.g., in connection with bi-parameter commutators and weighted analysis, see Holmes–Petermichl–Wick [6], Ou–Petermichl–Strouse [22] and Li–Martikainen–Vuorinen [15, 16]. On the other hand, there are some fundamental obstacles to sparse domination of bi-parameter objects, see [1], which makes the dyadic representation particularly useful in this setting.

In another direction, an open problem in vector-valued harmonic analysis is to describe the linear dependence of the norm of a vector-valued Calderón–Zygmund operator on the UMD constant of the underlying Banach space. In abstract UMD spaces, the linear bound has only been shown for the Beurling–Ahlfors transform and for some other special operators with even kernel, such as certain Fourier multiplier operators (see [19]). It is worth mentioning that, as was the case with the \(A_2\) theorem, the linear bound for the Beurling–Ahlfors transform has been known for some time, yet the possible linear dependence of the norm of the vector-valued Hilbert transform on the UMD constant is still a famous open problem (see [9, Problem O.6]). More recently, Pott and Stoica established in [26] the linear dependence of sufficiently smooth Banach space-valued even singular integrals on the UMD constant by showing such a linear estimate for symmetric dyadic shifts. Their estimate for dyadic shifts grows like \(2^{\max (i,j)/2}\) in terms of the parameters \((i,j)\) of the shifts. As explained in their work, to have convergence, one needs a decay factor \(2^{-s\max (i,j)}\), which is guaranteed by kernel smoothness \(s>\frac{1}{2}\), and only in dimension \(d=1\). It is interesting to notice that in most other applications of the dyadic representation theorem, notably to the weighted inequalities, the rate of convergence of the representation is irrelevant as long as it is exponential. Formally, the same argument should work in any dimension d assuming smoothness of order \(s>\frac{1}{2}d\), but the existing Haar dyadic representation can only “see” smoothness up to order \(s\le 1\); thus \(\frac{1}{2}d<s\le 1\) forces \(d=1\).

This motivated us to find a new version of the dyadic representation theorem with faster decay using smooth wavelets with compact support. Our main result is the following (see Sect. 2 for a precise definition of the wavelet shifts \(S^{ij}_{\omega }\) and the required regularity of the wavelets):

Theorem 1.1

Let \(s\in \mathbb {Z}_+\), and T be a bounded Calderón–Zygmund operator in \(L^2(\mathbb {R}^d)\) with a kernel satisfying \(|\partial ^{\alpha }K(x,y)|\le \Vert K\Vert _{CZ_s}|x-y|^{-d-|\alpha |}\) for every \(|\alpha |\le s\). In addition, suppose that \(T,T^*:\mathcal {P}_s\rightarrow \mathcal {P}_s\), where \(\mathcal {P}_s\) is the space of polynomials of degree less than s. Then for any given \(\epsilon >0\), T has an expansion, say for \(f,g\in C^1_c(\mathbb {R}^d)\),

$$\begin{aligned} \langle g,Tf \rangle =c\cdot \big (\Vert K\Vert _{CZ_s}+\Vert T\Vert _{L^2\rightarrow L^2}\big )\cdot \mathbb {E}_{\omega } \sum _{i,j=1}^{\infty }2^{-(s-\epsilon )\max (i,j)}\langle g,S^{ij}_{\omega }f \rangle , \end{aligned}$$

where c depends only on d, s and \(\epsilon \), \(\mathbb {E}_{\omega }\) is the expectation with respect to the random parameter \(\omega \), and \(S^{ij}_{\omega }\) is a version of a dyadic shift with parameters \((i,j)\) but using sufficiently regular wavelets in place of the Haar functions.

Having this result at our disposal, we can hope to extend the result of [26] to dimensions \(d>1\). We plan to address this question in future work. Another possible area of applications is numerical algorithms for singular integrals, as in [2], where an ancestor of the dyadic representation is used for this purpose. It is clear that, in such applications, a high rate of convergence would be preferred.

The interpretation of the assumption that T and \(T^*\) map the space of polynomials \(\mathcal {P}_s\) into itself is made rigorous in Sect. 3, where we restate Theorem 1.1 as Theorem 3.2. These are “special cancellation” or “vanishing paraproduct” assumptions that one might like to remove in future work.

Since the circulation of this work, Di Plinio et al. [5] have presented an alternative “representation theorem using smooth wavelets”, where they also deal with the paraproduct terms arising from more general cancellation assumptions; see also [4] for an extension of their representation to bi-linear operators. Their version is a closer relative of the continuous wavelet transform, in contrast to the semi-discrete representation in our Theorem 1.1.

The paper is organized as follows: in Sect. 2 we recall the necessary definitions and results that we are using. Section 3 is dedicated to a detailed statement of our main result (see Theorem 3.2). As in the case of the dyadic representations using Haar functions, our proof of Theorem 1.1/3.2 relies on an expansion (see Proposition 3.4) of the Calderón–Zygmund operator in terms of the (previously Haar, now smooth) wavelet basis, but the subsequent analysis of the expansion presents some significant departures from the Haar case. We split the series that appears in this expansion into five parts which are treated in Sect. 4.

1.1 Notation

Throughout the paper, we denote by c, C constants that depend at most on some fixed parameters that should be clear from the context. The notation \(A\lesssim B\) means that \(A\le CB\) holds for such a constant C. Moreover, when Q is a cube and \(t>0\), then tQ represents the cube with the same centre and t times the sidelength of Q. Also, we make the convention that \(|\cdot |\) stands for the \(\ell ^{\infty }\) norm on \(\mathbb {R}^d\), i.e., \(|x|:=\max _{1\le i\le d}|x_i|\). While the choice of the norm is not particularly important, this choice is slightly more convenient than the usual Euclidean norm when dealing with cubes, as we will: e.g., the diameter of a cube Q in the \(\ell ^{\infty }\) norm is equal to its sidelength \(\ell (Q)\).
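This convenience can be checked with a small numerical sketch (not part of the paper's argument; the dimension, sidelength and base corner below are arbitrary choices of ours): in the \(\ell ^{\infty }\) norm, the distance between points of a cube Q never exceeds \(\ell (Q)\), and opposite corners attain it.

```python
import itertools
import random

# Sketch: ell^infty distances between the corners of a cube of sidelength
# `side`; the maximum (the diameter) equals the sidelength.
random.seed(2)
d, side = 3, 0.7
corner = [random.uniform(-1, 1) for _ in range(d)]

def linf(x, y):
    return max(abs(a - b) for a, b in zip(x, y))

corners = [[c + e * side for c, e in zip(corner, eps)]
           for eps in itertools.product([0, 1], repeat=d)]
diam = max(linf(p, q) for p, q in itertools.combinations(corners, 2))
assert abs(diam - side) < 1e-12
print("ell^infty diameter:", diam, "sidelength:", side)
```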

2 Preliminaries

We recall the following from [8, Sect. 2].

The standard (or reference) system of dyadic cubes is

$$\begin{aligned} \mathscr {D}^0:=\{2^{-k}([0,1)^d+l):k\in \mathbb {Z},l\in \mathbb {Z}^d\}. \end{aligned}$$

We will need several dyadic systems, obtained by translating the reference system as follows. Let \(\omega =(\omega _j)_{j\in \mathbb {Z}}\in (\{0,1\}^d)^{\mathbb {Z}}\) and

$$\begin{aligned} I\dot{+}\omega :=I+\sum _{j:2^{-j}<\ell (I)}2^{-j}\omega _j. \end{aligned}$$

Then

$$\begin{aligned} \mathscr {D}^{\omega }:=\{I\dot{+}\omega :I\in \mathscr {D}^0\}, \end{aligned}$$

and it is straightforward to check that \(\mathscr {D}^{\omega }\) inherits the important nestedness property of \(\mathscr {D}^0\): if \(I,J\in \mathscr {D}^{\omega }\), then \(I\cap J\in \{I,J,\varnothing \}\). When the particular \(\omega \) is unimportant, the notation \(\mathscr {D}\) is sometimes used for a generic dyadic system.
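The nestedness property can also be verified numerically on a finite range of scales. The following sketch (our illustration in \(d=1\); the random shift is truncated at a finite scale, but the truncated tail is common to all intervals compared, so their relative positions, and hence nestedness, are unaffected) checks all pairs among the first few generations of \(\mathscr {D}^{\omega }\):

```python
import itertools
import random
from fractions import Fraction

random.seed(0)
J_MAX = 12
omega = [random.randint(0, 1) for _ in range(J_MAX)]  # omega_j in {0,1}, d = 1

def shifted_interval(k, l):
    """I +. omega for I = 2^{-k}([0,1) + l), with the shift truncated at j < J_MAX."""
    # sum over j with 2^{-j} < ell(I) = 2^{-k}, i.e. j >= k + 1
    shift = sum(Fraction(omega[j], 2**j) for j in range(k + 1, J_MAX))
    a = Fraction(l, 2**k) + shift
    return (a, a + Fraction(1, 2**k))  # half-open interval [a, a + 2^{-k})

def nested_or_disjoint(I, J):
    (a, b), (c, d) = I, J
    return b <= c or d <= a or (c <= a and b <= d) or (a <= c and d <= b)

intervals = [shifted_interval(k, l) for k in range(4) for l in range(2**(k + 1))]
assert all(nested_or_disjoint(I, J) for I, J in itertools.combinations(intervals, 2))
print("nestedness holds for all tested pairs")
```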

2.1 Random Dyadic Systems; Good and Bad Cubes

We obtain a notion of random dyadic systems by equipping the parameter set \(\Omega :=(\{0,1\}^d)^{\mathbb {Z}}\) with the natural probability measure: each component \(\omega _j\) has an equal probability \(2^{-d}\) of taking any of the \(2^d\) values in \(\{0,1\}^d\), and all components are independent of each other. We denote by \(\mathbb {E}_{\omega }\) the expectation over the random variables \(\omega _j,j\in \mathbb {Z}\).

Consider the modulus of continuity \(\Theta (t)=t^\theta , \theta \in (0,1)\), for which we will formulate the notion of good and bad cubes. We also fix a (large) parameter \(r\in \mathbb {Z}_+\).

Definition 2.1

A cube \(I\in \mathscr {D}^{\omega }\) is called bad if there exists \(J\in \mathscr {D}^{\omega }\) such that \(\ell (J)\ge 2^r\ell (I)\) and

$$\begin{aligned} {\text {dist}}(I,\partial J)\le \Big (\frac{\ell (I)}{\ell (J)}\Big )^\theta \ell (J): \end{aligned}$$
(2.2)

roughly, I is relatively close to the boundary of a much bigger cube. A cube is called good if it is not bad.

We repeat from [8, Sect. 2.3] some basic probabilistic observations related to badness. Let \(I\in \mathscr {D}^0\) be a reference interval. The position of the translated interval

$$\begin{aligned} I\dot{+}\omega =I+\sum _{j:2^{-j}<\ell (I)}2^{-j}\omega _j, \end{aligned}$$

by definition, depends only on \(\omega _j\) for \(2^{-j}<\ell (I)\). On the other hand, the badness of \(I\dot{+}\omega \) depends on its relative position with respect to the bigger intervals

$$\begin{aligned} J\dot{+}\omega =J+\sum _{j:2^{-j}<\ell (I)}2^{-j}\omega _j+\sum _{j:\ell (I)\le 2^{-j}<\ell (J)}2^{-j}\omega _j. \end{aligned}$$

The same translation component \(\sum _{j:2^{-j}<\ell (I)}2^{-j}\omega _j\) appears in both \(I\dot{+}\omega \) and \(J\dot{+}\omega \), and so does not affect the relative position of these intervals. Thus this relative position, and hence the badness of I, depends only on \(\omega _j\) for \(2^{-j}\ge \ell (I)\). In particular:

Lemma 2.3

For \(I\in \mathscr {D}^0\), the position and badness of \(I\dot{+}\omega \) are independent random variables.

Another observation is the following: by symmetry and the fact that the condition of badness only involves relative position and size of different cubes, it readily follows that the probability of a particular cube \(I\dot{+}\omega \) being bad is equal for all cubes \(I\in \mathscr {D}^0\):

$$\begin{aligned} \mathbb {P}_{\omega }(I\dot{+}\omega {\text {bad}})=\pi _{{\text {bad}}}=\pi _{{\text {bad}}}(r,d,\theta ). \end{aligned}$$

The final observation concerns the value of this probability:

Lemma 2.4

We have

$$\begin{aligned} \pi _{{\text {bad}}}\le 8d\int \limits _0^{2^{-r}}t^\theta \frac{\,\mathrm {d}t}{t}=\frac{8d}{\theta }2^{-r\theta }; \end{aligned}$$

in particular, \(\pi _{{\text {bad}}}<1\) if \(r=r(d,\theta )\) is chosen large enough.

The proof of the previous lemma can be found in [8, Lemma 2.3].
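To get a feel for the quantitative content of Lemma 2.4, one can compute the smallest r for which the stated bound already forces \(\pi _{{\text {bad}}}<1\) (a numerical sketch of ours; the pairs \((d,\theta )\) below are chosen only for illustration):

```python
import math

# Smallest r with (8d/theta) * 2^{-r*theta} < 1, per the Lemma 2.4 bound.
def min_good_r(d, theta):
    r = math.ceil(math.log2(8 * d / theta) / theta)
    while (8 * d / theta) * 2 ** (-r * theta) >= 1:  # guard boundary equality
        r += 1
    return r

for d, theta in [(1, 0.5), (2, 0.5), (3, 0.9)]:
    r = min_good_r(d, theta)
    bound = (8 * d / theta) * 2 ** (-r * theta)
    # r is minimal: the bound at r - 1 is still >= 1
    assert bound < 1 <= (8 * d / theta) * 2 ** (-(r - 1) * theta)
    print(f"d={d}, theta={theta}: r={r}, bound={bound:.3f}")
```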

2.2 Wavelet Functions

We introduce the notion of the smooth wavelet functions with compact support associated to any given dyadic system \(\mathscr {D}\). Such wavelets were originally constructed by I. Daubechies [3] but in this paper we will follow [18].

In [18, Chapter 3] one can find the construction of the smooth wavelets with compact support for \(d=1\). Moreover, once the 1-dimensional wavelet \(\psi \) and the related father wavelet \(\psi ^0=\phi \) are available, the d-dimensional wavelets can be constructed by \(\psi ^{\eta }(x)=\prod _{i=1}^d\psi ^{\eta _i}(x_i)\), where \(\eta \in \{0,1\}^d\setminus \{0\}\) and we make the convention \(\psi ^1=\psi \).

Definition 2.5

We say that \(\big \{\psi _I^\eta \big \}_{I\in \mathscr {D},\eta \in \{0,1\}^d\setminus \{0\}}\) is a system of wavelets with parameters \((m,u,v)\) if

$$\begin{aligned} \psi _I^{\eta }(x):=2^{{dk}/2}\psi ^{\eta }(2^{k}x-l), \end{aligned}$$

for some d-dimensional wavelet \(\psi ^{\eta }\), where \(I=2^{-k}([0,1)^d+l)\), and this collection has the following fundamental properties of a wavelet basis:

  1. (i)

    Being an orthonormal basis of \(L^2(\mathbb {R}^d)\)

  2. (ii)

    Localization: \({\text {supp}}\psi _I^\eta \subset mI\)

  3. (iii)

    Regularity: \(|\partial ^{\alpha }\psi _I^\eta |\le C{\ell (I)}^{-|\alpha |}|I|^{-1/2}\), for every multi-index \(\alpha \in \mathbb {N}^d\) of order \(|\alpha |\le u\)

  4. (iv)

    Cancellation: \(\int x^\alpha \psi _I^\eta (x)\,\mathrm {d}x=0\), when \(|\alpha |\le v.\)

Here \(u,v\in \mathbb {N}\) are two parameters that may or may not be equal. Note that Haar functions correspond to \(m=1\), \(u=v=0\), but in general \(m>1\).

For a fixed \(\mathscr {D}\), all the wavelet functions \(\psi _I^{\eta }\), \(I\in \mathscr {D}\) and \(\eta \in \{0,1\}^d\setminus \{0\}\), form an orthonormal basis of \(L^2(\mathbb {R}^d)\). Hence any function \(f\in L^2(\mathbb {R}^d)\) has the orthogonal expansion

$$\begin{aligned} f=\sum _{I\in \mathscr {D}}\sum _{\eta \in \{0,1\}^d\setminus \{0\}}\langle f,\psi _I^{\eta } \rangle \psi _I^{\eta }. \end{aligned}$$

Since the different \(\eta \)’s seldom play any major role, this will be often abbreviated (with slight abuse of language) simply as

$$\begin{aligned} f=\sum _{I\in \mathscr {D}}\langle f,\psi _I \rangle \psi _I, \end{aligned}$$

and the finite summation over \(\eta \) is understood implicitly.
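In the simplest case \(m=1\), \(u=v=0\) (the Haar system), both the orthonormality and the expansion above can be verified numerically on a dyadic grid. The following sketch (our illustration in \(d=1\), with the coarse scales replaced by the constant father wavelet on [0, 1); the resolution and the test function are arbitrary) checks both properties:

```python
import numpy as np

N = 64
x = (np.arange(N) + 0.5) / N  # midpoints of the grid cells of [0, 1)

def haar(k, l):
    """L^2-normalized Haar function on I = 2^{-k}([0,1) + l)."""
    h = np.zeros(N)
    h[(x >= l / 2**k) & (x < (l + 0.5) / 2**k)] = 1.0
    h[(x >= (l + 0.5) / 2**k) & (x < (l + 1) / 2**k)] = -1.0
    return h * 2 ** (k / 2)

basis = [np.ones(N)] + [haar(k, l) for k in range(6) for l in range(2**k)]
B = np.array(basis)
G = B @ B.T / N                            # Gram matrix of L^2([0,1)) inner products
assert np.allclose(G, np.eye(len(basis)))  # orthonormality

f = np.sin(2 * np.pi * x)                  # test function, sampled on the grid
coeffs = B @ f / N                         # <f, psi_I>
assert np.allclose(B.T @ coeffs, f)        # reconstruction f = sum <f,psi_I> psi_I
print("orthonormal; max reconstruction error:",
      float(np.max(np.abs(B.T @ coeffs - f))))
```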

2.3 Wavelet Shifts

A wavelet shift with parameters \(i,j\in \mathbb {N}:=\{0,1,2,\ldots \}\) is an operator of the form

$$\begin{aligned} Sf=\sum _{K\in \mathscr {D}}A_K f,\qquad A_K f=\sum _{\begin{array}{c} I,J\in \mathscr {D}:I,J\subseteq K \\ \ell (I)=2^{-i}\ell (K)\\ \ell (J)=2^{-j}\ell (K) \end{array}}a_{IJK}\langle f,\psi _I \rangle \psi _J, \end{aligned}$$

where \(\psi _I\) is a wavelet function on I (similarly \(\psi _J\)), and the \(a_{IJK}\) are coefficients with

$$\begin{aligned} |a_{IJK}|\le \frac{\sqrt{|I||J|}}{|K|}. \end{aligned}$$
(2.6)

Remark 2.7

The dyadic shifts considered in many other papers correspond to the special case of Haar wavelets.

The wavelet shift is called good if all dyadic cubes I, J, K such that \(a_{IJK}\ne 0\) satisfy \(mI, mJ\subset K\); otherwise, it is called bad. We note that this condition is automatic when \(m=1\), but not in general. A closely related notion of good shifts already appeared in [7], where it played a certain role, although it was not needed in the many works on this topic that appeared since [7]. The \(L^2\) boundedness of a good wavelet shift S is a consequence of the following facts:

Lemma 2.8

If S is a good wavelet shift, then each \(A_K\) is an “averaging operator” on K, in the sense that it satisfies:

$$\begin{aligned} |A_K f|\lesssim 1_K\frac{1}{|K|}\int \limits _K|f|. \end{aligned}$$

Proof

Since S is a good wavelet shift, if \(a_{IJK}\ne 0\) then \(mJ\subset K\) and \(mI\subset K\), for the fixed \(m\ge 1\), i.e., the dilated cubes mI and mJ are contained in K.

Using the bound (2.6) for the coefficients \(a_{IJK}\), the regularity of \(\psi _I\), and the fact that \(mI, mJ\subset K\), we have

$$\begin{aligned} \begin{aligned} |A_K f|&\lesssim \sum _{\begin{array}{c} I,J\in \mathscr {D}:I,J\subseteq K\\ \ell (I)=2^{-i}\ell (K)\\ \ell (J)=2^{-j}\ell (K) \end{array}}\frac{\sqrt{|I||J|}}{|K|}\frac{1_{mJ}}{\sqrt{|J|}}\cdot \int \frac{|f|1_{mI}}{\sqrt{|I|}}\\&=\frac{1}{|K|}\Big (\sum _{\begin{array}{c} J\in \mathscr {D}:J\subseteq K\\ \ell (J)=2^{-j}\ell (K) \end{array}}1_{mJ}\Big )\int |f|\Big (\sum _{\begin{array}{c} I\in \mathscr {D}:I\subseteq K\\ \ell (I)=2^{-i}\ell (K) \end{array}}1_{mI}\Big )\\&\lesssim 1_K\frac{1}{|K|}\int \limits _K|f|, \end{aligned} \end{aligned}$$

where the (easy to check) bounded overlap of the cubes mJ (respectively mI) was used in the last step. \(\square \)

Corollary 2.9

Let S be a good wavelet shift. The following estimate for the “averaging operator” \(A_K\) holds:

$$\begin{aligned} \Vert A_K f\Vert _{L^p}\lesssim \Vert f\Vert _{L^p},\quad \forall p\in [1,\infty ]. \end{aligned}$$

Proof

Applying the pointwise bound of Lemma 2.8 to each \(A_K\) we have

$$\begin{aligned} \Vert A_K f\Vert _{L^p}\lesssim \Big \Vert 1_K\frac{1}{|K|}\int _K|f|\Big \Vert _{L^p}\lesssim |K|^{1/p}\frac{1}{|K|}|K|^{1/p'}\Vert f\Vert _{L^p}=\Vert f\Vert _{L^p}. \end{aligned}$$

\(\square \)

Lemma 2.10

Let S be a good wavelet shift. Then

$$\begin{aligned} \Vert Sf\Vert _{L^2}\lesssim \Vert f\Vert _{L^2}. \end{aligned}$$

Proof

We use the orthonormality of the wavelet functions. Let

$$\begin{aligned} {\mathcal {H}}_{K}^{i}:={\text {span}}\{\psi _I:I\subseteq K, \ell (I)=2^{-i}\ell (K)\}, \end{aligned}$$

and let \(\mathbb {P}_K^i\) be the orthogonal projection of \(L^2\) onto this subspace. For a fixed i, these spaces are orthogonal, as K ranges over \(\mathscr {D}\).

We have \(\langle f,\psi _I \rangle =\langle \mathbb {P}_K^if,\psi _I \rangle \) for all I appearing in \(A_K\), and hence \(A_{K}f=A_K\mathbb {P}_K^if\). Also, \(\psi _J=\mathbb {P}_K^j\psi _J\) for all J appearing in \(A_K\), and hence \(A_{K}f=\mathbb {P}_K^{j}A_K{f}\). Applying these identities and Pythagoras’ theorem, we obtain:

$$\begin{aligned} \begin{aligned} \Vert Sf\Vert _{L^2}&=\Big \Vert \sum _{K\in \mathscr {D}}\mathbb {P}_K^{j}A_K\mathbb {P}_K^if\Big \Vert _{L^2}\\&=\Big (\sum _{K\in \mathscr {D}}\Vert \mathbb {P}_K^{j}A_K\mathbb {P}_K^if\Vert _{L^2}^2\Big )^{1/2}\\&\lesssim \Big (\sum _{K\in \mathscr {D}}\Vert \mathbb {P}_K^{i}f\Vert _{L^2}^2\Big )^{1/2}\\&\lesssim \Vert f\Vert _{L^2}, \end{aligned} \end{aligned}$$

where we used the \(L^2\) boundedness of \(A_K\) from Corollary 2.9 in the second-to-last step. \(\square \)
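The conclusion of Lemma 2.10 can be illustrated numerically in the Haar case \(m=1\) (where every shift is good). The following sketch (our finite-depth toy model in \(d=1\); the depth, parameters \((i,j)\) and random seed are arbitrary) samples coefficients obeying (2.6) and checks that the resulting shift matrix, written in the Haar coefficient basis, has spectral norm at most 1:

```python
import numpy as np

rng = np.random.default_rng(1)
i, j, depth = 1, 2, 5

# dyadic intervals of [0,1), encoded as (k, l) for 2^{-k}([0,1) + l)
intervals = [(k, l) for k in range(depth + 1) for l in range(2**k)]
idx = {I: n for n, I in enumerate(intervals)}
M = np.zeros((len(intervals), len(intervals)))

for k, l in intervals:  # K = (k, l)
    if k + max(i, j) > depth:
        continue  # descendants at scales k+i, k+j must exist in the model
    bound = 2.0 ** (-(i + j) / 2)  # = sqrt(|I||J|)/|K| for these side ratios
    for a in range(2**i):
        for b in range(2**j):
            I = (k + i, l * 2**i + a)
            J = (k + j, l * 2**j + b)
            M[idx[J], idx[I]] = rng.uniform(-bound, bound)

norm = np.linalg.norm(M, 2)  # spectral norm = L^2 operator norm of the shift
assert norm <= 1 + 1e-12
print("sampled shift norm:", round(norm, 4))
```

The norm bound here reflects the block structure of the matrix: each row and column belongs to a unique K, so the matrix is (after permutation) a direct sum of blocks whose Frobenius norms are at most 1 by (2.6).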

3 The Dyadic Representation Theorem for Smooth Compactly Supported Wavelets

Let T be a Calderón–Zygmund operator on \(\mathbb {R}^d\). That is, it acts on a suitable dense subspace of functions in \(L^2(\mathbb {R}^d)\) (for the present purposes, this class should at least contain the indicators of cubes in \(\mathbb {R}^d\)) and has the kernel representation

$$\begin{aligned} Tf(x)=\int \limits _{\mathbb {R}^d}K(x,y)f(y)\,\mathrm {d}y,\qquad x\notin {\text {supp}}f. \end{aligned}$$

Moreover, we assume that the kernel is s-times differentiable and satisfies the higher order standard estimate:

$$\begin{aligned} \begin{aligned} |\partial ^{\alpha }K(x,y)|&\le \frac{C_1}{|x-y|^{d+|\alpha |}} \end{aligned} \end{aligned}$$
(3.1)

for all \(x,y\in \mathbb {R}^d\), \(x\ne y\), and all multi-indices \(\alpha \in \mathbb {N}^d\) with \(|\alpha |\le s\). Let us denote the smallest admissible constant \(C_1\) by \(\Vert K\Vert _{CZ_s}\).

We say that T is a bounded Calderón–Zygmund operator, if in addition \(T:L^2(\mathbb {R}^d)\rightarrow L^2(\mathbb {R}^d)\), and we denote its operator norm by \(\Vert T\Vert _{L^2\rightarrow L^2}\).

Under such assumptions, we can also define the action of T on the space \(\mathcal {P}_s\) of polynomials of degree less than s. This is well known, and the reader can consult [28] (see also [27]) for a comprehensive discussion. The necessary set-up for our needs is as follows:

If \(\psi \in C_c^s(B(0,R))\) satisfies \(\int _{\mathbb {R}^d}P\psi =0\) for all \(P\in \mathcal {P}_s\), then we have, for \(|x|>2R\),

$$\begin{aligned} T\psi (x)=\int \limits _{\mathbb {R}^d} K(x,y)\psi (y)\,\mathrm {d}y =\int \limits _{B(0,R)} \Big (K(x,y)-\sum _{0\le |\alpha |<s}\frac{y^{\alpha }}{\alpha !}\partial _2^{\alpha }K(x,0)\Big )\psi (y)\,\mathrm {d}y \end{aligned}$$

and hence

$$\begin{aligned} |T\psi (x)|\le \int \limits _{B(0,R)} s\Big (\int \limits _{0}^{1}\sum _{\begin{array}{c} |\alpha |=s \end{array}}\frac{|y|^{|\alpha |}}{\alpha !}|\partial _2^{\alpha }K(x,ty)|(1-t)^{s-1}\,\mathrm {d}t\Big )|\psi (y)|\,\mathrm {d}y \\ \lesssim \Vert K\Vert _{CZ_s}\int \limits _{B(0,R)} R^s\Big (\int \limits _0^1\frac{s}{|x-ty|^{d+s}}(1-t)^{s-1}\,\mathrm {d}t\Big )|\psi (y)|\,\mathrm {d}y \\ \lesssim \Vert K\Vert _{CZ_s}\frac{R^s}{|x|^{d+s}}\Vert \psi \Vert _{1}. \end{aligned}$$

This expression is integrable against any \(P\in \mathcal {P}_s\) over the region \(B(0,2R)^c\). On the other hand, it is clear that \(T\psi \in L^2(\mathbb {R}^d)\subset L^1_{{\text {loc}}}(\mathbb {R}^d)\) is also integrable against \(P\in \mathcal {P}_s \subset L^\infty _{{\text {loc}}}(\mathbb {R}^d)\) over B(0, 2R). Thus

$$\begin{aligned} \langle T^* P,\psi \rangle :=\langle P,T\psi \rangle :=\int \limits _{\mathbb {R}^d}P(x)T\psi (x)\,\mathrm {d}x \end{aligned}$$

is well defined for every \(P\in \mathcal {P}_s\) and every \(\psi \in C_c^s(\mathbb {R}^d)\) that is orthogonal to \(\mathcal {P}_s\). This defines \(T^*P\) as a functional on the said subspace of \(C_c^s(\mathbb {R}^d)\), and the definition of TP can be given in a similar way, since the adjoint \(T^*\) satisfies the same assumptions.

We say that T maps \(\mathcal {P}_s\) into itself, if \(\langle TP,\psi \rangle =0\) for all \(P\in \mathcal {P}_s\) and all \(\psi \in C_c^s(\mathbb {R}^d)\) that are orthogonal to \(\mathcal {P}_s\).

Here is our main result:

Theorem 3.2

Let T be a bounded Calderón–Zygmund operator with a kernel satisfying (3.1) and suppose that both T and \(T^*\) map \(\mathcal {P}_s\) into itself, in the sense defined above. Moreover, let the wavelet \(\psi _I\) satisfy the regularity and cancellation property for \(u=s\) and \(v=s-1\), respectively. Then for any given \(\epsilon >0\), T has an expansion, say for \(f,g\in C^1_c(\mathbb {R}^d)\),

$$\begin{aligned} \langle g,Tf \rangle =c\cdot \big (\Vert K\Vert _{CZ_s}+\Vert T\Vert _{L^2\rightarrow L^2}\big )\cdot \mathbb {E}_{\omega } \sum _{i,j=1}^{\infty }2^{-(s-\epsilon )\max (i,j)}\langle g,S^{ij}_{\omega }f \rangle , \end{aligned}$$

where c depends only on d, s and \(\epsilon \), and \(S^{ij}_{\omega }\) is a good wavelet shift of parameters \((i,j)\) on the dyadic system \(\mathscr {D}^{\omega }\).

The following remark shows that the assumption that T and \(T^*\) map \(\mathcal {P}_s\) into itself follows from the other assumptions of Theorem 3.2, if in addition the operator T is translation invariant.

Remark 3.3

Let T be a bounded Calderón–Zygmund operator with a kernel satisfying (3.1) and suppose in addition that T is translation invariant. Then both T and \(T^*\) map \(\mathcal {P}_s\) into itself.

Proof

It is enough to consider just T, since all the assumptions, and hence the conclusions, pass to the adjoint \(T^*\). For the result concerning T, we refer the reader to [28, Proposition 2.2.17]. \(\square \)

A key to the proof of the dyadic representation is a random expansion of T in terms of wavelet functions \(\psi _I\), where the bad cubes are avoided:

Proposition 3.4

Let \(T\in \mathscr {L}(L^2(\mathbb {R}^d))\) and \(f\in C^1_c(\mathbb {R}^d)\), \(g\in C^1_c(\mathbb {R}^d)\). Then the following representation is valid:

$$\begin{aligned} \langle g,Tf \rangle =\frac{1}{\pi _{{\text {good}}}}\mathbb {E}_{\omega }\sum _{I,J\in \mathscr {D}^{\omega }}1_{{\text {good}}}({\text {smaller}}\{I,J\})\cdot \langle g,\psi _{J} \rangle \langle \psi _{J},T\psi _{I} \rangle \langle \psi _{I},f \rangle , \end{aligned}$$

where

$$\begin{aligned} {\text {smaller}}\{I,J\}:={\left\{ \begin{array}{ll} I &{} \text {if }\ell (I)\le \ell (J) \\ J &{} \text {if }\ell (I)>\ell (J), \end{array}\right. } \end{aligned}$$

and \(\pi _{{\text {good}}}:=1-\pi _{{\text {bad}}}>0\).

Proof

The proof is analogous to the one given in [8, Proposition 3.5], with the Haar functions \(h_I\) and \(h_J\) replaced by the wavelet functions \(\psi _I\) and \(\psi _J\), respectively. We provide the details for the convenience of the reader.

Recall that

$$\begin{aligned} f=\sum _{I\in \mathscr {D}^0}\langle f,\psi _{I\dot{+}\omega } \rangle \psi _{I\dot{+}\omega } \end{aligned}$$

for any fixed \(\omega \in \Omega \); and we can also take the expectation \(\mathbb {E}_{\omega }\) of both sides of this identity.

Let

$$\begin{aligned} 1_{{\text {good}}}(I\dot{+}\omega ):={\left\{ \begin{array}{ll} 1, &{} \text {if }I\dot{+}\omega \ \text {is good},\\ 0, &{} \text {else}\end{array}\right. } \end{aligned}$$

We make use of the above random wavelet expansion of f, multiply and divide by

$$\begin{aligned} \pi _{{\text {good}}}=\mathbb {P}_{\omega }(I\dot{+}\omega {\text {good}})=\mathbb {E}_{\omega }1_{{\text {good}}}(I\dot{+}\omega ), \end{aligned}$$

and use the independence from Lemma 2.3 to get:

$$\begin{aligned} \begin{aligned} \langle g,Tf \rangle&=\mathbb {E}_{\omega }\sum _{I}\langle g,T\psi _{I\dot{+}\omega } \rangle \langle \psi _{I\dot{+}\omega },f \rangle \\&=\frac{1}{\pi _{{\text {good}}}}\sum _{I}\mathbb {E}_{\omega }[1_{{\text {good}}}(I\dot{+}\omega )] \mathbb {E}_{\omega }[\langle g,T\psi _{I\dot{+}\omega } \rangle \langle \psi _{I\dot{+}\omega },f \rangle ] \\&=\frac{1}{\pi _{{\text {good}}}}\mathbb {E}_{\omega }\sum _{I}1_{{\text {good}}}(I\dot{+}\omega ) \langle g,T\psi _{I\dot{+}\omega } \rangle \langle \psi _{I\dot{+}\omega },f \rangle \\&=\frac{1}{\pi _{{\text {good}}}}\mathbb {E}_{\omega }\sum _{I,J}1_{{\text {good}}}(I\dot{+}\omega ) \langle g,\psi _{J\dot{+}\omega } \rangle \langle \psi _{J\dot{+}\omega },T\psi _{I\dot{+}\omega } \rangle \langle \psi _{I\dot{+}\omega },f \rangle . \end{aligned} \end{aligned}$$

On the other hand, using independence again in half of this double sum, we have

$$\begin{aligned} \begin{aligned}&\frac{1}{\pi _{{\text {good}}}}\sum _{\ell (I)>\ell (J)}\mathbb {E}_{\omega }[1_{{\text {good}}}(I\dot{+}\omega ) \langle g,\psi _{J\dot{+}\omega } \rangle \langle \psi _{J\dot{+}\omega },T\psi _{I\dot{+}\omega } \rangle \langle \psi _{I\dot{+}\omega },f \rangle ] \\&\quad =\frac{1}{\pi _{{\text {good}}}}\sum _{\ell (I)>\ell (J)}\mathbb {E}_{\omega }[1_{{\text {good}}}(I\dot{+}\omega )] \mathbb {E}_{\omega }[ \langle g,\psi _{J\dot{+}\omega } \rangle \langle \psi _{J\dot{+}\omega },T\psi _{I\dot{+}\omega } \rangle \langle \psi _{I\dot{+}\omega },f \rangle ] \\&\quad = \mathbb {E}_{\omega }\sum _{\ell (I)>\ell (J)} \langle g,\psi _{J\dot{+}\omega } \rangle \langle \psi _{J\dot{+}\omega },T\psi _{I\dot{+}\omega } \rangle \langle \psi _{I\dot{+}\omega },f \rangle , \end{aligned} \end{aligned}$$

and hence

$$\begin{aligned} \begin{aligned} \langle g,Tf \rangle&= \frac{1}{\pi _{{\text {good}}}}\mathbb {E}_{\omega }\sum _{\ell (I)\le \ell (J)} 1_{{\text {good}}}(I\dot{+}\omega ) \langle g,\psi _{J\dot{+}\omega } \rangle \langle \psi _{J\dot{+}\omega },T\psi _{I\dot{+}\omega } \rangle \langle \psi _{I\dot{+}\omega },f \rangle \\&\qquad +\mathbb {E}_{\omega }\sum _{\ell (I)>\ell (J)} \langle g,\psi _{J\dot{+}\omega } \rangle \langle \psi _{J\dot{+}\omega },T\psi _{I\dot{+}\omega } \rangle \langle \psi _{I\dot{+}\omega },f \rangle . \end{aligned} \end{aligned}$$

Comparison with the basic identity

$$\begin{aligned} \langle g,Tf \rangle =\mathbb {E}_{\omega }\sum _{I,J}\langle g,\psi _{J\dot{+}\omega } \rangle \langle \psi _{J\dot{+}\omega },T\psi _{I\dot{+}\omega } \rangle \langle \psi _{I\dot{+}\omega },f \rangle \end{aligned}$$
(3.5)

shows that

$$\begin{aligned} \begin{aligned}&\mathbb {E}_{\omega }\sum _{\ell (I)\le \ell (J)} \langle g,\psi _{J\dot{+}\omega } \rangle \langle \psi _{J\dot{+}\omega },T\psi _{I\dot{+}\omega } \rangle \langle \psi _{I\dot{+}\omega },f \rangle \\&\quad = \frac{1}{\pi _{{\text {good}}}}\mathbb {E}_{\omega }\sum _{\ell (I)\le \ell (J)} 1_{{\text {good}}}(I\dot{+}\omega ) \langle g,\psi _{J\dot{+}\omega } \rangle \langle \psi _{J\dot{+}\omega },T\psi _{I\dot{+}\omega } \rangle \langle \psi _{I\dot{+}\omega },f \rangle . \end{aligned} \end{aligned}$$

Symmetrically, we also have

$$\begin{aligned} \begin{aligned}&\mathbb {E}_{\omega }\sum _{\ell (I)>\ell (J)} \langle g,\psi _{J\dot{+}\omega } \rangle \langle \psi _{J\dot{+}\omega },T\psi _{I\dot{+}\omega } \rangle \langle \psi _{I\dot{+}\omega },f \rangle \\&\quad = \frac{1}{\pi _{{\text {good}}}}\mathbb {E}_{\omega }\sum _{\ell (I)>\ell (J)} 1_{{\text {good}}}(J\dot{+}\omega ) \langle g,\psi _{J\dot{+}\omega } \rangle \langle \psi _{J\dot{+}\omega },T\psi _{I\dot{+}\omega } \rangle \langle \psi _{I\dot{+}\omega },f \rangle , \end{aligned} \end{aligned}$$

and this completes the proof. \(\square \)

For the analysis of the series appearing in Proposition 3.4 we recall the notion of the long distance [20, Definition 6.3]

$$\begin{aligned} D(I,J):=\ell (I)+{\text {dist}}(I,J)+\ell (J). \end{aligned}$$

We focus on the summation inside \(\mathbb {E}_{\omega }\), for a fixed value of \(\omega \in \Omega \), and manipulate it into the required form. Moreover, we will focus on the half of the sum with \(\ell (J)\ge \ell (I)\), the other half being handled symmetrically. We further divide this sum into the following parts:

$$\begin{aligned} \begin{aligned} \sum _{\ell (I)\le \ell (J)}&=\sum _{\begin{array}{c} {\text {dist}}(I,J)>\ell (J)(\ell (I)/\ell (J))^\theta \\ {\text {dist}}(mI,mJ)>\frac{1}{2}D(I,J) \end{array}}+\sum _{\begin{array}{c} {\text {dist}}(I,J)>\ell (J)(\ell (I)/\ell (J))^\theta \\ {\text {dist}}(mI,mJ)\le \frac{1}{2}D(I,J) \end{array}}+\sum _{I\subsetneq J}+\sum _{I=J}\\&\quad +\sum _{\begin{array}{c} {\text {dist}}(I,J)\le \ell (J)(\ell (I)/\ell (J))^\theta \\ I\cap J=\varnothing \end{array}}\\&=:\sigma _{{\text {far}}}+\sigma _{{\text {between}}}+\sigma _{{\text {in}}}+\sigma _{=}+\sigma _{{\text {near}}}. \end{aligned} \end{aligned}$$

We observe that the main difference in the division of the previous sum and the one in [8, after the Proposition 3.5] is that the sum \(\sigma _{{\text {out}}}\) in [8] has been split into \(\sigma _{{\text {far}}}\) and \(\sigma _{{\text {between}}}\), which are handled differently. Regarding the sum \(\sigma _{{\text {in}}}\) we will not use the same method as in [8, Sect. 3.2]. The sums \(\sigma _{{\text {=}}}\) and \(\sigma _{{\text {near}}}\) will be treated in a similar but not exactly the same way as in [8, Sect. 3.3].

In order to recognize these series as sums of good wavelet shifts, we need to find, for each pair \((I,J)\) appearing here, a common dyadic ancestor which contains mI and mJ. The following lemma provides the existence of such containing cubes, with control on their size:

Lemma 3.6

If \(I\in \mathscr {D}\) is good and \(J\in \mathscr {D}\) is a cube with \(\ell (J)\ge \ell (I)\), then there exists \(K\supseteq mI\cup mJ\) which satisfies

$$\begin{aligned} \begin{aligned} \ell (K)\Big (\frac{\ell (I)}{\ell (K)}\Big )^\theta&\lesssim D(I,J),\qquad \text {always,}\qquad \qquad \text {and}\\ \ell (K)\lesssim \ell (I),\qquad \text {if}\qquad {\text {dist}}(I,J)&\le \ell (J)\Big (\frac{\ell (I)}{\ell (J)}\Big )^\theta \qquad \text {and}\qquad J\cap I=\varnothing . \end{aligned} \end{aligned}$$

Proof

Let us start with the following initial observation: if \(I\in \mathscr {D}\) is good and \(K\in \mathscr {D}\) satisfies \(I\subseteq K\), and \(\ell (K)\ge 2^r\ell (I)\), then

$$\begin{aligned} {\text {dist}}(I,K^c)={\text {dist}}(I,\partial K)>\ell (K)\Big (\frac{\ell (I)}{\ell (K)}\Big )^\theta =\ell (K)^{1-\theta }\ell (I)^{\theta }\ge 2^{r(1-\theta )}\ell (I)>m\ell (I), \end{aligned}$$

when r is large enough. Hence \(mI\subseteq K\), and we can proceed with the proof of \(mJ\subseteq K\). Using an elementary triangle inequality we estimate \({\text {dist}}(I,K^c)\) in the following way:

$$\begin{aligned} \begin{aligned} {\text {dist}}(I,K^c)&\le {\text {dist}}(I,mJ)+\ell (mJ)+{\text {dist}}(mJ,K^c)\\&\le {\text {dist}}(I,J)+m\ell (J)+{\text {dist}}(mJ,K^c). \end{aligned} \end{aligned}$$

Thus,

$$\begin{aligned} \begin{aligned} {\text {dist}}(mJ,K^c)&\ge {\text {dist}}(I,K^c)-{\text {dist}}(I,J)-m\ell (J)\\&>\ell (K)\Big (\frac{\ell (I)}{\ell (K)}\Big )^\theta -{\text {dist}}(I,J)-m\ell (J). \end{aligned} \end{aligned}$$
(3.7)

In order to conclude that \(mJ\subseteq K\) we want the right hand side of (3.7) to be non-negative. This is achieved by taking K to be the dyadic ancestor of I with the smallest side length \(\ell (K)\ge 2^r\ell (I)\) such that

$$\begin{aligned} \ell (K)\Big (\frac{\ell (I)}{\ell (K)}\Big )^\theta \ge {\text {dist}}(I,J)+m\ell (J). \end{aligned}$$

Then, in fact

$$\begin{aligned} \ell (K)\Big (\frac{\ell (I)}{\ell (K)}\Big )^\theta \lesssim {\text {dist}}(I,J)+m\ell (J)\lesssim {\text {dist}}(I,J)+\ell (J). \end{aligned}$$
(3.8)
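Since K is chosen among the dyadic ancestors of I with \(\ell (K)\ge 2^r\ell (I)\) (as needed for the initial observation), there are two cases. If \(\ell (K)>2^r\ell (I)\), then by minimality the ancestor of I of side length \(\ell (K)/2\) fails the defining inequality, i.e.,

$$\begin{aligned} \Big (\frac{\ell (K)}{2}\Big )^{1-\theta }\ell (I)^{\theta }<{\text {dist}}(I,J)+m\ell (J), \end{aligned}$$

and hence \(\ell (K)\big (\ell (I)/\ell (K)\big )^\theta =\ell (K)^{1-\theta }\ell (I)^{\theta }<2^{1-\theta }\big ({\text {dist}}(I,J)+m\ell (J)\big )\). If instead \(\ell (K)=2^r\ell (I)\), then \(\ell (K)\big (\ell (I)/\ell (K)\big )^\theta =2^{r(1-\theta )}\ell (I)\le 2^r\ell (J)\lesssim {\text {dist}}(I,J)+m\ell (J)\). In both cases (3.8) follows.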

Hence,

$$\begin{aligned} \ell (K)\Big (\frac{\ell (I)}{\ell (K)}\Big )^\theta \lesssim D(I,J). \end{aligned}$$

This proves the first estimate.

Case \({\text {dist}}(I,J)\le \ell (J)(\ell (I)/\ell (J))^\theta \) and \(I\cap J=\varnothing \): As \(I\cap J=\varnothing \), we have \({\text {dist}}(I,J)={\text {dist}}(I,\partial J)\), and since I is good, this implies \(\ell (J)<2^r\ell (I)\): otherwise goodness would give \({\text {dist}}(I,\partial J)>\ell (J)(\ell (I)/\ell (J))^\theta \), contradicting the assumption of this case. We can then dominate the right hand side of (3.8) by

$$\begin{aligned} \ell (J)(\ell (I)/\ell (J))^\theta +\ell (J)\lesssim \ell (J)\lesssim \ell (I). \end{aligned}$$
(3.9)

Thus, from (3.8) and (3.9) we have

$$\begin{aligned} \frac{\ell (K)}{\ell (I)}\Big (\frac{\ell (I)}{\ell (K)}\Big )^\theta \lesssim 1\quad \text {and}\quad \ell (K)\lesssim \ell (I), \end{aligned}$$

so this proves the second estimate. \(\square \)

We denote the minimal such K by \(I\vee J\), thus

$$\begin{aligned} I\vee J:=\bigcap _{\begin{array}{c} K\in \mathscr {D}\\ K\supseteq mI\cup mJ \end{array}} K. \end{aligned}$$

4 Estimates for the Different Sums \(\sigma _{{\text {far}}}, \sigma _{{\text {between}}}, \sigma _{{\text {in}}}, \sigma _{=}, \sigma _{{\text {near}}}\)

4.1 Far Away Cubes, \(\sigma _{{\text {far}}}\)

We reorganize the sum \(\sigma _{{\text {far}}}\) with respect to the new summation variable \(K=I\vee J\), as well as the relative size of I and J with respect to K:

$$\begin{aligned} \sigma _{{\text {far}}}=\sum _{j=1}^{\infty }\sum _{i=j}^{\infty }\sum _K\Big (\sum _{\begin{array}{c} I,J:{\text {dist}}(I,J)>\ell (J)(\ell (I)/\ell (J))^\theta \\ {\text {dist}}(mI,mJ)>\frac{1}{2}D(I,J)\\ I\vee J=K\\ \ell (I)=2^{-i}\ell (K),\ell (J)=2^{-j}\ell (K) \end{array}}1_{{\text {good}}}(I)\cdot \langle g,\psi _{J} \rangle \langle \psi _{J},T\psi _{I} \rangle \langle \psi _{I},f \rangle \Big ). \end{aligned}$$

Note that we can start the summation from \(j=1\) instead of \(j=0\), since the disjointness of I and J implies that \(K=I\vee J\) must be strictly larger than either of I and J. The goal is to identify the quantity in parentheses as a decaying factor times an averaging operator with parameters (ij). The proof of the following lemma is similar to that of [8, Lemma 3.8], but in order to exploit the smoothness we subtract a higher order Taylor expansion of \(y\mapsto K(x,y)\) at \(y=c_I\), rather than just its value \(K(x,c_I)\).

Lemma 4.1

For I and J appearing in \(\sigma _{{\text {far}}}\), we have

$$\begin{aligned} |\langle \psi _J,T\psi _I \rangle |\lesssim \Vert K\Vert _{CZ_s}\frac{\sqrt{|I||J|}}{|K|}\Big (\frac{\ell (I)}{\ell (K)}\Big )^{-\theta (d+s)}\Big (\frac{\ell (I)}{\ell (K)}\Big )^{s}, \end{aligned}$$

where \(K=I\vee J\) and \(\theta \in (0,1)\).

Proof

Using the properties of \(\psi _I\), the Taylor expansion of order s of \(y\mapsto K(x,y)\) at the centre point \(y=c_{I}\) of I, the higher order standard estimate (3.1) of the kernel, and Lemma 3.6, we obtain

$$\begin{aligned}&|\langle \psi _J,T\psi _I \rangle |=\Big |\iint \psi _J(x)K(x,y)\psi _I(y)\,\mathrm {d}y\,\mathrm {d}x\Big |\\&\quad =\Big |\iint \psi _J(x)\Big (K(x,y)-\sum _{\mathop {\begin{array}{c} 0\le |\alpha |<s \end{array}}}\frac{(y-c_I)^{\alpha }}{\alpha !}\partial _2^{\alpha }K(x,c_I)\Big )\psi _I(y)\,\mathrm {d}y\,\mathrm {d}x\Big |\\&\quad \le \iint s|\psi _{J}(x)|\Big (\int \limits _{0}^{1}\sum _{\begin{array}{c} |\alpha |=s \end{array}}\frac{|y-c_I|^{|\alpha |}}{\alpha !}|\partial _2^{\alpha }K(x,ty+(1-t)c_I)|(1-t)^{s-1}\,\mathrm {d}t\Big )|\psi _{I}(y)|\,\mathrm {d}y\,\mathrm {d}x\\&\quad \lesssim \Vert K\Vert _{CZ_s}\iint |\psi _{J}(x)|\ell (I)^s\Big (\int \limits _0^1\frac{s}{|x-(c_I+t(y-c_I))|^{d+s}}(1-t)^{s-1}\,\mathrm {d}t\Big )|\psi _{I}(y)|\,\mathrm {d}y\,\mathrm {d}x\\&\quad \lesssim \Vert K\Vert _{CZ_s}\frac{\ell (I)^s}{{\text {dist}}(mI,mJ)^{d+s}}\Vert \psi _J\Vert _{1}\Vert \psi _I\Vert _{1}\\&\quad \lesssim \Vert K\Vert _{CZ_s}\frac{\ell (I)^s}{D(I,J)^{d+s}}\Vert \psi _J\Vert _{1}\Vert \psi _I\Vert _{1}\\&\quad \lesssim \Vert K\Vert _{CZ_s}\frac{\ell (I)^s}{\ell (K)^{d+s}}\Big (\frac{\ell (I)}{\ell (K)}\Big )^{-\theta (d+s)}\Vert \psi _J\Vert _{1}\Vert \psi _I\Vert _{1}\\&\quad \lesssim \Vert K\Vert _{CZ_s}\frac{1}{\ell (K)^d}\Big (\frac{\ell (I)}{\ell (K)}\Big )^s\Big (\frac{\ell (I)}{\ell (K)}\Big )^{-\theta (d+s)}|mJ||mI||J|^{-\frac{1}{2}}|I|^{-\frac{1}{2}}\\&\quad \lesssim \Vert K\Vert _{CZ_s}\frac{1}{\ell (K)^d}\Big (\frac{\ell (I)}{\ell (K)}\Big )^s\Big (\frac{\ell (I)}{\ell (K)}\Big )^{-\theta (d+s)}\sqrt{|J|}\sqrt{|I|}. \end{aligned}$$
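Here, in passing to the bound with \({\text {dist}}(mI,mJ)^{-(d+s)}\), we used that for \(y\in {\text {supp}}\,\psi _I\subseteq mI\) and \(t\in [0,1]\) the point \(c_I+t(y-c_I)\) lies in the convex set mI, so that for \(x\in {\text {supp}}\,\psi _J\subseteq mJ\)

$$\begin{aligned} |x-(c_I+t(y-c_I))|\ge {\text {dist}}(mJ,mI)>\tfrac{1}{2}D(I,J), \end{aligned}$$

the last inequality being the defining condition of \(\sigma _{{\text {far}}}\). The remaining steps use Lemma 3.6 in the form \(D(I,J)\gtrsim \ell (K)(\ell (I)/\ell (K))^\theta \), together with \(\Vert \psi _I\Vert _1\lesssim |mI|\,|I|^{-1/2}\lesssim \sqrt{|I|}\), and likewise for \(\psi _J\).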

\(\square \)

Lemma 4.2

$$\begin{aligned} \begin{aligned} \sum _{\begin{array}{c} I,J:{\text {dist}}(I,J)>\ell (J)(\ell (I)/\ell (J))^\theta \\ {\text {dist}}(mI,mJ)>\frac{1}{2}D(I,J)\\ I\vee J=K\\ \ell (I)=2^{-i}\ell (K)\le \ell (J)=2^{-j}\ell (K) \end{array}}&1_{{\text {good}}}(I)\cdot \langle g,\psi _{J} \rangle \langle \psi _{J},T\psi _{I} \rangle \langle \psi _{I},f \rangle \\&=c\Vert K\Vert _{CZ_s}2^{i\theta (d+s)}2^{-is}\langle g,A_K^{ij}f \rangle , \end{aligned} \end{aligned}$$

where \(\theta \in (0,1)\) and \(A_K^{ij}\) is an averaging operator with parameters (ij).

Proof

By Lemma 4.1, substituting \(\ell (I)/\ell (K)=2^{-i}\),

$$\begin{aligned} |\langle \psi _J,T\psi _I \rangle |\lesssim \Vert K\Vert _{CZ_s}\frac{\sqrt{|I||J|}}{|K|}2^{i\theta (d+s)}2^{-is}, \end{aligned}$$

and the first factor is precisely the required size of the coefficients of \(A_K^{ij}\). \(\square \)

Summarizing, we have

$$\begin{aligned} \sigma _{{\text {far}}}=c\Vert K\Vert _{CZ_s}\sum _{j=1}^{\infty }\sum _{i=j}^{\infty }2^{-i(s-\epsilon )}\langle g,S^{ij}f \rangle , \end{aligned}$$

where we choose \(\theta =\frac{\epsilon }{d+s}\) for any given \(\epsilon >0\) and \(S^{ij}\) is a good wavelet shift with parameters (ij).
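The series over (ij) converges geometrically whenever \(s>\epsilon \). As a quick numerical sanity check of this summability (the values of s and \(\epsilon \) below are illustrative only and not part of the proof):

```python
from math import isclose

def sigma_far_coeff_sum(s, eps, terms=200):
    """Truncated double series sum_{j>=1} sum_{i>=j} 2^{-i(s-eps)}
    controlling sigma_far; converges exactly when s > eps."""
    q = 2.0 ** (-(s - eps))
    return sum(q ** i for j in range(1, terms) for i in range(j, terms))

# Closed form: sum_{j>=1} q^j / (1 - q) = q / (1 - q)^2 for 0 < q < 1.
s, eps = 1.0, 0.25  # illustrative smoothness and epsilon
q = 2.0 ** (-(s - eps))
assert isclose(sigma_far_coeff_sum(s, eps), q / (1 - q) ** 2, rel_tol=1e-9)
```

The closed form is obtained by summing the inner geometric series in i and then the outer one in j.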

4.2 Intermediate Cubes, \(\sigma _{{\text {between}}}\)

Let \(M>m\). In this part, we make use of the fact that \(\psi _J\) has a Taylor expansion of order s at the centre point \(c_I\) of I, and we denote

$$\begin{aligned} \text {Tayl}_{s}(\psi _J,c_I):=\sum _{\begin{array}{c} 0\le |\alpha |<s \end{array}}\frac{(x-c_I)^{\alpha }}{\alpha !}\partial ^{\alpha }\psi _J(c_I). \end{aligned}$$

We drop \(c_I\), when it is clear from the context. Thus, we have

$$\begin{aligned} \langle \psi _J,T\psi _I \rangle =\langle \psi _J-\text {Tayl}_{s}(\psi _J,c_I),T\psi _I \rangle +\langle \text {Tayl}_{s}(\psi _J,c_I),T\psi _I \rangle . \end{aligned}$$
(4.3)

Observe that, by the hypothesis of Theorem 3.2 that the operators \(T,T^*\) map \(\mathcal {P}_s\) into itself, and by the cancellation of \(\psi _I\), the last term of (4.3) vanishes. We can further split the first term of (4.3) as

$$\begin{aligned} \begin{aligned} \langle \psi _J,T\psi _I \rangle&=\langle 1_{(MI)^c}(\psi _J-\text {Tayl}_{s}(\psi _J,c_I)),T\psi _I \rangle \\&\qquad +\langle 1_{MI}(\psi _J-\text {Tayl}_{s}(\psi _J,c_I)),T\psi _I \rangle . \end{aligned} \end{aligned}$$
(4.4)

In the following we estimate the two remaining non-vanishing terms on the right hand side of (4.4). For these terms we obtain estimates that do not depend on the fact that we are dealing with the intermediate cubes; indeed, we will use the same estimates again to handle \(\sigma _{{\text {in}}}\).

Lemma 4.5

For all \(I,J\in \mathscr {D}\) such that \(\ell (I)\le \ell (J)\), we have

$$\begin{aligned} |\langle 1_{(MI)^c}(\psi _J-\text {Tayl}_{s}(\psi _J,c_I)),T\psi _I \rangle |\lesssim \Vert K\Vert _{CZ_s}\Big (\frac{|I|}{|J|}\Big )^{1/2}\Psi \Big (\frac{\ell (I)}{\ell (J)}\Big ), \end{aligned}$$

where

$$\begin{aligned} \Psi (t):=t^s\Big (\log \frac{1}{t}+1\Big )\lesssim t^{s-\epsilon }\quad \text {for any given}\quad \epsilon >0\quad \text {and}\quad t\in (0,1]. \end{aligned}$$

Proof

Let us denote \(\text {Tayl}_{s}(K):=\sum _{\begin{array}{c} 0\le |\alpha |<s \end{array}}\frac{(y-c_I)^{\alpha }}{\alpha !}\partial ^{\alpha }_2K(x,c_I)\). Using the cancellation of \(\psi _I\), the Taylor expansion of order s of \(y\mapsto K(x,y)\) at the centre point \(y=c_{I}\) of I, and the higher order standard estimate (3.1) of the kernel, we obtain

$$\begin{aligned}&|\langle 1_{(MI)^c}(\psi _J-\text {Tayl}_{s}(\psi _J,c_I)),T\psi _I \rangle |\nonumber \\&\quad \le \iint 1_{(MI)^{c}}(x)|\psi _J(x)-\text {Tayl}_{s}(\psi _J,c_I)||K(x,y)-\text {Tayl}_{s}(K)||\psi _{I}(y)|\,\mathrm {d}y\,\mathrm {d}x\nonumber \\&\quad \lesssim \Vert K\Vert _{CZ_s}\ell (I)^s\Vert \psi _I\Vert _1\int 1_{(MI)^c}(x)\frac{|\psi _J(x)-\text {Tayl}_{s}(\psi _J,c_I)|}{{\text {dist}}(x,mI)^{d+s}}\,\mathrm {d}x. \end{aligned}$$
(4.6)

Now, using the regularity property and the Taylor series of order s of \(\psi _J\) at the centre point \(c_I\) of I we derive the following two estimates:

4.2.1 Estimate 1

$$\begin{aligned} \begin{aligned} |\psi _J(x)-\text {Tayl}_{s}(\psi _J,c_I)|&\le \Vert \psi _J\Vert _{\infty }+\sum _{\mathop {\begin{array}{c} 0\le |\alpha |<s \end{array}}}\frac{|x-c_I|^{|\alpha |}}{\alpha !}\Vert \partial ^{\alpha }\psi _J\Vert _{\infty }\\&\lesssim |J|^{-1/2}\Big (\sum _{\begin{array}{c} a<s \end{array}}\frac{{\text {dist}}(x,mI)^{a}}{\ell (J)^a}\Big ), \end{aligned} \end{aligned}$$

where \(a=|\alpha |\in \mathbb {N}\).

4.2.2 Estimate 2

$$\begin{aligned} \begin{aligned} |\psi _J(x)-\text {Tayl}_{s}(\psi _J,c_I)|&\le s\int \limits _{0}^{1}\sum _{\mathop {\begin{array}{c} |\alpha |=s \end{array}}}\frac{|x-c_I|^{|\alpha |}}{\alpha !}|\partial ^{\alpha }\psi _{J}(tx+(1-t)c_{I})|(1-t)^{s-1}\,\mathrm {d}t\\&\lesssim |x-c_I|^s\Vert \partial ^{s}\psi _J\Vert _{\infty }\\&\lesssim |J|^{-1/2}\frac{{{\text {dist}}(x,mI)}^s}{\ell (J)^s}. \end{aligned} \end{aligned}$$

Thus, the right hand side of (4.6) is dominated by

$$\begin{aligned} \begin{aligned}&\lesssim \Vert K\Vert _{CZ_s}\ell (I)^s\frac{\Vert \psi _I\Vert _1}{|J|^{1/2}}\Big (\int \limits _{\begin{array}{c} (MI)^c\\ {\text {dist}}(x,mI)\le \ell (J) \end{array}}\frac{{\text {dist}}(x,mI)^s}{\ell (J)^s}\frac{1}{{\text {dist}}(x,mI)^{d+s}}\,\mathrm {d}x\\&\qquad +\int \limits _{{\text {dist}}(x,mI)>\ell (J)}\frac{{\text {dist}}(x,mI)^{s-1}}{\ell (J)^{s-1}}\frac{1}{{\text {dist}}(x,mI)^{d+s}}\,\mathrm {d}x\Big )\\&\lesssim \Vert K\Vert _{CZ_s}\ell (I)^s\Big (\frac{|I|}{|J|}\Big )^{1/2}\Big (\int \limits _{\ell (I)}^{\ell (J)}\frac{1}{\ell (J)^s}\frac{1}{t}\,\mathrm {d}t +\int \limits _{\ell (J)}^{\infty }\frac{1}{\ell (J)^{s-1}}\frac{1}{t^2}\,\mathrm {d}t\Big )\\&=\Vert K\Vert _{CZ_s}\ell (I)^s\Big (\frac{|I|}{|J|}\Big )^{1/2}\Big (\frac{1}{\ell (J)^s}\log \frac{\ell (J)}{\ell (I)}+\frac{1}{\ell (J)^{s-1}}\frac{1}{\ell (J)}\Big )\\&=\Vert K\Vert _{CZ_s}\Big (\frac{|I|}{|J|}\Big )^{1/2}\Psi \Big (\frac{\ell (I)}{\ell (J)}\Big ). \end{aligned} \end{aligned}$$

\(\square \)
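The elementary bound \(\Psi (t)\lesssim t^{s-\epsilon }\) follows, for \(\epsilon \in (0,1)\), by maximizing \(t\mapsto t^{\epsilon }(\log \frac{1}{t}+1)\) on (0, 1]: calculus gives the supremum \(e^{\epsilon -1}/\epsilon \), attained at \(t=e^{1-1/\epsilon }\). A small numerical check of this constant (the values of s and \(\epsilon \) are illustrative only):

```python
import math

def Psi(t, s):
    """Psi(t) = t^s (log(1/t) + 1) from Lemma 4.5."""
    return t ** s * (math.log(1.0 / t) + 1.0)

s, eps = 1.5, 0.2                 # illustrative values only
C = math.exp(eps - 1.0) / eps     # sup of t^eps (log(1/t) + 1) on (0, 1]
for k in range(0, 60):            # check along the dyadic points t = 2^{-k}
    t = 2.0 ** (-k)
    assert Psi(t, s) <= C * t ** (s - eps) + 1e-12
```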

Lemma 4.7

For all \(I,J\in \mathscr {D}\) such that \(\ell (I)\le \ell (J)\), we have

$$\begin{aligned} |\langle 1_{MI}(\psi _J-\text {Tayl}_{s}(\psi _J,c_I)),T\psi _I \rangle |\lesssim \Vert T\Vert _{L^2\rightarrow L^2}\Big (\frac{\ell (I)}{\ell (J)}\Big )^s\Big (\frac{|I|}{|J|}\Big )^{1/2}. \end{aligned}$$
(4.8)

Proof

By the Taylor expansion of order s of \(\psi _J\) at the centre point \(c_I\) of I and the regularity properties of \(\psi _I,\psi _J\), we can estimate the left hand side of (4.8) as follows:

$$\begin{aligned}&\Big |\Big \langle 1_{MI}\int \limits _0^1\sum _{\mathop {\begin{array}{c} |\alpha |=s \end{array}}}\frac{{(x-c_I)}^{\alpha }}{\alpha !}\partial ^{\alpha }\psi _J(tx+(1-t)c_I)(1-t)^{s-1}\,\mathrm {d}t,T\psi _I \Big \rangle \Big |\\&\quad \lesssim \Vert T\Vert _{L^2\rightarrow L^2}(M\ell (I))^s\Vert \partial ^{s}\psi _J\Vert _{\infty }|I|^{1/2}\\&\quad \lesssim \Vert T\Vert _{L^2\rightarrow L^2}\ell (I)^s\ell (J)^{-s}|J|^{-1/2}|I|^{1/2}\\&\quad =\Vert T\Vert _{L^2\rightarrow L^2}\Big (\frac{\ell (I)}{\ell (J)}\Big )^s\Big (\frac{|I|}{|J|}\Big )^{1/2}. \end{aligned}$$
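Here, with \(\varphi :=1_{MI}(\psi _J-\text {Tayl}_{s}(\psi _J,c_I))\), the first inequality is Cauchy–Schwarz, \(|\langle \varphi ,T\psi _I \rangle |\le \Vert \varphi \Vert _{2}\Vert T\Vert _{L^2\rightarrow L^2}\Vert \psi _I\Vert _{2}\) with \(\Vert \psi _I\Vert _2=1\), combined with the pointwise bound, valid for \(x\in MI\),

$$\begin{aligned} |\varphi (x)|\le s\int \limits _0^1\sum _{\begin{array}{c} |\alpha |=s \end{array}}\frac{|x-c_I|^{|\alpha |}}{\alpha !}|\partial ^{\alpha }\psi _J(tx+(1-t)c_I)|(1-t)^{s-1}\,\mathrm {d}t\lesssim (M\ell (I))^s\Vert \partial ^{s}\psi _J\Vert _{\infty }, \end{aligned}$$

so that \(\Vert \varphi \Vert _2\lesssim (M\ell (I))^s\Vert \partial ^{s}\psi _J\Vert _{\infty }|MI|^{1/2}\lesssim \ell (I)^s\Vert \partial ^{s}\psi _J\Vert _{\infty }|I|^{1/2}\), since M is a fixed constant.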

\(\square \)

By combining Eq. (4.4) with Lemmata 4.5 and 4.7, we have

$$\begin{aligned} |\langle \psi _J,T\psi _I \rangle |\lesssim (\Vert K\Vert _{CZ_s}+\Vert T\Vert _{L^2\rightarrow L^2})\Big (\frac{|I|}{|J|}\Big )^{1/2}\Big (\frac{\ell (I)}{\ell (J)}\Big )^{s-\epsilon }. \end{aligned}$$
(4.9)

Now, to complete the analysis of the sum \(\sigma _{{\text {between}}}\), we need the following lemma:

Lemma 4.10

For I and J appearing in \(\sigma _{{\text {between}}}\), we have

$$\begin{aligned} D(I,J)\lesssim \ell (J), \end{aligned}$$

where D(I,J) is the long distance introduced in Sect. 3.

Proof

We start by estimating \({\text {dist}}(I,J)\) as follows:

$$\begin{aligned} \begin{aligned} {\text {dist}}(I,J)&\le \frac{1}{2}(m-1)\ell (I)+{\text {dist}}(mI,mJ)+\frac{1}{2}(m-1)\ell (J)\\&\le \frac{1}{2}(m-1)\ell (I)+\frac{1}{2}D(I,J)+\frac{1}{2}(m-1)\ell (J)\\&=\frac{1}{2}(m-1)\ell (I)+\frac{1}{2}\ell (I)+\frac{1}{2}{\text {dist}}(I,J)+\frac{1}{2}\ell (J)+\frac{1}{2}(m-1)\ell (J)\\&=\frac{m}{2}\ell (I)+\frac{1}{2}{\text {dist}}(I,J)+\frac{m}{2}\ell (J). \end{aligned} \end{aligned}$$
(4.11)

Hence, (4.11) implies

$$\begin{aligned} {\text {dist}}(I,J)\le m\ell (I)+m\ell (J). \end{aligned}$$
(4.12)

Applying (4.12) we have

$$\begin{aligned} \frac{D(I,J)}{\ell (J)}=\frac{\ell (I)+{\text {dist}}(I,J)+\ell (J)}{\ell (J)}\le (m+1)\frac{\ell (I)+\ell (J)}{\ell (J)}\le 2(m+1). \end{aligned}$$

\(\square \)

Using Lemma 3.6 we can organize the sum \(\sigma _{{\text {between}}}\) in a similar way to the sum \(\sigma _{{\text {far}}}\):

$$\begin{aligned} \sigma _{{\text {between}}}=\sum _{j=1}^{\infty }\sum _{i=j}^{\infty }\sum _K\sum _{\begin{array}{c} I,J:\ell (I)\le \ell (J)\\ {\text {dist}}(I,J)>\ell (J)(\ell (I)/\ell (J))^\theta \\ {\text {dist}}(mI,mJ)\le \frac{1}{2}D(I,J)\\ I\vee J=K\\ \ell (I)=2^{-i}\ell (K),\ell (J)=2^{-j}\ell (K) \end{array}}a(I,J), \end{aligned}$$

where

$$\begin{aligned} a(I,J):=\langle g,\psi _J \rangle \langle \psi _J,T\psi _I \rangle \langle \psi _I,f \rangle \end{aligned}$$
(4.13)

satisfies, by (4.9), the estimate

$$\begin{aligned} \begin{aligned} |a(I,J)|&\lesssim |\langle g,\psi _J \rangle |(\Vert K\Vert _{CZ_s}+\Vert T\Vert _{L^2\rightarrow L^2})\Big (\frac{|I|}{|J|}\Big )^{1/2}\Big (\frac{\ell (I)}{\ell (J)}\Big )^{s-\epsilon }|\langle \psi _I,f \rangle |\\&=|\langle g,\psi _J \rangle |(\Vert K\Vert _{CZ_s}+\Vert T\Vert _{L^2\rightarrow L^2})\frac{\sqrt{|I||J|}}{|K|}\frac{|K|}{|J|}\Big (\frac{\ell (I)}{\ell (J)}\Big )^{s-\epsilon }|\langle \psi _I,f \rangle |. \end{aligned} \end{aligned}$$
(4.14)

By combining Lemmata 3.6 and 4.10 we can estimate the factor \(\frac{|K|}{|J|}\big (\ell (I)/\ell (J)\big )^{s-\epsilon }\) in (4.14) as follows:

$$\begin{aligned} \begin{aligned} \frac{|K|}{|J|}\Big (\frac{\ell (I)}{\ell (J)}\Big )^{s-\epsilon }&=\frac{\ell (K)^d}{\ell (J)^d}\Big (\frac{\ell (I)}{\ell (J)}\Big )^{s-\epsilon }\\&\lesssim \frac{\ell (K)^d}{\Big ({\ell (K)\Big (\frac{\ell (I)}{\ell (K)}\Big )^\theta }\Big )^{d+s-\epsilon }}\ell (I)^{s-\epsilon }\\&=\frac{\ell (K)^d}{\ell (K)^{(1-\theta )(d+s-\epsilon )}}\frac{\ell (I)^{s-\epsilon }}{\ell (I)^{\theta (d+s-\epsilon )}}\\&=\Big (\frac{\ell (I)}{\ell (K)}\Big )^{s-\epsilon -\theta (d+s-\epsilon )}\\&\le \Big (\frac{\ell (I)}{\ell (K)}\Big )^{s-\epsilon -\theta (d+s)}\\&=\Big (\frac{\ell (I)}{\ell (K)}\Big )^{s-2\epsilon }, \end{aligned} \end{aligned}$$
(4.15)

where we choose \(\theta =\frac{\epsilon }{d+s}\) for any given \(\epsilon >0\). Summarizing, from (4.14) and (4.15) we have

$$\begin{aligned} \begin{aligned} \sigma _{{\text {between}}}&=c\sum _{j=1}^{\infty }\sum _{i=j}^{\infty }\sum _K(\Vert K\Vert _{CZ_s}+\Vert T\Vert _{L^2\rightarrow L^2})2^{-i(s-2\epsilon )}\langle g,A_K^{ij} f \rangle \\&=c(\Vert K\Vert _{CZ_s}+\Vert T\Vert _{L^2\rightarrow L^2})\sum _{j=1}^{\infty }\sum _{i=j}^{\infty }2^{-i(s-2\epsilon )}\langle g,S^{ij} f \rangle , \end{aligned} \end{aligned}$$

where \(A^{ij}_K\) is an averaging operator and \(S^{ij}\) is a good wavelet shift with parameters (ij).

4.3 Contained Cubes, \(\sigma _{{\text {in}}}\)

Let \(M>m\). When \(I\subsetneq J\), the argument is the same as for the sum \(\sigma _{{\text {between}}}\), though it departs further from the corresponding argument in [8, Sect. 3.2]. Hence, by combining Eqs. (4.3) and (4.4), Lemmata 4.5 and 4.7, and estimate (4.9), we can organize

$$\begin{aligned} \sigma _{{\text {in}}}=\sum _{j=1}^{\infty }\sum _{i=j}^{\infty }\sum _K\sum _{\begin{array}{c} I,J:I\subsetneq J\\ I\vee J=K\\ \ell (I)=2^{-i}\ell (K),\ell (J)=2^{-j}\ell (K) \end{array}}a(I,J), \end{aligned}$$

where a(IJ) is defined in (4.13) and satisfies the estimate (4.14).

We observe that for the contained cubes \(\sigma _{{\text {in}}}\), we have from Lemma 3.6 the bound \(\ell (K)\Big (\frac{\ell (I)}{\ell (K)}\Big )^\theta \lesssim D(I,J).\) Also, from the definition of the contained cubes we have \(D(I,J)\lesssim \ell (J)\), which is the same as the conclusion of Lemma 4.10 in the case of \(\sigma _{{\text {between}}}\). Thus, we have all the same auxiliary estimates as in \(\sigma _{{\text {between}}}\) and the same conclusion

$$\begin{aligned} \begin{aligned} \sigma _{{\text {in}}}&=c\sum _{j=1}^{\infty }\sum _{i=j}^{\infty }\sum _K(\Vert K\Vert _{CZ_s}+\Vert T\Vert _{L^2\rightarrow L^2})2^{-i(s-2\epsilon )}\langle g,A_K^{ij} f \rangle \\&=c(\Vert K\Vert _{CZ_s}+\Vert T\Vert _{L^2\rightarrow L^2})\sum _{j=1}^{\infty }\sum _{i=j}^{\infty }2^{-i(s-2\epsilon )}\langle g,S^{ij} f \rangle , \end{aligned} \end{aligned}$$

where \(A^{ij}_K\) is an averaging operator and \(S^{ij}\) is a good wavelet shift with parameters (ij).

4.4 Near-by Cubes, \(\sigma _{=}\) and \(\sigma _{{\text {near}}}\)

We are left to deal with the sum \(\sigma _{=}\) of equal cubes \(I=J\), as well as \(\sigma _{{\text {near}}}\) of disjoint near-by cubes with \({\text {dist}}(I,J)\le \ell (J)(\ell (I)/\ell (J))^\theta \). Since I is good, this necessarily implies that \(\ell (I)>2^{-r}\ell (J)\); otherwise goodness would give \({\text {dist}}(I,J)={\text {dist}}(I,\partial J)>\ell (J)(\ell (I)/\ell (J))^\theta \). Then, for a given J, there are only boundedly many related I in this sum. Note that, in contrast to [8, Sect. 3.3], we compute both sums using good wavelet shifts of types (ii) and (ij).

Lemma 4.16

For all \(I,J\in \mathscr {D}\), we have

$$\begin{aligned} |\langle \psi _J,T\psi _I \rangle |\le \Vert T\Vert _{L^2\rightarrow L^2}. \end{aligned}$$

Proof

Using the \(L^2\)-boundedness of T, we estimate simply

$$\begin{aligned} |\langle \psi _J,T\psi _I \rangle |\le \Vert \psi _J\Vert _{2}\Vert T\Vert _{L^2\rightarrow L^2}\Vert \psi _I\Vert _{2}=\Vert T\Vert _{L^2\rightarrow L^2}. \end{aligned}$$

\(\square \)

Using this lemma and applying Lemma 3.6 for the good \(I=J\in \mathscr {D}\) and a cube \(J'\in \mathscr {D}\) adjacent to I (i.e., \(\ell (J')=\ell (I)\) and \({\text {dist}}(I,J')=0\)), we have that \(K:=I\vee J'\) satisfies \(\ell (K)\lesssim \ell (I)\) and \(mI\subset K\). Moreover, from Lemma 4.16, we have

$$\begin{aligned} |\langle \psi _I,T\psi _I \rangle |\le \Vert T\Vert _{L^2\rightarrow L^2}=\frac{\sqrt{|I||J|}}{|I|}\Vert T\Vert _{L^2\rightarrow L^2}\lesssim \frac{\sqrt{|I||J|}}{|K|}\Vert T\Vert _{L^2\rightarrow L^2}. \end{aligned}$$

Thus, we can organize the sum \(\sigma _{=}\) as follows

$$\begin{aligned} \begin{aligned} \sigma _{{\text {=}}}&=\sum _{i=1}^{c}\sum _K\sum _{\begin{array}{c} I:mI\subset K\\ \ell (I)=2^{-i}\ell (K)\\ \end{array}}\langle g,\psi _I \rangle \langle \psi _I,T\psi _I \rangle \langle \psi _I,f \rangle \\&=c\sum _{i=1}^{c}\sum _K\Vert T\Vert _{L^2\rightarrow L^2}\langle g,A_K^{ii}f \rangle \\&=c\Vert T\Vert _{L^2\rightarrow L^2}\sum _{i=1}^{c}\langle g,S^{ii}f \rangle , \end{aligned} \end{aligned}$$

where \(A^{ii}_K\) is an averaging operator and \(S^{ii}\) is a good wavelet shift with parameters (ii).

For I and J participating in \(\sigma _{{\text {near}}}\), we conclude from Lemma 3.6 that \(K:=I\vee J\) satisfies \(\ell (K)\lesssim \ell (I)\). Also, from Lemma 4.16, we have

$$\begin{aligned} |\langle \psi _J,T\psi _I \rangle |\le \Vert T\Vert _{L^2\rightarrow L^2}\le \frac{\sqrt{|I||J|}}{|I|}\Vert T\Vert _{L^2\rightarrow L^2}\lesssim \frac{\sqrt{|I||J|}}{|K|}\Vert T\Vert _{L^2\rightarrow L^2}. \end{aligned}$$

Hence, we may organize

$$\begin{aligned} \begin{aligned} \sigma _{{\text {near}}}&=\sum _{i=1}^{c}\sum _{j=1}^i \sum _K\sum _{\begin{array}{c} I,J:{\text {dist}}(I,J)\le \ell (J)(\ell (I)/\ell (J))^\theta \\ I\cap J=\varnothing ,I\vee J=K\\ \ell (I)=2^{-i}\ell (K),\ell (J)=2^{-j}\ell (K) \end{array}}\langle g,\psi _J \rangle \langle \psi _J,T\psi _I \rangle \langle \psi _I,f \rangle \\&=c\sum _{i=1}^{c}\sum _{j=1}^i \sum _K\Vert T\Vert _{L^2\rightarrow L^2}\langle g,A_K^{ij}f \rangle \\&=c\Vert T\Vert _{L^2\rightarrow L^2}\sum _{i=1}^{c}\sum _{j=1}^i\langle g,S^{ij}f \rangle , \end{aligned} \end{aligned}$$

where \(A^{ij}_K\) is an averaging operator and \(S^{ij}\) is a good wavelet shift with parameters (ij).

Summarizing, we have

$$\begin{aligned} \sigma _{=}+\sigma _{{\text {near}}}=c\Vert T\Vert _{L^2\rightarrow L^2}\sum _{j=1}^{c}\sum _{i=j}^{c} \langle g,S^{ij}f \rangle , \end{aligned}$$

where \(S^{ij}\) is a good wavelet shift of type (ij).

4.5 Synthesis

We have checked that

$$\begin{aligned} \begin{aligned}&\sum _{\ell (I)\le \ell (J)} 1_{{\text {good}}}(I)\langle g,\psi _J \rangle \langle \psi _J,T\psi _I \rangle \langle \psi _I,f \rangle \\&\quad =c(\Vert K\Vert _{CZ_s}+\Vert T\Vert _{L^2\rightarrow L^2})\Big (\sum _{1\le j\le i<\infty }(2^{-i(s-\epsilon )}+2^{-i(s-2\epsilon )})\langle g,S^{ij}f \rangle \Big ), \end{aligned} \end{aligned}$$

where \(S^{ij}\) is a good wavelet shift of type (ij).

By symmetry (just observing that the cubes of equal size contributed precisely to the presence of the shifts of type (ii), and that the dual of a shift of type (ij) is a shift of type (ji)), it follows that

$$\begin{aligned} \begin{aligned}&\sum _{\ell (I)>\ell (J)} 1_{{\text {good}}}(J)\langle g,\psi _J \rangle \langle \psi _J,T\psi _I \rangle \langle \psi _I,f \rangle \\&\quad =c(\Vert K\Vert _{CZ_s}+\Vert T\Vert _{L^2\rightarrow L^2})\Big (\sum _{1\le i<j<\infty }(2^{-j(s-\epsilon )}+2^{-j(s-2\epsilon )})\langle g,S^{ij}f \rangle \Big ) \end{aligned} \end{aligned}$$

so that altogether

$$\begin{aligned} \begin{aligned}&\sum _{I,J} 1_{{\text {good}}}(\min \{I,J\}) \langle g,\psi _J \rangle \langle \psi _J,T\psi _I \rangle \langle \psi _I,f \rangle \\&\quad =c(\Vert K\Vert _{CZ_s}+\Vert T\Vert _{L^2\rightarrow L^2})\Big (\sum _{i,j=1}^{\infty }(2^{-\max (i,j)(s-\epsilon )}+2^{-\max (i,j)(s-2\epsilon )})\langle g,S^{ij}f \rangle \Big ). \end{aligned} \end{aligned}$$
(4.17)

The coefficients in (4.17) are dominated by \(2\cdot 2^{-\max (i,j)(s-\epsilon ')}\), where \(\epsilon '=2\epsilon \) and \(\epsilon >0\) is arbitrary; the factor 2 may be absorbed into the constant c. This completes the proof of Theorem 3.2.
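Counting, for each \(m\ge 1\), the \(2m-1\) pairs (ij) with \(\max (i,j)=m\) shows that the total mass of the coefficients in (4.17) is \(\sum _{m\ge 1}(2m-1)2^{-m\alpha }<\infty \) with \(\alpha =s-\epsilon '>0\). A numerical cross-check of this rearrangement (with an illustrative value of \(\alpha \); not part of the proof):

```python
from math import isclose

def coeff_mass(alpha, N=200):
    """sum_{i,j=1..N} 2^{-max(i,j)*alpha}: the (truncated) total mass of
    the coefficients in (4.17) with decay rate alpha = s - eps'."""
    return sum(2.0 ** (-max(i, j) * alpha)
               for i in range(1, N + 1) for j in range(1, N + 1))

# Closed form via sum_{m>=1} (2m - 1) q^m with q = 2^{-alpha}:
#   = 2q/(1-q)^2 - q/(1-q).
alpha = 0.5
q = 2.0 ** (-alpha)
assert isclose(coeff_mass(alpha), 2 * q / (1 - q) ** 2 - q / (1 - q), rel_tol=1e-9)
```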