1 Introduction

The main purpose of this paper is to consider two weight norm inequalities for “well localized operators” (see Definition 4.1). Nazarov et al. [8] proved that Sawyer type testing conditions are necessary and sufficient for a well localized operator T to be bounded from \(L^{2}(\mu )\) into \( L^{2}(\nu )\), where \(\mu \) and \(\nu \) are two arbitrary Radon measures on \(\mathbb {R}^{n}\). This means that to deduce boundedness of the operator T it suffices to test T and its formal adjoint with one indicator of a (dyadic) cube at a time. Here we investigate the same well localized operators but with general exponents \(1<p<\infty \) defining the \(L^{p}\)-spaces. As an example of the applicability of the two weight theorem for well localized operators it was shown in [8] that two weight inequalities for Haar multipliers and Haar shifts can be seen as two weight inequalities for well localized operators.

There exists an unpublished manuscript by Nazarov showing that there are situations where the Sawyer type testing conditions do not work in \(L^{p}\) when \( p \not =2\). He fixes an exponent \( 1<p<\infty , \ p \not =2\), and provides an example of a certain operator related to Haar multipliers for which the Sawyer type testing conditions with the exponent p do not imply the corresponding quantitative two weight estimate. Also, in this example the Sawyer type testing would be enough in the case \(p=2\). A quantitative consequence of this counterexample related to Haar multipliers is explained in Sect. 4.

But if we look at the Sawyer type testing a little differently, we see that there is another way to generalize it to other exponents \(1<p< \infty \). Namely, we consider a kind of square function testing condition, whose motivation comes from \(\mathcal {R}\)-bounded operator families as used for instance in [12]. An operator family on \(L^{2}\)-spaces is \(\mathcal {R}\)-bounded if and only if it is uniformly bounded, but for other exponents \(1<p<\infty \) \(\mathcal {R}\)-boundedness is in general a stronger property. In the same spirit our square function testing condition is equivalent to the Sawyer type testing in the case \(p=2\), but for other exponents \(1<p<\infty \) it can be a stronger requirement.

The initial idea was to find out whether this kind of testing is necessary and sufficient for a well localized operator T to be bounded from \(L^{p}(\mu )\) into \(L^{p}(\nu )\) for any exponent \(1 < p < \infty \), which indeed is the case. But it was observed that another property of this square function testing is that, with exactly the same proof, it also gives a characterization for T to be bounded from \(L^{p}(\mu )\) into \(L^{q}(\nu )\) for any exponents \(1<p,q<\infty \).

To see what kind of theorem we are talking about we formulate a simplified qualitative version of the main Theorem 4.2. For the exact definition of the operator we refer to Sect. 4.

Theorem 1.1

Assume we have two exponents \(1<p,q<\infty \) and two Radon measures \(\mu \) and \(\nu \) on \(\mathbb {R}^{n}\). Let \(T^{\mu }\) be a well localized operator with respect to a dyadic lattice \(\mathscr {D}\) in \(\mathbb {R}^{n}\), and suppose \(T^{\nu }\) is a formal adjoint of \(T^{\mu }\). Then the operator \(T^{\mu }\) extends to a bounded operator \(T^{\mu }:L^{p}(\mu ) \rightarrow L^{q}(\nu )\) if and only if there exist two non-negative constants \(\mathcal {T}\) and \(\mathcal {T}^{*}\), so that for every finite subcollection \(\mathscr {D}_{0} \subset \mathscr {D}\) and every set of non-negative real numbers \(\{a_{Q}\}_{Q \in \mathscr {D}_{0}}\) the inequalities

$$\begin{aligned} \Big \Vert \Big ( \sum _{Q \in \mathscr {D}_{0}} ( T^{\mu }a_{Q}1_{Q})^{2} \Big )^{\frac{1}{2}} \Big \Vert _{L^{q}(\nu )} \le \mathcal {T} \Big \Vert \Big ( \sum _{Q \in \mathscr {D}_{0}} (a_{Q}1_{Q})^{2} \Big )^{\frac{1}{2}} \Big \Vert _{L^{p}(\mu )} \end{aligned}$$
(1.1)

and

$$\begin{aligned} \Big \Vert \Big ( \sum _{Q \in \mathscr {D}_{0}} ( T^{\nu }a_{Q}1_{Q})^{2} \Big )^{\frac{1}{2}} \Big \Vert _{L^{p'}(\mu )} \le \mathcal {T}^{*} \Big \Vert \Big ( \sum _{Q \in \mathscr {D}_{0}} (a_{Q}1_{Q})^{2} \Big )^{\frac{1}{2}} \Big \Vert _{L^{q'}(\nu )} \end{aligned}$$
(1.2)

hold.

We will also demonstrate the use of our testing condition with positive dyadic operators, and we will again get an \(L^{p}(\mu ) \rightarrow L^{q}(\nu )\) characterization for any exponents \(1<p,q<\infty \). Previously there have been two different characterizations depending on the relative order of the exponents p and q, see [4] (or [2] for a different proof technique) and [11]. Here we get one characterization for all cases. This also provides another example of a situation where the square function testing is sufficient but the Sawyer type testing is not.

Even though we have a different kind of testing condition, the proofs will follow the existing outlines. With the positive dyadic operators we follow the technique in [2], and our study of the well localized operators is structured as in [8].

2 Set Up and Preliminaries

We begin by recalling a general theorem due to Marcinkiewicz and Zygmund [6] (Theorem 2.1) showing that bounded linear operators on \(L^{p}\)-spaces have extensions to a certain vector valued situation. This theorem will also show that the square function testing condition follows from boundedness of the corresponding operator.

So fix a positive integer n and suppose \(\mu \) and \(\nu \) are two Radon measures on \(\mathbb {R}^{n}\). We consider these fixed for the rest of the paper. We shall give the definitions below with the measure \(\mu \), but they are defined similarly with any Radon measure.

Let \((\varepsilon _{i})_{i=1}^{\infty }\) be a sequence of independent random signs on some probability space \((\Omega , \mathbb {P})\). This means that the sequence is independent and \(\mathbb {P}(\varepsilon _{i}=1)=\mathbb {P}(\varepsilon _{i}=-1)=1/2\) for all i. We will use the Kahane-Khinchine inequality [3] saying that for any Banach space X and two exponents \(1 \le p,q< \infty \) there exists a constant \(C>0\), depending only on p and q, so that for any \(x_{1}, \dots , x_{N} \in X\)

$$\begin{aligned} C^{-1}\Bigg (\mathbb {E}\Vert \sum _{i=1}^{N} \varepsilon _{i}x_{i} \Vert _{X}^{q} \Bigg )^{\frac{1}{q}} \le \Bigg (\mathbb {E}\Vert \sum _{i=1}^{N} \varepsilon _{i}x_{i} \Vert _{X}^{p} \Bigg )^{\frac{1}{p}} \le C \Bigg (\mathbb {E}\Vert \sum _{i=1}^{N} \varepsilon _{i}x_{i} \Vert _{X}^{q} \Bigg )^{\frac{1}{q}}, \end{aligned}$$
(2.1)

where \(\mathbb {E}\) refers to the expectation with respect to the random signs. The Kahane-Khinchine inequalities will be used when \(X=\mathbb {R}\) or when X is some \(L^{p}\)-space, and we note here that the constant C in (2.1) in the case of \(L^{p}\)-spaces does not depend on the underlying measure.
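For orientation, the scalar case \(X=\mathbb {R}\) of (2.1) can be checked numerically on a small example. The following is only a sketch: the coefficient list `xs` and the helper `rademacher_moment` are illustrative choices, not part of the paper, and the expectation is computed exactly by enumerating all sign patterns.

```python
from itertools import product

def rademacher_moment(xs, p):
    """Exact (E|sum_i eps_i x_i|^p)^(1/p), averaging over all 2^n sign patterns."""
    total = 0.0
    for signs in product((-1.0, 1.0), repeat=len(xs)):
        total += abs(sum(s * x for s, x in zip(signs, xs))) ** p
    return (total / 2 ** len(xs)) ** (1.0 / p)

xs = [3.0, 1.0, 2.0, 0.5, 1.5]           # arbitrary illustrative coefficients
l2_norm = sum(x * x for x in xs) ** 0.5  # (sum_i x_i^2)^(1/2)
m1, m2, m4 = (rademacher_moment(xs, p) for p in (1, 2, 4))

# For p = 2 the moment equals the l^2 norm by orthogonality of the signs;
# the moments for p = 1 and p = 4 are comparable to it.
print(m1 / l2_norm, m2 / l2_norm, m4 / l2_norm)
```

The three printed ratios stay within fixed bounds however `xs` is chosen, which is the content of (2.1) for real scalars.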

Two sided estimates like (2.1) will be abbreviated as

$$\begin{aligned} \Bigg (\mathbb {E}\Vert \sum _{i=1}^{N} \varepsilon _{i}x_{i} \Vert _{X}^{q}\Bigg )^{\frac{1}{q}} \simeq _{p,q} \Bigg (\mathbb {E}\Vert \sum _{i=1}^{N} \varepsilon _{i}x_{i} \Vert _{X}^{p} \Bigg )^{\frac{1}{p}}, \end{aligned}$$

where possible subscripts (in this case p and q) refer to the information that the implicit constant C depends on. A similar one sided estimate will be abbreviated as “\(\lesssim \)” or “\(\gtrsim \)”. The implicit constants will never depend on any relevant information in the situation, and no confusion should arise.

For simplicity all our scalar valued functions will be real (or \([- \infty , \infty ]\)) valued. For any exponent \(1\le p < \infty \) we denote by \(L^{p}(\mu )\) the usual \(L^{p}\)-space on \(\mathbb {R}^{n}\) with respect to the measure \(\mu \), and by \(L^{p}(\mu ,l^{2})\) the space of sequences \((f_{i})_{i=1}^{\infty }\) of \(\mu \)-measurable real valued functions defined on \(\mathbb {R}^{n}\) for which the norm

$$\begin{aligned} \Vert (f_{i})_{i=1}^{\infty } \Vert _{L^{p}(\mu ,l^{2})}:=\Bigg ( \int \Bigg ( \sum _{i=1}^{\infty } | f_{i} |^{2}\Bigg )^{\frac{p}{2}} \mathrm {d}\mu \Bigg )^{\frac{1}{p}} \end{aligned}$$

is finite.

Theorem 2.1

Let \(1 \le p,q<\infty \) be two exponents and assume that \(T:L^{p}(\mu ) \rightarrow L^{q}(\nu )\) is a bounded linear operator. Then the operator

$$\begin{aligned} (f_{i})_{i=1}^{\infty } \mapsto \tilde{T}(f_{i})_{i=1}^{\infty }:= (Tf_{i})_{i=1}^{\infty } \end{aligned}$$

is also a bounded linear operator from \(L^{p}(\mu ,l^{2})\) into \(L^{q}(\nu ,l^{2})\), with operator norm satisfying

$$\begin{aligned} \Vert T \Vert _{L^{p}(\mu ) \rightarrow L^{q}(\nu )} \simeq _{p,q} \Vert \tilde{T} \Vert _{L^{p}(\mu , l^{2}) \rightarrow L^{q}(\nu , l^{2})}. \end{aligned}$$

Proof

We recall a short proof for the reader’s convenience. It suffices to consider an arbitrary sequence \((f_{i})_{i=1}^{\infty }\) of \(L^{p}(\mu )\)-functions such that \(f_{i}\not =0\) only for finitely many indices i. Let \((\varepsilon _{i})_{i=1}^{\infty }\) be an independent sequence of random signs.

Using the Kahane-Khinchine inequality (four times) and the linearity of T we get

$$\begin{aligned}&\Big ( \int \big ( \sum _{i=1}^{\infty } |T f_{i} |^{2}\big )^{\frac{q}{2}} \mathrm {d}\nu \Big )^{\frac{1}{q}} = \Big ( \int \big ( \mathbb {E}| \sum _{i=1}^{\infty } \varepsilon _{i} Tf_{i} |^{2}\big )^{\frac{q}{2}} \mathrm {d}\nu \Big )^{\frac{1}{q}} \\&\quad \simeq _{q} \Big ( \mathbb {E}\int | \sum _{i=1}^{\infty } \varepsilon _{i} Tf_{i} |^{q} \mathrm {d}\nu \Big )^{\frac{1}{q}} \simeq _{q} \mathbb {E}\Big ( \int | T\sum _{i=1}^{\infty } \varepsilon _{i} f_{i} |^{q} \mathrm {d}\nu \Big )^{\frac{1}{q}} \\&\quad \le \Vert T \Vert _{L^{p}(\mu ) \rightarrow L^{q}(\nu )} \mathbb {E}\Big ( \int | \sum _{i=1}^{\infty } \varepsilon _{i} f_{i} |^{p} \mathrm {d}\mu \Big )^{\frac{1}{p}}\\&\quad \simeq _{p}\Vert T \Vert _{L^{p}(\mu ) \rightarrow L^{q}(\nu )} \Big ( \int \big ( \sum _{i=1}^{\infty } | f_{i} |^{2}\big )^{\frac{p}{2}} \mathrm {d}\mu \Big )^{\frac{1}{p}}, \end{aligned}$$

where in the last step we used the Kahane-Khinchine inequality twice. \(\square \)

Let \(\mathscr {D}\) be a dyadic lattice in \(\mathbb {R}^{n}\). More specifically, for each \(k \in \mathbb {Z}\), let \(\mathscr {D}_{k}\) consist of disjoint cubes of the form \(x+[0, 2^{-k})^{n}\), \(x \in \mathbb {R}^{n}\), that cover \(\mathbb {R}^{n}\). It is required that for every \(k \in \mathbb {Z}\) and \(Q \in \mathscr {D}_{k}\) the cube Q is a union of \(2^{n}\) cubes \(Q' \in \mathscr {D}_{k+1}\). Then define \(\mathscr {D}:= \cup _{k \in \mathbb {Z}}\mathscr {D}_{k}\). The side length \(2^{-k}\) of a cube \(Q \in \mathscr {D}_{k}\) is written as l(Q). We will fix one lattice \(\mathscr {D}\).

For a cube \(Q \in \mathscr {D}_{k}\) define \(Q^{(1)}\) to be the unique cube in \(\mathscr {D}_{k-1}\) that contains Q, and for \(2 \le r \in \mathbb {Z}\) define inductively \(Q^{(r)}:= (Q^{(r-1)})^{(1)}\). Also, for any positive integer r, let \(ch^{(r)}(Q)\) consist of those cubes \(Q'\) in \(\mathscr {D}\) that satisfy \(Q'^{(r)}=Q\), and for \(r=1\) write just \(ch(Q):=ch^{(1)}(Q)\). We talk about ch(Q) as the children of the cube Q.

Let f be a function in \(L^{p}(\mu ), 1 < p< \infty \). For any cube \(Q \in \mathscr {D}\) denote the average of f over Q by

$$\begin{aligned} \langle f \rangle ^{\mu }_{Q}:= \frac{1}{\mu (Q)} \int _{Q} f \mathrm {d}\mu , \end{aligned}$$

and define the differences

$$\begin{aligned} \Delta ^{\mu }_{Q} f:= \sum _{Q' \in ch(Q)} \langle f \rangle ^{\mu }_{Q'} 1_{Q'}- \langle f \rangle ^{\mu }_{Q} 1_{Q}. \end{aligned}$$

We shall use the martingale difference decomposition

$$\begin{aligned} f = \sum _{Q \in \mathscr {D}_{k}} \langle f \rangle ^{\mu }_{Q} 1_{Q} +\sum _{\begin{array}{c} Q \in \mathscr {D}\\ l(Q) \le 2^{-k} \end{array}} \Delta ^{\mu }_{Q} f, \end{aligned}$$

where \(k \in \mathbb {Z}\) is any integer.
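The decomposition above can be tested on a toy model. The sketch below assumes \(n=1\), restricts to the unit interval, and takes f constant on the \(2^{N}\) dyadic intervals of length \(2^{-N}\), so that \(\Delta ^{\mu }_{Q}f=0\) for all smaller cubes and the sum is finite. The cell masses `mu`, the values `f` and all helper names are hypothetical; exact rational arithmetic is used so that equality can be tested directly.

```python
from fractions import Fraction

N = 3  # finest level: 2**N cells of length 2**-N in [0, 1)
mu = [Fraction(w) for w in (1, 3, 2, 5, 1, 1, 4, 2)]   # cell masses (assumed positive)
f = [Fraction(v) for v in (2, -1, 0, 4, 3, 3, -2, 1)]  # values of f on the cells

def cells(level, i):
    """Indices of the finest cells inside the i-th dyadic interval of a level."""
    size = 2 ** (N - level)
    return range(i * size, (i + 1) * size)

def average(level, i):
    """The mu-average <f>_Q of f over Q = i-th interval of the given level."""
    c = cells(level, i)
    return sum(f[j] * mu[j] for j in c) / sum(mu[j] for j in c)

def averaging(level):
    """Cell values of sum over Q in D_level of <f>_Q 1_Q."""
    out = [Fraction(0)] * 2 ** N
    for i in range(2 ** level):
        a = average(level, i)
        for j in cells(level, i):
            out[j] = a
    return out

def difference(level, i):
    """Cell values of Delta_Q f for Q = i-th interval of the given level."""
    out = [Fraction(0)] * 2 ** N
    a = average(level, i)
    for child in (2 * i, 2 * i + 1):
        b = average(level + 1, child)
        for j in cells(level + 1, child):
            out[j] = b - a
    return out

def reconstruct(k):
    """Averages at level k plus all martingale differences at levels k..N-1."""
    total = averaging(k)
    for level in range(k, N):
        for i in range(2 ** level):
            total = [t + d for t, d in zip(total, difference(level, i))]
    return total

print(all(reconstruct(k) == f for k in range(N + 1)))
```

The check succeeds for every starting level k because the differences at each level telescope to the difference of two consecutive averaging operators.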

For any cube \(Q \in \mathscr {D}\) with at least two children that have non-zero \(\mu \)-measure, let \(h^{\mu }_{Q, k}, k \in \{1, \dots , m(Q)\}\), be a collection of Haar functions on Q, where \(m(Q)+1\) is the number of children of Q that have non-zero measure. The Haar functions are required to form an orthonormal basis for the space

$$\begin{aligned} \{f:Q \rightarrow \mathbb {R}: f\text { is constant on the children of } Q \text { and } \int f \mathrm {d}\mu =0\} \end{aligned}$$
(2.2)

equipped with the \(L^{2}(\mu )\)-norm. Below we shall sometimes just write \(h^{\mu }_{Q}\) for a generic Haar function related to a cube \(Q\in \mathscr {D}\).

With the Haar functions the differences \(\Delta ^{\mu }_{Q}f\) may be written as

$$\begin{aligned} \Delta ^{\mu }_{Q}f= \sum _{k=1}^{m(Q)} \langle f, h^{\mu }_{Q,k} \rangle _{\mu } h^{\mu }_{Q,k}, \end{aligned}$$

where

$$\begin{aligned} \langle f,g\rangle _{\mu }:= \int f g \mathrm {d}\mu \end{aligned}$$

for \(g \in L^{p'}(\mu )\), and \(p'\) is the dual exponent to p, i.e., \(\frac{1}{p} + \frac{1}{p'} =1\). Indeed, if Q has at most one child with non-zero measure, then \(\Delta ^{\mu }_{Q}f=0\). Otherwise, the requirement that Haar functions are constant on the children of Q and have zero average, and the fact that every function in the space (2.2) can be represented with the basis, give

$$\begin{aligned}&\sum _{Q' \in ch(Q)} \langle f \rangle ^{\mu }_{Q'} 1_{Q'}- \langle f \rangle ^{\mu }_{Q} 1_{Q} = \sum _{k=1}^{m(Q)} \Big \langle \sum _{Q' \in ch(Q)} \langle f \rangle ^{\mu }_{Q'} 1_{Q'}- \langle f \rangle ^{\mu }_{Q} 1_{Q}, h^{\mu }_{Q,k} \Big \rangle _{\mu } h^{\mu }_{Q,k} \\&\quad =\sum _{k=1}^{m(Q)} \Big \langle \sum _{Q' \in ch(Q)} \langle f \rangle ^{\mu }_{Q'} 1_{Q'}, h^{\mu }_{Q,k} \Big \rangle _{\mu } h^{\mu }_{Q,k} =\sum _{k=1}^{m(Q)} \langle f, h^{\mu }_{Q,k} \rangle _{\mu } h^{\mu }_{Q,k}. \end{aligned}$$
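In dimension \(n=1\) a cube Q has two children, so \(m(Q)=1\) when both have positive measure, and the Haar function is determined up to sign. The sketch below assumes the explicit formula \(h=c\,(1_{Q_{1}}/\mu (Q_{1})-1_{Q_{2}}/\mu (Q_{2}))\) with \(c=(\mu (Q_{1})\mu (Q_{2})/\mu (Q))^{1/2}\), which is not written out in the text, and checks on illustrative data that this h has zero mean, unit \(L^{2}(\mu )\)-norm, and reproduces \(\Delta ^{\mu }_{Q}f\).

```python
import math

mu = [1.0, 3.0, 2.0, 2.0]     # masses of four cells whose union is Q (illustrative)
f = [2.0, -1.0, 4.0, 0.5]     # values of f on the cells
left, right = [0, 1], [2, 3]  # cell indices of the two children of Q
everything = left + right

def mass(c):
    return sum(mu[j] for j in c)

def avg(c):
    return sum(f[j] * mu[j] for j in c) / mass(c)

m1, m2 = mass(left), mass(right)
c = math.sqrt(m1 * m2 / (m1 + m2))  # normalising constant (assumed formula)
h = [c / m1 if j in left else -c / m2 for j in everything]

mean_h = sum(h[j] * mu[j] for j in everything)          # should vanish
norm_h_sq = sum(h[j] ** 2 * mu[j] for j in everything)  # should equal 1
coeff = sum(f[j] * h[j] * mu[j] for j in everything)    # <f, h>_mu

haar_side = [coeff * h[j] for j in everything]
delta = [(avg(left) if j in left else avg(right)) - avg(everything)
         for j in everything]
print(max(abs(a - b) for a, b in zip(haar_side, delta)))
```

The printed maximal deviation is at the level of rounding error, in agreement with the identity just derived.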

The norm of f may be estimated with the martingale difference decomposition as

$$\begin{aligned} \Vert f \Vert _{L^{p}(\mu )} \simeq _{p} \Big \Vert \big ( \sum _{Q \in \mathscr {D}_{k}} | \langle f \rangle ^{\mu }_{Q} |^{2} 1_{Q} +\sum _{\begin{array}{c} Q \in \mathscr {D}: \\ l(Q) \le 2^{-k} \end{array}} | \Delta ^{\mu }_{Q} f |^{2}\big )^{\frac{1}{2}} \Big \Vert _{L^{p}(\mu )}, \end{aligned}$$
(2.3)

where again \(k \in \mathbb {Z}\) is arbitrary. We emphasize that

$$\begin{aligned} \Vert f \Vert _{L^{2}(\mu )} = \Bigg (\sum _{Q \in \mathscr {D}_{k}} \Vert \langle f \rangle ^{\mu }_{Q} 1_{Q}\Vert _{L^{2}(\mu )}^{2} +\sum _{\begin{array}{c} Q \in \mathscr {D}\\ l(Q) \le 2^{-k} \end{array}} \Vert \Delta ^{\mu }_{Q} f \Vert _{L^{2}(\mu )}^{2}\Bigg )^{\frac{1}{2}} \end{aligned}$$
(2.4)

holds only for \(p=2\), and in general if one replaces all the numbers 2 in (2.4) with an arbitrary \(1<p<\infty \), one gets (2.4) with “\(\lesssim \)” if \(1<p\le 2\) and with “\(\gtrsim \)” if \(2 \le p < \infty \).

2.1 Principal Cubes and Carleson Embedding Theorem

We shall also use the usual principal cubes and a form of the dyadic Carleson embedding theorem. To construct the principal cubes suppose \(f \in L^{1}_{loc}(\mu )\) and \(\mathscr {D}_{0} \subset \mathscr {D}\) is a subcollection such that each \(Q' \in \mathscr {D}_{0}\) is contained in some maximal cube \(Q\in \mathscr {D}_{0}\). Maximality of a cube here means that it is not contained in any strictly bigger cube in the same collection.

Let \(\mathscr {F}_{0}\) be the set of maximal cubes in \(\mathscr {D}_{0}\). Assume that \(\mathscr {F}_{0}, \dots , \mathscr {F}_{k}\) are defined for some non-negative integer k. Then, for \(Q \in \mathscr {F}_{k}\), let \(ch_{\mathscr {F}}(Q)\) consist of the maximal cubes \(Q' \in \mathscr {D}_{0}\) such that \(Q' \subset Q\) and

$$\begin{aligned} \langle |f| \rangle _{Q'}^{\mu } > 2 \langle |f| \rangle _{Q}^{\mu }. \end{aligned}$$

Set \(\mathscr {F}_{k+1}:= \cup _{Q \in \mathscr {F}_{k}} ch_{\mathscr {F}}(Q)\) and

$$\begin{aligned} \mathscr {F}:= \bigcup _{k=0}^{\infty }\mathscr {F}_{k}. \end{aligned}$$

For any cube \(Q \in \mathscr {D}_{0}\) denote by \(\pi _{\mathscr {F}}Q\) the smallest cube in \(\mathscr {F}\) that contains Q, and by \(\pi ^{1}_{\mathscr {F}}Q\) the smallest cube (if it exists) in \(\mathscr {F}\) that strictly contains it.

The collection \(\mathscr {F}\) is \(\frac{1}{2}\)-sparse, that is, there exist pairwise disjoint subsets \(E(F) \subset F, F \in \mathscr {F},\) such that \(\mu (E(F)) \ge \frac{1}{2}\mu (F)\). Indeed, one can define \(E(F):= F \setminus \cup _{F' \in ch_{\mathscr {F}}(F)}F'\), and the construction of \(\mathscr {F}\) implies that \(\mu (E(F)) \ge \frac{1}{2}\mu (F)\). The property that \(\mathscr {F}\) is \(\frac{1}{2}\)-sparse implies also that \(\mathscr {F}\) is 2-Carleson, i.e., for every \(F \in \mathscr {F}\)

$$\begin{aligned} \sum _{\begin{array}{c} F' \in \mathscr {F}: \\ F' \subset F \end{array}} \mu (F') \le 2 \mu (F). \end{aligned}$$
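The stopping construction can be traced on a small example. The sketch below assumes \(n=1\) and takes \(\mathscr {D}_{0}\) to be all dyadic subintervals of \([0,1)\) of length at least \(2^{-N}\), encoded as pairs `(level, index)`. The data `mu` and `f` and every function name are hypothetical. It builds \(\mathscr {F}\) generation by generation and then tests the sparseness and Carleson conditions.

```python
from fractions import Fraction

N = 4                                    # finest cells have length 2**-N
mu = [Fraction(1)] * 2 ** N              # cell masses (uniform, for illustration)
f = [Fraction(1)] * 15 + [Fraction(40)]  # one large spike on the last cell

def cells(Q):
    level, i = Q
    size = 2 ** (N - level)
    return range(i * size, (i + 1) * size)

def mass(Q):
    return sum(mu[j] for j in cells(Q))

def avg(Q):
    """The average <|f|>_Q with respect to mu."""
    return sum(abs(f[j]) * mu[j] for j in cells(Q)) / mass(Q)

def children(Q):
    level, i = Q
    return [(level + 1, 2 * i), (level + 1, 2 * i + 1)] if level < N else []

def contains(big, small):
    """True if the dyadic interval `small` is a subset of `big`."""
    shift = small[0] - big[0]
    return shift >= 0 and small[1] >> shift == big[1]

def stopping_children(F):
    """Maximal Q' inside F with <|f|>_{Q'} > 2 <|f|>_F."""
    found, stack = [], children(F)
    while stack:
        Q = stack.pop()
        if avg(Q) > 2 * avg(F):
            found.append(Q)
        else:
            stack.extend(children(Q))
    return found

principal, generation = [], [(0, 0)]     # start from the maximal cube [0, 1)
while generation:
    principal.extend(generation)
    generation = [G for F in generation for G in stopping_children(F)]

def E_mass(F):
    """mu(E(F)), where E(F) is F minus its stopping children."""
    return mass(F) - sum(mass(G) for G in stopping_children(F))

sparse = all(2 * E_mass(F) >= mass(F) for F in principal)
carleson = all(sum(mass(G) for G in principal if contains(F, G)) <= 2 * mass(F)
               for F in principal)
print(sorted(principal), sparse, carleson)
```

With this data the family consists of the unit interval and two nested intervals shrinking towards the spike, and both conditions hold, as the general argument predicts.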

The well known Carleson embedding theorem says that if \(\{a_{Q}\}_{Q\in \mathscr {D}}\) is a collection of non-negative real numbers, then the estimate

$$\begin{aligned} \sum _{Q \in \mathscr {D}} |\langle f \rangle _{Q}^{\mu } |^{p}a_{Q} \le C \Vert f \Vert _{L^{p}(\mu )}^{p} \end{aligned}$$

holds for all \(f \in L^{p}(\mu )\), where C is a fixed constant and \(p \in (1,\infty )\), if and only if there exists \(C'>0\) so that

$$\begin{aligned} \sum _{\begin{array}{c} Q' \in \mathscr {D}: \\ Q' \subset Q \end{array}} a_{Q'} \le C' \mu (Q) \end{aligned}$$

for all \(Q \in \mathscr {D}\).

The version of the theorem we shall use is the following:

Theorem 2.2

Suppose \(\mathscr {D}_{0} \subset \mathscr {D}\) is a subcollection and \(1 < p < \infty \). Then we have the estimate

$$\begin{aligned} \Big \Vert \sum _{Q \in \mathscr {D}_{0}} \langle | f |\rangle ^{\mu }_{Q}1_{Q}\Big \Vert _{L^{p}(\mu )} \le C \Vert f \Vert _{L^{p}(\mu )} \end{aligned}$$
(2.5)

for all \(f \in L^{p}(\mu )\), where C is independent of f, if and only if there exists \(C'>0\) such that for all \(Q \in \mathscr {D}_{0}\)

$$\begin{aligned} \sum _{\begin{array}{c} Q' \in \mathscr {D}_{0}: \\ Q' \subset Q \end{array}} \mu (Q') \le C' \mu (Q). \end{aligned}$$
(2.6)

Moreover, the smallest possible constants C and \(C'\) satisfy \(C' \le C^p \lesssim _p C'^{p}\).

Proof

Assume (2.5) holds. Then for any \(Q \in \mathscr {D}_{0}\) we have

$$\begin{aligned} \sum _{\begin{array}{c} Q' \in \mathscr {D}_{0}: \\ Q' \subset Q \end{array}} \mu (Q') \!=\! \int \sum _{\begin{array}{c} Q' \in \mathscr {D}_{0}: \\ Q' \subset Q \end{array}} (\langle 1_{Q}\rangle ^{\mu }_{Q'}1_{Q'})^{p} \mathrm {d}\mu \!\le \! \int \Bigg (\sum _{\begin{array}{c} Q' \in \mathscr {D}_{0}: \\ Q' \subset Q \end{array}} \langle 1_{Q}\rangle ^{\mu }_{Q'}1_{Q'}\Bigg )^{p} \mathrm {d}\mu \le C^p \mu (Q). \end{aligned}$$

On the other hand assume that (2.6) holds and \(f \in L^{p}(\mu )\). If \(g \in L^{p'}(\mu )\) is any function, then

$$\begin{aligned}&\int \Bigg (\sum _{Q \in \mathscr {D}_{0}} \langle | f |\rangle ^{\mu }_{Q}1_{Q}\Bigg ) g \mathrm {d}\mu = \sum _{Q \in \mathscr {D}_{0}} \langle | f |\rangle ^{\mu }_{Q} \langle g \rangle ^{\mu }_{Q} \mu (Q) \\&\quad \le \Big ( \sum _{Q \in \mathscr {D}_{0}} (\langle | f |\rangle ^{\mu }_{Q})^{p}\mu (Q) \Big )^{\frac{1}{p}} \Big ( \sum _{Q \in \mathscr {D}_{0}} (\langle | g| \rangle ^{\mu }_{Q})^{p'}\mu (Q) \Big )^{\frac{1}{p'}} \lesssim _{p}C'\Vert f \Vert _{L^{p}(\mu )} \Vert g \Vert _{L^{p'}(\mu )}, \end{aligned}$$

where the last step follows from the usual formulation of the Carleson embedding theorem. \(\square \)

3 Positive Dyadic Operators

Before going to work with the well localized operators we introduce and illustrate the square function testing condition with a simpler positive dyadic operator. Fix two exponents \(1 < p,q < \infty \). Let \(\{\lambda _{Q}\}_{Q \in \mathscr {D}}\) be a set of non-negative real numbers. Define for non-negative Borel measurable functions a mapping

$$\begin{aligned} f \mapsto T^{\mu }f:=\sum _{Q \in \mathscr {D}} \lambda _{Q} \int _{Q} f \mathrm {d}\mu 1_{Q}. \end{aligned}$$
(3.1)

We want to investigate when we have an estimate

$$\begin{aligned} \Vert T^{\mu }f \Vert _{L^{q}(\nu )} \le C \Vert f\Vert _{L^{p}(\mu )} \end{aligned}$$
(3.2)

for all \(0 \le f \in L^{p}(\mu )\), where of course C should not depend on f. Similarly define for \(0 \le f\)

$$\begin{aligned} f \mapsto T^{\nu }f:=\sum _{Q \in \mathscr {D}} \lambda _{Q} \int _{Q} f \mathrm {d}\nu 1_{Q}, \end{aligned}$$

and for every cube \(Q \in \mathscr {D}\) also the localized versions

$$\begin{aligned} T_{Q}^{\mu }f:=\sum _{\begin{array}{c} Q' \in \mathscr {D}: \\ Q' \subset Q \end{array}} \lambda _{Q'} \int _{Q'} f \mathrm {d}\mu 1_{Q'} \end{aligned}$$

and correspondingly \(T_{Q}^{\nu }\).
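To make the definitions concrete, the operators \(T^{\mu }\), \(T^{\nu }\) and \(T^{\mu }_{Q}\) can be evaluated on a finite model. The sketch below assumes \(n=1\), keeps only the dyadic subintervals of \([0,1)\) of length at least \(2^{-N}\), and uses made-up coefficients `lam` and measures `mu`, `nu`. It checks the duality between \(T^{\mu }\) and \(T^{\nu }\) and the pointwise bound \(T^{\mu }_{Q}1_{Q} \le T^{\mu }1_{Q}\) for one cube.

```python
from fractions import Fraction

N = 3
mu = [Fraction(w) for w in (1, 3, 2, 5, 1, 1, 4, 2)]  # cell masses of mu
nu = [Fraction(w) for w in (2, 1, 1, 3, 4, 1, 2, 2)]  # cell masses of nu
cubes = [(lev, i) for lev in range(N + 1) for i in range(2 ** lev)]
lam = {Q: Fraction((Q[0] + 3 * Q[1]) % 4) for Q in cubes}  # arbitrary lambda_Q >= 0

def cells(Q):
    level, i = Q
    size = 2 ** (N - level)
    return range(i * size, (i + 1) * size)

def contains(big, small):
    shift = small[0] - big[0]
    return shift >= 0 and small[1] >> shift == big[1]

def integral(h, w, Q):
    """Integral of h over Q with respect to the measure with cell masses w."""
    return sum(h[j] * w[j] for j in cells(Q))

def T(h, w, inside=None):
    """Cell values of sum_Q lam_Q (int_Q h dw) 1_Q, restricted to Q inside `inside`."""
    out = [Fraction(0)] * 2 ** N
    for Q in cubes:
        if inside is None or contains(inside, Q):
            c = lam[Q] * integral(h, w, Q)
            for j in cells(Q):
                out[j] += c
    return out

f = [Fraction(v) for v in (2, 0, 1, 4, 3, 3, 0, 1)]  # non-negative test functions
g = [Fraction(v) for v in (1, 5, 0, 2, 2, 1, 3, 0)]

Tf, Tg = T(f, mu), T(g, nu)
lhs = sum(Tf[j] * g[j] * nu[j] for j in range(2 ** N))     # int (T^mu f) g dnu
mid = sum(lam[Q] * integral(f, mu, Q) * integral(g, nu, Q) for Q in cubes)
rhs = sum(f[j] * Tg[j] * mu[j] for j in range(2 ** N))     # int f (T^nu g) dmu

Q0 = (1, 1)
one_Q0 = [Fraction(int(contains(Q0, (N, j)))) for j in range(2 ** N)]
localized, full = T(one_Q0, mu, inside=Q0), T(one_Q0, mu)
print(lhs == mid == rhs, all(a <= b for a, b in zip(localized, full)))
```

The middle quantity `mid` is the bilinear form \(\sum _{Q}\lambda _{Q}\int _{Q}f\,\mathrm {d}\mu \int _{Q}g\,\mathrm {d}\nu \), which is the natural dual formulation of an estimate of the form (3.2).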

Theorem 3.1

The estimate (3.2) holds if and only if there exist two constants \(0\le C_{1}, C_{2} < \infty \) so that for every finite 2-Carleson family \(\mathscr {D}_{0} \subset \mathscr {D}\) and every set \(\{a_{Q}\}_{Q \in \mathscr {D}_{0}}\) of positive real numbers the inequalities

$$\begin{aligned} \Big \Vert \Big ( \sum _{Q \in \mathscr {D}_{0}} (a_{Q}T^{\mu }_{Q}1_{Q})^{2}\Big )^{\frac{1}{2}} \Big \Vert _{L^{q}(\nu )} \le C_{1} \Big \Vert \Big ( \sum _{Q \in \mathscr {D}_{0}} (a_{Q}1_{Q})^{2}\Big )^{\frac{1}{2}} \Big \Vert _{L^{p}(\mu )} \end{aligned}$$
(3.3)

and

$$\begin{aligned} \Big \Vert \Big ( \sum _{Q \in \mathscr {D}_{0}} (a_{Q}T^{\nu }_{Q}1_{Q})^{2}\Big )^{\frac{1}{2}} \Big \Vert _{L^{p'}(\mu )} \le C_{2} \Big \Vert \Big ( \sum _{Q \in \mathscr {D}_{0}} (a_{Q}1_{Q})^{2}\Big )^{\frac{1}{2}} \Big \Vert _{L^{q'}(\nu )} \end{aligned}$$
(3.4)

hold.

If \(\mathcal {T}\) and \(\mathcal {T}^{*}\) denote the smallest possible constants \(C_{1}\) and \(C_{2}\), respectively, then the smallest possible constant \(\Vert T\Vert \) in (3.2) satisfies \(\Vert T\Vert \simeq \mathcal {T} + \mathcal {T}^{*}\).

This problem and the related results, as well as the whole “testing philosophy”, have their roots in the work of Sawyer [9, 10] in the 1980s. A characterization for the inequality (3.2) was first given by Nazarov et al. [7] in the case \(p=q=2\) using the Bellman function method. The case \(1<p\le q <\infty \) was characterized by Lacey et al. [4]. Finally, Tanaka [11] gave a characterization when the exponents are in the order \(1<q<p<\infty \).

Let us discuss here briefly the relation between the conditions (3.3) and (3.4) and the Sawyer type testing. The Sawyer type testing corresponds to the case when there is only one term in the sums in (3.3) and (3.4), that is the operator and its formal adjoint would be tested with one indicator of a dyadic cube at a time. Hence it is clear that the square function testing condition implies the Sawyer type testing condition.

On the other hand, when \(p=q=2\), the left hand side of (3.3) can be written as

$$\begin{aligned} \Big \Vert \Big ( \sum _{Q \in \mathscr {D}_{0}} (a_{Q}T^{\mu }_{Q}1_{Q})^{2}\Big )^{\frac{1}{2}} \Big \Vert _{L^{2}(\nu )} =\Big ( \sum _{Q \in \mathscr {D}_{0}} \Vert a_{Q}T^{\mu }_{Q}1_{Q}\Vert _{L^{2}(\nu )}^{2}\Big )^{\frac{1}{2}}, \end{aligned}$$

and a similar computation on the right hand side of (3.3) shows that in this case the Sawyer type testing would imply the square function testing.

Equation (3.3) could be written with the Kahane-Khinchine inequalities as

$$\begin{aligned} \mathbb {E}\Big \Vert \sum _{Q \in \mathscr {D}_{0}} \varepsilon _{Q}a_{Q}T^{\mu }_{Q}1_{Q} \Big \Vert _{L^{q}(\nu )} \le C_{1}'\mathbb {E}\Big \Vert \sum _{Q \in \mathscr {D}_{0}} \varepsilon _{Q}a_{Q}1_{Q} \Big \Vert _{L^{p}(\mu )}, \end{aligned}$$

where the constants \(C'_{1}\) and \(C_{1}\) are comparable depending only on p and q. This formulation explains how the square function testing is in the spirit of \(\mathcal {R}\)-bounded operator families, as mentioned in the introduction.

The Sawyer type testing is in general sufficient for (3.2) if and only if the exponents are in the order \(1<p\le q<\infty \), see [1]. Thus, in this situation our result is worse than the existing one. In the case \(1<q<p<\infty \) Tanaka [11] has given a characterization in terms of discrete Wolff’s potentials, and here our result can be seen as an alternative characterization. We note that in our method the relative position of the exponents p and q does not make any difference.

Proof of Theorem 3.1

Our proof will follow the technique of “parallel stopping cubes” as in [2]. This method was first introduced in [5] (only in the older arXiv versions) and was used in the investigations of the two weight inequality for the Hilbert transform.

If (3.2) holds, then the sum (3.1) defining \(T^{\mu }\) actually defines a bounded linear operator from \(L^{p}(\mu )\) into \(L^{q}(\nu )\). Clearly \(T^{\mu }_{Q}1_{Q} \le T^{\mu }1_{Q}\) for every \(Q \in \mathscr {D}\), and the same is true for \(T^{\nu }\) also. Hence in this situation we may apply Theorem 2.1 to show that (3.3) and (3.4) hold.

Assume then that (3.3) and (3.4) are true, and let \(0 \le f \in L^{p}(\mu )\) and \( 0\le g \in L^{q'}(\nu )\) be two functions. For the estimate (3.2) it is enough to choose a finite subcollection \(\mathscr {D}_{0} \subset \mathscr {D}\) and show that

$$\begin{aligned} \sum _{Q\in \mathscr {D}_{0}}\lambda _{Q}\int _{Q}f \mathrm {d}\mu \int _{Q}g \mathrm {d}\nu \lesssim (C_{1}+C_{2}) \Vert f\Vert _{L^{p}(\mu )}\Vert g\Vert _{L^{q'}(\nu )}. \end{aligned}$$
(3.5)

Since \(\mathscr {D}_{0}\) is finite, we can construct the collections \(\mathscr {F}\) and \(\mathscr {G}\) of principal cubes for the functions f and g, respectively, where \(\mathscr {F}\) is constructed with respect to the measure \(\mu \) and \(\mathscr {G}\) with respect to \(\nu \). If \(Q \in \mathscr {D}_{0}\) the notation \(\pi Q=(F,G)\) means that \(\pi _{\mathscr {F}}Q=F\) and \(\pi _{\mathscr {G}}Q=G\).

For every cube \(Q \in \mathscr {D}_{0}\) there is a unique pair \((F,G) \in \mathscr {F}\times \mathscr {G}\) so that \(\pi Q=(F,G)\), and the properties of dyadic cubes imply that \(F \subset G\) or \(G\subset F\). Hence, the sum in (3.5) may be divided as

$$\begin{aligned} \sum _{Q \in \mathscr {D}_{0}} \le \sum _{F \in \mathscr {F}} \sum _{\begin{array}{c} G \in \mathscr {G}: \\ G \subset F \end{array}} \sum _{\begin{array}{c} Q \in \mathscr {D}_{0}:\\ \pi Q=(F,G) \end{array}} +\sum _{G \in \mathscr {G}} \sum _{\begin{array}{c} F \in \mathscr {F}:\\ F \subset G \end{array}} \sum _{\begin{array}{c} Q \in \mathscr {D}_{0}:\\ \pi Q=(F,G) \end{array}}, \end{aligned}$$
(3.6)

where “\(\le \)” is needed since we have double-counted the terms corresponding to all Q for which \(\pi _{\mathscr {F}}Q= \pi _{\mathscr {G}}Q\). The two sums in (3.6) are treated in the same way by symmetry, and we focus on the first one.

So let \(\mathscr {F}\ni F \supset G \in \mathscr {G}\) and suppose \(Q \in \mathscr {D}_{0}\) is such that \(\pi Q=(F,G)\). Write \(ch_{\mathscr {F}}^{*}(F)\) for the collection of all \(F' \in ch_{\mathscr {F}}(F)\) such that \(\pi _{\mathscr {G}}F'\subset F\). Then, by the construction of \(\mathscr {F}\), we have \( \langle f \rangle _{Q}^{\mu } \le 2 \langle f \rangle _{F}^{\mu }\), and

$$\begin{aligned} \int _{Q} g \mathrm {d}\nu = \int _{Q} g_{F} \mathrm {d}\nu , \end{aligned}$$

where

$$\begin{aligned} g_{F}:= 1_{E(F)} g + \sum _{F' \in ch_{\mathscr {F}}^{*}(F)}\langle g \rangle _{F'}^{\nu }1_{F'}. \end{aligned}$$

Using these observations we get

$$\begin{aligned}&\sum _{F \in \mathscr {F}} \sum _{\begin{array}{c} G \in \mathscr {G}: \\ G \subset F \end{array}} \sum _{\begin{array}{c} Q \in \mathscr {D}_{0}:\\ \pi (Q)=(F,G) \end{array}} \lambda _{Q}\int _{Q}f \mathrm {d}\mu \int _{Q}g \mathrm {d}\nu \lesssim \sum _{F \in \mathscr {F}} \langle f \rangle _{F}^{\mu } \sum _{\begin{array}{c} Q \in \mathscr {D}: \\ Q \subset F \end{array}}\lambda _{Q}\int _{Q}1 \mathrm {d}\mu \int _{Q}g_{F} \mathrm {d}\nu \\&\quad = \int \sum _{F \in \mathscr {F}} \big (\langle f \rangle _{F}^{\mu } T^{\mu }_{F}1_{F}\big ) g_{F} \mathrm {d}\nu \le \Big \Vert \Big (\ \sum _{F \in \mathscr {F}}\big (\langle f \rangle _{F}^{\mu } T^{\mu }_{F}1_{F}\big )^{2}\Big )^{\frac{1}{2}}\Big \Vert _{L^{q}(\nu )}\\&\qquad \cdot \Big \Vert \Big (\sum _{F \in \mathscr {F}} (g_{F})^{2}\Big )^{\frac{1}{2}} \Big \Vert _{L^{q'}(\nu )} \\&\quad \le C_{1} \Big \Vert \Big (\ \sum _{F \in \mathscr {F}}\big (\langle f \rangle _{F}^{\mu }1_{F}\big )^{2}\Big )^{\frac{1}{2}}\Big \Vert _{L^{p}(\mu )} \Big \Vert \Big (\sum _{F \in \mathscr {F}} (g_{F})^{2}\Big )^{\frac{1}{2}} \Big \Vert _{L^{q'}(\nu )}. \end{aligned}$$

The Carleson embedding theorem (Theorem 2.2) implies that

$$\begin{aligned} \Big \Vert \Big (\ \sum _{F \in \mathscr {F}}\big (\langle f \rangle _{F}^{\mu }1_{F}\big )^{2}\Big )^{\frac{1}{2}}\Big \Vert _{L^{p}(\mu )} \le \Big \Vert \ \sum _{F \in \mathscr {F}}\langle f \rangle _{F}^{\mu }1_{F} \Big \Vert _{L^{p}(\mu )} \lesssim \Vert f \Vert _{L^{p}(\mu )}. \end{aligned}$$

Considering the second factor we have

$$\begin{aligned}&\sum _{F \in \mathscr {F}} g_{F}= \sum _{F \in \mathscr {F}} (1_{E(F)}g +\sum _{F' \in ch_{\mathscr {F}}^{*}(F)}\langle g \rangle _{F'}^{\nu }1_{F'}) \le g\\&\qquad + \sum _{F \in \mathscr {F}} \sum _{\begin{array}{c} G \in \mathscr {G}: \\ \pi _{\mathscr {F}} G = F \text { or} \\ G \in ch_{\mathscr {F}}(F) \end{array}} \sum _{\begin{array}{c} F' \in ch_{\mathscr {F}}(F): \\ \pi _{\mathscr {G}} F'=G \end{array}} \langle g \rangle _{F'}^{\nu } 1_{F'} \\&\quad \le g+ 2\sum _{F \in \mathscr {F}} \sum _{\begin{array}{c} G \in \mathscr {G}: \\ \pi _{\mathscr {F}} G = F \text { or} \\ G \in ch_{\mathscr {F}}(F) \end{array}} \langle g \rangle _{G}^{\nu } \sum _{\begin{array}{c} F' \in ch_{\mathscr {F}}(F): \\ \pi _{\mathscr {G}} F'=G \end{array}} 1_{F'} \le g+ 2\sum _{F \in \mathscr {F}} \sum _{\begin{array}{c} G \in \mathscr {G}: \\ \pi _{\mathscr {F}} G = F \text { or} \\ G \in ch_{\mathscr {F}}(F) \end{array}} \langle g \rangle _{G}^{\nu }1_{G} \\&\quad \le g+ 4\sum _{G \in \mathscr {G}} \langle g \rangle _{G}^{\nu }1_{G}. \end{aligned}$$

Hence Theorem 2.2 implies again that

$$\begin{aligned} \Big \Vert \Big (\sum _{F \in \mathscr {F}} (g_{F})^{2}\Big )^{\frac{1}{2}} \Big \Vert _{L^{q'}(\nu )} \le \Big \Vert \sum _{F \in \mathscr {F}} g_{F} \Big \Vert _{L^{q'}(\nu )} \lesssim \Vert g \Vert _{L^{q'}(\nu )}. \end{aligned}$$

\(\square \)

4 Well Localized Operators

We turn our attention to the main result of this paper and first recall the definition of a well localized operator from [8]. Suppose that we have a linear operator \(T^{\mu }\) mapping finite linear combinations of indicators \(1_{Q}, Q \in \mathscr {D}\), into locally \(\nu \)-integrable functions. We also assume that we have a linear operator \(T^{\nu }\) mapping finite linear combinations of indicators \(1_{Q}, Q \in \mathscr {D}\), into locally \(\mu \)-integrable functions so that

$$\begin{aligned} \langle T^{\mu }1_{Q}, 1_{R}\rangle _{\nu }= \langle 1_{Q}, T^{\nu }1_{R}\rangle _{\mu } \end{aligned}$$

for all \(Q,R \in \mathscr {D}\). We call \(T^{\nu }\) and \(T^{\mu }\) formal adjoints of each other.

Definition 4.1

Fix some number \(r \in \{0,1,2,\dots \}\). The operator \(T^{\mu }\) is said to be lower triangularly localized if

$$\begin{aligned} \langle T^{\mu }1_{Q}, h^{\nu }_{R}\rangle _{\nu }=0 \end{aligned}$$

for all \(Q,R \in \mathscr {D}\) such that

  • \(l(R) \le 2l(Q)\) and \(R \not \subset Q^{(r+1)}\), or

  • \(l(R) \le 2^{-r}l(Q)\) and \(R \not \subset Q\).

The operator \(T^{\mu }\) is well localized if both \(T^{\mu }\) and \(T^{\nu }\) are lower triangularly localized.

Remark 1

This definition differs slightly from the definition given by Nazarov et al. [8]. The difference is that our condition “\(l(R) \le 2l(Q)\) and \(R \not \subset Q^{(r+1)}\)” above corresponds to “\(l(R) \le l(Q)\) and \(R \not \subset Q^{(r)}\)” in [8]. This modification seems necessary to handle the sum I in the proof of Theorem 4.2 below.

A special example of a well localized operator is the two weight formulation of a Haar multiplier. Suppose we are on \(\mathbb {R}\) for the moment and let \(h_{I}:=|I|^{-1/2}(1_{I_{1}}-1_{I_{2}})\) be the \(L^{2}\)-normalized Haar function of a dyadic interval \(I \in \mathscr {D}\). Let \(\{\lambda _{I}\}_{I \in \mathscr {D}}\) be a set of real numbers such that only finitely many of the \(\lambda _{I}\) are non-zero. Then consider the operator \(T^{\mu }_{\lambda }\) defined for locally \(\mu \)-integrable functions by

$$\begin{aligned} T^{\mu }_{\lambda }f:= \sum _{I \in \mathscr {D}}\lambda _{I} \langle f, h_{I} \rangle _{\mu }h_{I}. \end{aligned}$$

We assumed finiteness of the coefficient sequence only to have the operator well defined in the general two weight setting; in specific situations the sequence need not be finite.

It is not difficult to see that in this case \(T^{\mu }_{\lambda }\) is a well localized operator with parameter \(r=0\). In a similar way one could see that dyadic shifts in \(\mathbb {R}^{n}\) (of which the Haar multiplier is an example) in this two weight formulation would be well localized operators. For a discussion about where the motivation for the definition of a well localized operator comes from we refer to [8]. There it is also shown that, more generally than just for dyadic shifts, two weight inequalities for the so called “band operators” can be seen as two weight inequalities for well localized operators.
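The claim about the parameter \(r=0\) can be checked along the following lines. This is only a sketch, and it assumes that \(h^{\nu }_{R}\) denotes a Haar function adapted to \(\nu \), that is, a function supported on \(R\), constant on the children of \(R\) and satisfying \(\int h^{\nu }_{R}\,\mathrm {d}\nu =0\). For \(Q,R \in \mathscr {D}\) we have

$$\begin{aligned} \langle T^{\mu }_{\lambda }1_{Q}, h^{\nu }_{R}\rangle _{\nu } = \sum _{I \in \mathscr {D}} \lambda _{I} \langle 1_{Q}, h_{I}\rangle _{\mu } \langle h_{I}, h^{\nu }_{R}\rangle _{\nu }. \end{aligned}$$

The factor \(\langle 1_{Q}, h_{I}\rangle _{\mu }\) vanishes if \(I \cap Q=\emptyset \). The factor \(\langle h_{I}, h^{\nu }_{R}\rangle _{\nu }\) vanishes unless \(I \subset R\), since otherwise either \(I \cap R = \emptyset \) or \(h_{I}\) is constant on \(R\). Hence a non-zero term requires \(Q \cap R \not = \emptyset \). If \(l(R) \le 2l(Q)\) and \(R \not \subset Q^{(1)}\), or if \(l(R) \le l(Q)\) and \(R \not \subset Q\), then \(R \cap Q = \emptyset \), because dyadic intervals are either nested or disjoint, and so every term in the sum vanishes. The same reasoning with the roles of \(\mu \) and \(\nu \) interchanged applies to the formal adjoint.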

The following main theorem characterizes the boundedness of well localized operators:

Theorem 4.2

Let \(1 < p,q < \infty \) be two exponents and suppose \(T^{\mu }\) is a well localized operator with a formal adjoint \(T^{\nu }\). The mapping \(T^{\mu }\) extends to a bounded operator \(T^{\mu }: L^{p}(\mu ) \rightarrow L^{q}(\nu )\) if and only if there exist constants \(C_{1},C_{2}>0\) such that for every finite subcollection \(\mathscr {D}_{0} \subset \mathscr {D}\) and every set of non-negative real numbers \(\{a_{Q}\}_{Q \in \mathscr {D}_{0}}\) the inequalities

$$\begin{aligned} \Big \Vert \Big ( \sum _{Q \in \mathscr {D}_{0}} ( 1_{R(Q)} T^{\mu }a_{Q}1_{Q})^{2} \Big )^{\frac{1}{2}} \Big \Vert _{L^{q}(\nu )} \le C_{1} \Big \Vert \Big ( \sum _{Q \in \mathscr {D}_{0}} (a_{Q}1_{Q})^{2} \Big )^{\frac{1}{2}} \Big \Vert _{L^{p}(\mu )} \end{aligned}$$
(4.1)

and

$$\begin{aligned} \Big \Vert \Big ( \sum _{Q \in \mathscr {D}_{0}} ( 1_{R(Q)} T^{\nu }a_{Q}1_{Q})^{2} \Big )^{\frac{1}{2}}\Big \Vert _{L^{p'}(\mu )} \le C_{2} \Big \Vert \Big ( \sum _{Q \in \mathscr {D}_{0}} (a_{Q}1_{Q})^{2} \Big )^{\frac{1}{2}} \Big \Vert _{L^{q'}(\nu )} \end{aligned}$$
(4.2)

hold. Here \(R(Q) \in \mathscr {D}_{0}\) is any cube of size \(l(R(Q))=l(Q)\).

Furthermore in the case \(T^{\mu }\) is bounded, its norm satisfies the estimate

$$\begin{aligned} \Vert T^{\mu }\Vert _{L^{p}(\mu ) \rightarrow L^{q}(\nu )} \simeq \mathcal {T}+\mathcal {T^{*}}, \end{aligned}$$

where \(\mathcal {T}\) and \(\mathcal {T^{*}}\) denote the best constants in (4.1) and (4.2), respectively.

Remark 2

The testing conditions (4.1) and (4.2) will be applied in the situation where the cubes \(R(Q)\in \mathscr {D}_{0}\) have side length 2l(Q). This is possible since every dyadic cube can be covered with its \(2^{n}\) children.
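The covering argument can be made explicit as follows. This is a sketch in which the notation \(\widehat{R}(Q)\) for a cube of side length \(2l(Q)\) and \(R_{1}(Q), \dots , R_{2^{n}}(Q)\) for its children is introduced only for illustration, and it is assumed that each choice \(Q \mapsto R_{k}(Q)\) is admissible in (4.1). Since the children are pairwise disjoint, \(1_{\widehat{R}(Q)}=\sum _{k=1}^{2^{n}}1_{R_{k}(Q)}\), and therefore pointwise

$$\begin{aligned} \Big ( \sum _{Q \in \mathscr {D}_{0}} ( 1_{\widehat{R}(Q)} T^{\mu }a_{Q}1_{Q})^{2} \Big )^{\frac{1}{2}}&= \Big ( \sum _{k=1}^{2^{n}} \sum _{Q \in \mathscr {D}_{0}} ( 1_{R_{k}(Q)} T^{\mu }a_{Q}1_{Q})^{2} \Big )^{\frac{1}{2}} \\&\le \sum _{k=1}^{2^{n}} \Big ( \sum _{Q \in \mathscr {D}_{0}} ( 1_{R_{k}(Q)} T^{\mu }a_{Q}1_{Q})^{2} \Big )^{\frac{1}{2}}. \end{aligned}$$

Taking \(L^{q}(\nu )\)-norms and applying (4.1) to each of the \(2^{n}\) terms bounds the left hand side by \(2^{n}C_{1}\) times the right hand side of (4.1).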

The cubes R(Q) appearing in the testing conditions can actually be assumed in a sense to be close to the cube Q. To be precise, when we use the full square function testing conditions (4.1) and (4.2), the cubes R(Q) satisfy \(l(R(Q))= l(Q)\) and \(R(Q) \subset Q^{(r+1)}\), where r is the parameter from the definition of the well localized operator.

Reduced versions of (4.1) and (4.2) in which the sums contain only one term, that is, the Sawyer type testing conditions, will also be used. If the underlying dyadic system has an increasing sequence \(R_{0}\subset R_{1}\subset \dots \) of cubes so that \(\mathbb {R}^{n}=\cup _{k=0}^{\infty }R_{k}\), then Sawyer type testing will be used only with \(Q=R(Q)\).

But if the dyadic system does not have an increasing sequence of cubes covering the whole space, then the Sawyer type testing will be used when the cubes Q and R(Q) have equal side length \(l(Q)=l(R(Q))\) and are adjacent in the sense that \(\overline{Q} \cap \overline{R(Q)} \not =\emptyset \).
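For orientation, the one-term versions can be written out explicitly. If the sums in (4.1) and (4.2) contain a single non-zero term with \(a_{Q}=1\), then the right hand sides reduce to \(\Vert 1_{Q}\Vert _{L^{p}(\mu )}=\mu (Q)^{1/p}\) and \(\Vert 1_{Q}\Vert _{L^{q'}(\nu )}=\nu (Q)^{1/q'}\), so the conditions take the form

$$\begin{aligned} \Vert 1_{R(Q)} T^{\mu }1_{Q}\Vert _{L^{q}(\nu )} \le C_{1}\, \mu (Q)^{\frac{1}{p}}, \qquad \Vert 1_{R(Q)} T^{\nu }1_{Q}\Vert _{L^{p'}(\mu )} \le C_{2}\, \nu (Q)^{\frac{1}{q'}}. \end{aligned}$$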

Remark 3

We explain here the consequence of Nazarov’s counterexample mentioned in the introduction. Namely, for any exponent \(1<p<\infty , \ p\not =2,\) the Sawyer type testing for Haar multipliers fails in the following quantitative sense: There does not exist a universal constant C such that for an arbitrary Haar multiplier \(T^{\mu }\), with a formal adjoint \(T^{\nu }\), the testing conditions

$$\begin{aligned} \Big \Vert T^{\mu }1_{Q}\Big \Vert _{L^{p}(\nu )} \le \mathcal {T} \mu (Q)^{\frac{1}{p}} \ \ (\text {for all } Q \in \mathscr {D}) \end{aligned}$$

and

$$\begin{aligned} \Big \Vert T^{\nu }1_{Q}\Big \Vert _{L^{p'}(\mu )} \le \mathcal {T}^{*} \nu (Q)^{\frac{1}{p'}} \ \ (\text {for all } Q \in \mathscr {D}) \end{aligned}$$

would imply that the operator \(T^{\mu }\) could be extended to a bounded operator \(T^{\mu }:L^{p}(\mu ) \rightarrow L^{p}(\nu )\) with norm at most \(C(\mathcal {T} + \mathcal {T}^{*})\). In other words, Theorem 4.2 fails for general exponents p if the square function testing is reduced to Sawyer type testing.
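By contrast, for \(p=q=2\) the square function testing follows from the one-term testing by a direct computation. The following sketch assumes the estimate \(\Vert 1_{R(Q)}T^{\mu }1_{Q}\Vert _{L^{2}(\nu )} \le \mathcal {T}\mu (Q)^{1/2}\) for the cubes in question, and the notation is as in Theorem 4.2:

$$\begin{aligned} \Big \Vert \Big ( \sum _{Q \in \mathscr {D}_{0}} ( 1_{R(Q)} T^{\mu }a_{Q}1_{Q})^{2} \Big )^{\frac{1}{2}} \Big \Vert _{L^{2}(\nu )}^{2}&= \sum _{Q \in \mathscr {D}_{0}} a_{Q}^{2}\, \Vert 1_{R(Q)} T^{\mu }1_{Q} \Vert _{L^{2}(\nu )}^{2} \\&\le \mathcal {T}^{2} \sum _{Q \in \mathscr {D}_{0}} a_{Q}^{2}\, \mu (Q) = \mathcal {T}^{2}\, \Big \Vert \Big ( \sum _{Q \in \mathscr {D}_{0}} (a_{Q}1_{Q})^{2} \Big )^{\frac{1}{2}} \Big \Vert _{L^{2}(\mu )}^{2}. \end{aligned}$$

The interchange of the sum and the integral in the first step is specific to the exponent 2, which is consistent with the failure described above for \(p \not = 2\).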

Proof of Theorem 4.2

That (4.1) and (4.2) hold if \(T^{\mu }\) is bounded follows again from Theorem 2.1, since the quantities on the left hand side of (4.1) and (4.2) can be made bigger by omitting the indicators \(1_{R(Q)}\). Hence, we only need to prove sufficiency, which we show next.

We fix two compactly supported and bounded functions \(f \in L^{p}(\mu )\) and \(g \in L^{q'}(\nu )\). There are at most \(2^{n}\) big cubes \(P_{1}, \dots , P_{2^{n}} \in \mathscr {D}\) that cover the supports of f and g. Perform the martingale decompositions

$$\begin{aligned} f= \sum _{i=1}^{2^{n}} \langle f \rangle ^{\mu }_{P_{i}}1_{P_{i}} + \sum _{\begin{array}{c} Q \in \mathscr {D}: \\ Q \subset \cup _{i=1}^{2^{n}} P_{i} \end{array}} \Delta ^{\mu }_{Q} f \end{aligned}$$

and

$$\begin{aligned} g= \sum _{i=1}^{2^{n}} \langle g \rangle ^{\nu }_{P_{i}}1_{P_{i}} + \sum _{\begin{array}{c} Q \in \mathscr {D}: \\ Q \subset \cup _{i=1}^{2^{n}} P_{i} \end{array}} \Delta ^{\nu }_{Q} g. \end{aligned}$$

We may furthermore assume that there are only finitely many terms in the decompositions of f and g, since functions of this kind are dense in \(L^{p}\). Accordingly every sum below is finite, so there are no convergence problems. Also, to simplify the notation, define the functions

$$\begin{aligned} \tilde{f}:=f -\sum _{i=1}^{2^{n}}\langle f\rangle ^{\mu }_{P_{i}}1_{P_{i}}, \ \ \ \tilde{g}:=g -\sum _{i=1}^{2^{n}}\langle g \rangle ^{\nu }_{P_{i}}1_{P_{i}}. \end{aligned}$$

We consider the pairing \(\langle T^{\mu }f, g \rangle _{\nu }\) and use the martingale difference decompositions to write it as

$$\begin{aligned}&\sum _{i=1}^{2^{n}} \Big \langle \langle f\rangle ^{\mu }_{P_{i}}T^{\mu }1_{P_{i}}, g \Big \rangle _{\nu } +\sum _{j=1}^{2^{n}} \Big \langle \tilde{f}, \langle g\rangle ^{\nu }_{P_{j}}T^{\nu }1_{P_{j}} \Big \rangle _{\mu } \nonumber \\&\quad +\, \sum _{Q,R \in \mathscr {D}} \Big \langle T^{\mu } \Delta ^{\mu }_{Q} \tilde{f} , \Delta ^{\nu }_{R} \tilde{g} \Big \rangle _{\nu }. \end{aligned}$$
(4.3)

Note that \(\Delta ^{\mu }_{Q} \tilde{f}\) can be non-zero only if \(Q \subset \cup _{i=1}^{2^{n}} P_{i}\), and similarly with \(\tilde{g}\).
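For the reader's convenience, the decomposition (4.3) can be derived as follows. This is a sketch using only the notation above. Writing \(f=(f-\tilde{f})+\tilde{f}\) and \(g=(g-\tilde{g})+\tilde{g}\), bilinearity of the pairing gives

$$\begin{aligned} \langle T^{\mu }f, g \rangle _{\nu } = \langle T^{\mu }(f-\tilde{f}), g \rangle _{\nu } + \langle T^{\mu }\tilde{f}, g-\tilde{g} \rangle _{\nu } + \langle T^{\mu }\tilde{f}, \tilde{g} \rangle _{\nu }. \end{aligned}$$

The first term equals the first sum in (4.3), because \(f-\tilde{f}=\sum _{i=1}^{2^{n}}\langle f\rangle ^{\mu }_{P_{i}}1_{P_{i}}\). In the second term \(g-\tilde{g}=\sum _{j=1}^{2^{n}}\langle g\rangle ^{\nu }_{P_{j}}1_{P_{j}}\), and since \(\tilde{f}\) is a finite linear combination of indicators of dyadic cubes, the formal adjoint relation moves \(T^{\mu }\) to the other side and gives the second sum. The third term becomes the double sum after inserting \(\tilde{f}=\sum _{Q}\Delta ^{\mu }_{Q}\tilde{f}\) and \(\tilde{g}=\sum _{R}\Delta ^{\nu }_{R}\tilde{g}\).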

As a direct application of the testing conditions (4.1) and (4.2) each term in the first two sums in (4.3) is bounded by a testing constant times the norms of f and g, and actually we need here only the Sawyer type testing. Since there are finitely many (depending on the dimension n) terms in those sums we see that they are bounded as we want.
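A single term of the first sum can be estimated, for instance, as follows. The sketch assumes that the cubes \(P_{1},\dots ,P_{2^{n}}\) have equal side length, that \(\mu (P_{i})>0\), and that the one-term form of (4.1) is available for the pairs \(Q=P_{i}\), \(R(Q)=P_{j}\). By Hölder's inequality \(|\langle f\rangle ^{\mu }_{P_{i}}| \le \mu (P_{i})^{-1/p}\Vert f\Vert _{L^{p}(\mu )}\), and since the cubes \(P_{j}\) cover the support of g,

$$\begin{aligned} | \langle T^{\mu }1_{P_{i}}, g \rangle _{\nu } | \le \sum _{j=1}^{2^{n}} \Vert 1_{P_{j}}T^{\mu }1_{P_{i}}\Vert _{L^{q}(\nu )} \Vert g\Vert _{L^{q'}(\nu )} \le 2^{n} C_{1}\, \mu (P_{i})^{\frac{1}{p}} \Vert g\Vert _{L^{q'}(\nu )}. \end{aligned}$$

Multiplying the two estimates gives the bound \(2^{n}C_{1}\Vert f\Vert _{L^{p}(\mu )}\Vert g\Vert _{L^{q'}(\nu )}\) for the term in question. The terms of the second sum are handled in the same way with (4.2), using additionally that \(\Vert \tilde{f}\Vert _{L^{p}(\mu )} \lesssim \Vert f\Vert _{L^{p}(\mu )}\).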

Turn attention to the third sum in (4.3). It is further split according to the relative size of the cubes Q and R as

$$\begin{aligned} \sum _{\begin{array}{c} Q,R \in \mathscr {D}: \\ l(R) \le l(Q) \end{array}}\langle T^{\mu } \Delta ^{\mu }_{Q} \tilde{f} , \ \Delta ^{\nu }_{R} \tilde{g} \rangle _{\nu } +\sum _{\begin{array}{c} Q,R \in \mathscr {D}: \\ l(R) > l(Q) \end{array}} \langle \Delta ^{\mu }_{Q} \tilde{f} , \ T^{\nu }\Delta ^{\nu }_{R} \tilde{g} \rangle _{\mu }, \end{aligned}$$

and by symmetry we concentrate on the first half.

So suppose we have two cubes \(Q, R \in \mathscr {D}\) with \(l(R) \le l(Q)\). For \(\langle T^{\mu } \Delta ^{\mu }_{Q} \tilde{f} , \ \Delta ^{\nu }_{R} \tilde{g} \rangle _{\nu }\) to be non-zero it must first of all be, because \(T^\mu \) is well localized, that \(R \subset Q^{(r)}\). Also, if \(l(R) < 2^{-r}l(Q)\), then \(R \subset Q\). Hence the summation we are considering can be split as

$$\begin{aligned} \sum _{ \begin{array}{c} Q,R \in \mathscr {D}: \\ R \subset Q^{( r )}, \\ l(Q)\ge l( R ) \ge 2^{-r}l(Q) \end{array}} \langle T^{\mu } \Delta ^{\mu }_{Q} \tilde{f} , \ \Delta ^{\nu }_{R} \tilde{g} \rangle _{\nu } +\sum _{ \begin{array}{c} Q,R \in \mathscr {D}: \\ R \subset Q, \\ l(R) < 2^{-r}l(Q) \end{array}} \langle T^{\mu } \Delta ^{\mu }_{Q} \tilde{f} , \ \Delta ^{\nu }_{R} \tilde{g} \rangle _{\nu } =: I+II. \end{aligned}$$
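The two localization claims behind this splitting can be verified as follows. The sketch assumes that \(\Delta ^{\mu }_{Q}\tilde{f}\) is constant on each child \(Q^{k}\) of \(Q\), so that \(\Delta ^{\mu }_{Q}\tilde{f}=\sum _{k}\langle \Delta ^{\mu }_{Q}\tilde{f}\rangle ^{\mu }_{Q^{k}}1_{Q^{k}}\), and that \(\Delta ^{\nu }_{R}\tilde{g}\) is a linear combination of Haar functions \(h^{\nu }_{R}\). It then suffices to consider the pairings \(\langle T^{\mu }1_{Q^{k}}, h^{\nu }_{R}\rangle _{\nu }\), and Definition 4.1 applied to the cube \(Q^{k}\) gives

$$\begin{aligned} l(R) \le l(Q)=2l(Q^{k}) \ \text {and} \ R \not \subset (Q^{k})^{(r+1)}=Q^{(r)}&\ \Longrightarrow \ \langle T^{\mu }1_{Q^{k}}, h^{\nu }_{R}\rangle _{\nu }=0, \\ l(R) \le 2^{-r-1}l(Q)=2^{-r}l(Q^{k}) \ \text {and} \ R \not \subset Q^{k}&\ \Longrightarrow \ \langle T^{\mu }1_{Q^{k}}, h^{\nu }_{R}\rangle _{\nu }=0. \end{aligned}$$

The first line shows that a non-zero pairing with \(l(R) \le l(Q)\) requires \(R \subset Q^{(r)}\). The second line shows that if \(l(R)<2^{-r}l(Q)\), then a non-zero pairing requires \(R \subset Q^{k}\) for some child \(Q^{k}\), and in particular \(R \subset Q\).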

4.1 Estimation of I

For each \(Q\in \mathscr {D}\) there are at most N(r) cubes R such that \(R \subset Q^{(r)}\) and \(2^{-r}l(Q) \le l(R) \le l(Q)\). Hence the sum I may be divided into N(r) different sums of the form

$$\begin{aligned} \sum _{Q \in \mathscr {D}} \langle T^{\mu } \Delta ^{\mu }_{Q} \tilde{f} , \ \Delta ^{\nu }_{R(Q)} \tilde{g} \rangle _{\nu }, \end{aligned}$$
(4.4)

where \(R(Q) \in \mathscr {D}\) is a cube such that \(R(Q) \subset Q^{(r)}\) and \(2^{-r}l(Q) \le l(R(Q)) \le l(Q)\).

Let \(Q^1,\dots ,Q^{2^n}\) denote the children ch(Q) of Q. Then concerning (4.4) we have

$$\begin{aligned}&\Big | \sum _{Q \in \mathscr {D}} \langle T^{\mu } \Delta ^{\mu }_{Q} \tilde{f} , \ \Delta ^{\nu }_{R(Q)} \tilde{g} \rangle _{\nu } \Big | \\&\quad \le \Big \Vert \Big ( \sum _{Q \in \mathscr {D}} \big ( 1_{R(Q)} | T^{\mu } \Delta ^{\mu }_{Q} \tilde{f} | \big )^{2} \Big )^{\frac{1}{2}} \Big \Vert _{L^{q}(\nu )} \Big \Vert \Big ( \sum _{Q \in \mathscr {D}} | \Delta ^{\nu }_{R(Q)} \tilde{g} |^{2} \Big )^{\frac{1}{2}} \Big \Vert _{L^{q'}(\nu )} \\&\quad \lesssim \sum _{k=1}^{2^{n}} \Big \Vert \Big ( \sum _{Q \in \mathscr {D}} \big ( 1_{R(Q)} | T^{\mu } \langle \Delta ^{\mu }_{Q} \tilde{f}\rangle ^{\mu }_{Q^{k}} 1_{Q^{k}} | \big )^{2} \Big )^{\frac{1}{2}} \Big \Vert _{L^{q}(\nu )} \Vert \tilde{g} \Vert _{L^{q'}(\nu )} \\&\quad \lesssim C_{1} \sum _{k=1}^{2^{n}} \Big \Vert \Big ( \sum _{Q \in \mathscr {D}} | \langle \Delta ^{\mu }_{Q} \tilde{f}\rangle ^{\mu }_{Q^{k}} 1_{Q^{k}} |^{2} \Big )^{\frac{1}{2}} \Big \Vert _{L^{p}(\mu )} \Vert \tilde{g} \Vert _{L^{q'}(\nu )} \lesssim C_{1} \Vert \tilde{f} \Vert _{L^{p}(\mu )} \Vert \tilde{g} \Vert _{L^{q'}(\nu )} \\&\quad \lesssim C_1\Vert f \Vert _{L^{p}(\mu )} \Vert g \Vert _{L^{q'}(\nu )}. \end{aligned}$$
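Some of the steps in this chain can be spelled out. The following is a sketch; it assumes that \(\Delta ^{\nu }_{R(Q)}\tilde{g}\) is supported on \(R(Q)\) and that \(\Delta ^{\mu }_{Q}\tilde{f}\) is constant on the children of \(Q\). The first inequality follows from the Cauchy-Schwarz inequality for the sum over \(Q\) and then Hölder's inequality with the exponents \(q\) and \(q'\):

$$\begin{aligned} \Big | \sum _{Q \in \mathscr {D}} \langle T^{\mu } \Delta ^{\mu }_{Q} \tilde{f} , \ \Delta ^{\nu }_{R(Q)} \tilde{g} \rangle _{\nu } \Big |&\le \int \sum _{Q \in \mathscr {D}} | 1_{R(Q)} T^{\mu } \Delta ^{\mu }_{Q} \tilde{f} | \, | \Delta ^{\nu }_{R(Q)} \tilde{g} | \, \mathrm {d}\nu \\&\le \int \Big ( \sum _{Q \in \mathscr {D}} | 1_{R(Q)} T^{\mu } \Delta ^{\mu }_{Q} \tilde{f} |^{2} \Big )^{\frac{1}{2}} \Big ( \sum _{Q \in \mathscr {D}} | \Delta ^{\nu }_{R(Q)} \tilde{g} |^{2} \Big )^{\frac{1}{2}} \mathrm {d}\nu . \end{aligned}$$

In the step involving \(C_{1}\), the testing condition (4.1) is applied to the cubes \(Q^{k}\) with the coefficients \(|\langle \Delta ^{\mu }_{Q} \tilde{f}\rangle ^{\mu }_{Q^{k}}|\); here \(l(R(Q)) \le l(Q)=2l(Q^{k})\), which is the situation described in Remark 2. For the estimate that follows, note that pointwise

$$\begin{aligned} | \langle \Delta ^{\mu }_{Q} \tilde{f}\rangle ^{\mu }_{Q^{k}} 1_{Q^{k}} | \le | \Delta ^{\mu }_{Q} \tilde{f} |, \end{aligned}$$

so the square function on the left is dominated by the martingale difference square function of \(\tilde{f}\). Finally, \(\Vert \tilde{f}\Vert _{L^{p}(\mu )} \le \Vert f\Vert _{L^{p}(\mu )} + \sum _{i=1}^{2^{n}} \Vert \langle f\rangle ^{\mu }_{P_{i}}1_{P_{i}}\Vert _{L^{p}(\mu )} \le (1+2^{n})\Vert f\Vert _{L^{p}(\mu )}\), and similarly for \(\tilde{g}\).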

4.2 Estimation of II

Suppose we have two cubes \(Q,R \in \mathscr {D}\) such that \(R \subset Q\) and \(l(R)<2^{-r}l(Q)\). Denote by \(Q_R\) the child \(Q' \in ch(Q)\) that contains R. Then, because \(T^\mu \) is well localized, it holds that

$$\begin{aligned} \langle T^{\mu } \Delta ^{\mu }_{Q} \tilde{f} , \ \Delta ^{\nu }_{R} \tilde{g} \rangle _{\nu } = \langle \Delta ^{\mu }_{Q}\tilde{f} \rangle _{Q_R}^{\mu } \langle T^{\mu } 1_{R^{(r)}} , \ \Delta ^{\nu }_{R} \tilde{g} \rangle _{\nu }. \end{aligned}$$

Hence

$$\begin{aligned} \sum _{ \begin{array}{c} Q,R \in \mathscr {D}: \\ R \subset Q, \\ l(R) < 2^{-r}l(Q) \end{array}} \langle T^{\mu } \Delta ^{\mu }_{Q} \tilde{f} , \ \Delta ^{\nu }_{R} \tilde{g} \rangle _{\nu }= & {} \sum _{R \in \mathscr {D}} \sum _{\begin{array}{c} Q \in \mathscr {D}: Q \supsetneq R^{(r)} \end{array}} \langle \Delta ^{\mu }_{Q}\tilde{f} \rangle _{Q_R}^{\mu } \langle T^{\mu } 1_{R^{(r)}} , \ \Delta ^{\nu }_{R} \tilde{g} \rangle _{\nu } \nonumber \\&=\sum _{R \in \mathscr {D}} \langle \tilde{f} \rangle ^\mu _{R^{(r)}} \langle T^{\mu } 1_{R^{(r)}} , \ \Delta ^{\nu }_{R} \tilde{g} \rangle _{\nu } \nonumber \\&= \sum _{R \in \mathscr {D}} \langle \tilde{f} \rangle ^\mu _{R} \langle T^{\mu } 1_{R} , \ P^\nu _{R,r} \tilde{g} \rangle _{\nu }, \end{aligned}$$
(4.5)

where

$$\begin{aligned} P^\nu _{R,r}\tilde{g}:=\sum _{\begin{array}{c} R' \in \mathscr {D}: \\ R'^{(r)}=R \end{array}} \Delta ^{\nu }_{R'}\tilde{g}. \end{aligned}$$
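The second equality in (4.5) is a telescoping identity. As a sketch, assume the usual form \(\Delta ^{\mu }_{Q}\tilde{f}=\sum _{Q' \in ch(Q)} (\langle \tilde{f}\rangle ^{\mu }_{Q'}-\langle \tilde{f}\rangle ^{\mu }_{Q})1_{Q'}\) of the martingale differences, write \(S:=R^{(r)}\), and let \(P_{i}\) be the big cube containing \(S\). Only cubes \(Q \subset P_{i}\) contribute, and for these the child of \(Q\) containing \(R\) is the same as the child \(Q_{S}\) containing \(S\), so

$$\begin{aligned} \sum _{\begin{array}{c} Q \in \mathscr {D}: \\ S \subsetneq Q \subset P_{i} \end{array}} \langle \Delta ^{\mu }_{Q}\tilde{f} \rangle _{Q_{S}}^{\mu } = \sum _{\begin{array}{c} Q \in \mathscr {D}: \\ S \subsetneq Q \subset P_{i} \end{array}} \big ( \langle \tilde{f}\rangle ^{\mu }_{Q_{S}}-\langle \tilde{f}\rangle ^{\mu }_{Q} \big ) = \langle \tilde{f}\rangle ^{\mu }_{S}-\langle \tilde{f}\rangle ^{\mu }_{P_{i}} = \langle \tilde{f}\rangle ^{\mu }_{S}, \end{aligned}$$

where the last step uses \(\langle \tilde{f}\rangle ^{\mu }_{P_{i}}=0\), which holds by the definition of \(\tilde{f}\). The third equality in (4.5) is then a reindexing: the cubes \(R\) are grouped according to their ancestor \(R^{(r)}\), and the martingale differences within each group are collected into \(P^\nu _{R,r}\tilde{g}\).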

Let \(\mathscr {D}_0:= \{Q \in \mathscr {D}: Q \subset \bigcup _{i=1}^{2^n}P_i\}\). Construct the principal cubes \(\mathscr {F}\) for the function \(\tilde{f}\) in the collection \(\mathscr {D}_0\) as in Sect. 2.1. Note that if \(\langle \tilde{f} \rangle ^{\mu }_{Q}\not =0\) for some \(Q \in \mathscr {D}\), then \(Q \in \mathscr {D}_0\).

Suppose \(F \in \mathscr {F}\) and \(Q \in \mathscr {D}_0\) are such that \(\pi _\mathscr {F}Q =F\). Then there exists a constant \(c_Q \in [-2,2]\) so that \(\langle \tilde{f}\rangle ^{\mu }_Q = c_Q\langle |\tilde{f}|\rangle ^\mu _F\). Note also that if \(Q,R \in \mathscr {D}\) are two cubes such that \(Q \supset R\), then

$$\begin{aligned} \langle T^{\mu } 1_{R} , \ P^\nu _{R,r} \tilde{g} \rangle _{\nu } =\langle T^{\mu } 1_{Q} , \ P^\nu _{R,r} \tilde{g} \rangle _{\nu } \end{aligned}$$

because \(T^\mu \) is well localized.
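Both observations can be checked briefly. The following sketch assumes that the principal cubes of Sect. 2.1 are built with the usual stopping rule, so that \(\langle |\tilde{f}|\rangle ^{\mu }_{Q} \le 2\langle |\tilde{f}|\rangle ^{\mu }_{F}\) whenever \(\pi _\mathscr {F}Q=F\), and that \(\Delta ^{\nu }_{R'}\tilde{g}\) is a linear combination of Haar functions \(h^{\nu }_{R'}\). For the constants,

$$\begin{aligned} |\langle \tilde{f}\rangle ^{\mu }_{Q}| \le \langle |\tilde{f}|\rangle ^{\mu }_{Q} \le 2\langle |\tilde{f}|\rangle ^{\mu }_{F}, \end{aligned}$$

so \(\langle \tilde{f}\rangle ^{\mu }_{Q}\) is a multiple of \(\langle |\tilde{f}|\rangle ^{\mu }_{F}\) with a factor in \([-2,2]\). For the identity, write \(1_{Q}=1_{R}+\sum _{S}1_{S}\), where the sum runs over the cubes \(S \subset Q\) with \(l(S)=l(R)\) and \(S \not = R\). If \(R'^{(r)}=R\), then \(l(R')=2^{-r}l(S)\) and \(R' \not \subset S\), so the second condition in Definition 4.1 gives

$$\begin{aligned} \langle T^{\mu } 1_{S} , \ h^{\nu }_{R'} \rangle _{\nu }=0, \qquad \text {hence} \qquad \langle T^{\mu } 1_{S} , \ P^\nu _{R,r} \tilde{g} \rangle _{\nu }=0. \end{aligned}$$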

Hence, continuing from (4.5), we have

$$\begin{aligned} \Big |\sum _{R \in \mathscr {D}}&\langle \tilde{f} \rangle ^\mu _{R} \langle T^{\mu } 1_{R} , P^\nu _{R,r} \tilde{g} \rangle _{\nu } \Big | =\Big |\sum _{F \in \mathscr {F}} \langle |\tilde{f}| \rangle ^\mu _{F}\sum _{\begin{array}{c} R \in \mathscr {D}_0: \\ \pi _\mathscr {F}R=F \end{array}} \langle 1_F T^{\mu } 1_{F} , c_R P^\nu _{R,r} \tilde{g} \rangle _{\nu } \Big | \\&\le \Big \Vert \Big ( \sum _{F \in \mathscr {F}} |\langle |\tilde{f}| \rangle ^\mu _{F}T^{\mu }1_F|^2 \Big )^\frac{1}{2}\Big \Vert _{L^q(\nu )} \Big \Vert \Big ( \sum _{F \in \mathscr {F}} |\sum _{\begin{array}{c} R \in \mathscr {D}_0: \\ \pi _\mathscr {F}R=F \end{array}} c_R P^\nu _{R,r} \tilde{g}|^2 \Big )^\frac{1}{2}\Big \Vert _{L^{q'}(\nu )}. \end{aligned}$$

We note here that this trick of introducing the constants \(c_Q\) is from [5].

From the testing condition (4.1) and the Carleson embedding theorem 2.2 it follows that

$$\begin{aligned} \Big \Vert \Big ( \sum _{F \in \mathscr {F}} |\langle |\tilde{f}| \rangle ^\mu _{F}T^{\mu }1_F|^2 \Big )^\frac{1}{2}\Big \Vert _{L^q(\nu )}&\le C_1\Big \Vert \Big ( \sum _{F \in \mathscr {F}} (\langle |\tilde{f}| \rangle ^\mu _{F})^21_F \Big )^\frac{1}{2}\Big \Vert _{L^p(\mu )} \\&\le C_1 \Big \Vert \sum _{F \in \mathscr {F}} \langle |\tilde{f}| \rangle ^\mu _{F}1_F\Big \Vert _{L^p(\mu )} \\&\lesssim C_1 \Vert \tilde{f}\Vert _{L^p(\mu )} \lesssim C_1 \Vert f\Vert _{L^p(\mu )}. \end{aligned}$$
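The individual steps here may be read as follows; this is a sketch of one possible justification. Since each \(P^\nu _{R,r}\tilde{g}\) with \(\pi _\mathscr {F}R=F\) is supported in \(R \subset F\), the indicator \(1_{F}\) may be kept in front of \(T^{\mu }1_{F}\) in the square function, and the first inequality is then (4.1) with the coefficients \(a_{F}=\langle |\tilde{f}|\rangle ^\mu _{F}\) and the choice \(R(F)=F\). The second inequality is the pointwise bound

$$\begin{aligned} \Big ( \sum _{F \in \mathscr {F}} (\langle |\tilde{f}| \rangle ^\mu _{F})^{2}1_{F} \Big )^{\frac{1}{2}} \le \sum _{F \in \mathscr {F}} \langle |\tilde{f}| \rangle ^\mu _{F}1_{F}, \end{aligned}$$

which holds because the \(\ell ^{2}\)-norm of a sequence of non-negative numbers is at most its \(\ell ^{1}\)-norm. The third inequality is where the Carleson embedding theorem 2.2 enters.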

With a similar computation as in the proof of Theorem 2.1, the Kahane-Khinchine inequality (2.1) and Eq. (2.3) for computing \(L^p\)-norms with martingale differences give

$$\begin{aligned} \Big \Vert \Big ( \sum _{F \in \mathscr {F}} |\sum _{\begin{array}{c} R \in \mathscr {D}_0: \\ \pi _\mathscr {F}R=F \end{array}} c_R P^\nu _{R,r} \tilde{g}|^2 \Big )^\frac{1}{2}\Big \Vert _{L^{q'}(\nu )}&\simeq \mathbb {E}\Big \Vert \sum _{F \in \mathscr {F}} \varepsilon _F \sum _{\begin{array}{c} R \in \mathscr {D}_0: \\ \pi _\mathscr {F}R=F \end{array}} c_R P^\nu _{R,r} \tilde{g} \Big \Vert _{L^{q'}(\nu )} \\&\lesssim \Vert \tilde{g}\Vert _{L^{q'}(\nu )} \lesssim \Vert g\Vert _{L^{q'}(\nu )}. \end{aligned}$$

This concludes the estimation of the sum II, and hence also the proof of Theorem 4.2.

Remark 4

Let \(\mathscr {D}_0 \subset \mathscr {D}\) be any finite subcollection. Define a paraproduct operator by

$$\begin{aligned} \Pi ^{\mu ,\nu }_{T,\mathscr {D}_{0}}f:=\sum _{Q \in \mathscr {D}_{0}} \langle f \rangle _{Q}^{\mu } \sum _{R \in ch^{(r)} Q} \Delta ^{\nu }_{R} T^{\mu }1_{Q}. \end{aligned}$$

From the testing condition (4.1) it follows, with similar arguments as in the estimation of II, that

$$\begin{aligned} \Vert \Pi ^{\mu ,\nu }_{T,\mathscr {D}_{0}}\Vert _{L^p(\mu ) \rightarrow L^q(\nu )} \lesssim C_1, \end{aligned}$$

where the implied constant does not depend on the collection \(\mathscr {D}_0\). Conversely, by slightly modifying the proof of Theorem 4.2, the part that corresponds to the estimation of the sum II follows from this fact. See [8] where this was done in the case \(p=q=2\).

\(\square \)