1 Introduction

A classical problem in Calderón–Zygmund theory is the control of a given singular operator by means of a maximal type operator. As a model example of this phenomenon, we can take the classical Coifman–Fefferman inequality involving a Calderón–Zygmund (C–Z) operator and the usual Hardy–Littlewood maximal operator \(M\) (see [7]).

Theorem 1

(Coifman–Fefferman) For any weight \(w\) in the Muckenhoupt class \(A_{\infty }\), the following norm inequality holds:

$$\begin{aligned} \Vert T^*f\Vert _{L^p(w)}\le c \,\Vert Mf\Vert _{L^p(w)}, \end{aligned}$$
(1)

where \(0<p<\infty \) and \(c=c_{n,w,p}\) is a positive constant depending on the dimension \(n\), the exponent \(p\) and the weight \(w\).

We use here the standard notation \(T^*\) for the maximal singular integral operator of \(T\): \(T^*f(x)=\sup _{\varepsilon >0}|T_{\varepsilon }f(x)|\), where \(T_\varepsilon \) is, as usual, the truncated singular integral. This theorem says that the maximal operator \(M\) plays the role of a “control operator” for C–Z operators, but the dependence of the constant \(c\) on both \(w\) and \(p\) is not precise enough for some applications. The original proof was based on the good–\(\lambda \) technique introduced by Burkholder and Gundy [4]. The goal is to prove that the following estimate holds

$$\begin{aligned} \left| \{x\in \mathbb{R }^n: T^*f(x)>2\lambda , Mf(x) \le \gamma \lambda \}\right| \le c\gamma \left| \{x\in \mathbb{R }^n: T^*f(x)>\lambda \}\right| \end{aligned}$$
(2)

for any \(\lambda >0\) and for sufficiently small \(\gamma >0\). Very roughly, the main idea in the proof of (2) is to localize the level set \(\{x\in \mathbb{R }^n: T^*f(x)>\lambda \}\) by means of Whitney cubes. The problem is then reduced to studying a local estimate of the form

$$\begin{aligned} |\{x\in Q: T^*f(x)>2\lambda , Mf(x)\le \gamma \lambda \}|\le c\,\gamma |Q|, \end{aligned}$$
(3)

where \(Q\) is a cube from the Whitney decomposition and \(f\) is supported on \(Q\). From (3), by standard methods, weighted norm inequalities for \(T\) and \(M\) can be derived.
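For orientation, the passage from a good–\(\lambda \) inequality such as (2) to a norm inequality can be sketched in the unweighted case. This is only an illustration under assumptions not stated above: we take \(w\equiv 1\) and assume \(\Vert T^*f\Vert _{L^p}<\infty \). Splitting \(\{T^*f>2\lambda \}\) according to whether \(Mf\le \gamma \lambda \) or not, and integrating against \(p\lambda ^{p-1}\,d\lambda \), gives

$$\begin{aligned} \Vert T^*f\Vert _{L^p}^p&=p\,2^p\int _0^\infty \lambda ^{p-1}\,|\{T^*f>2\lambda \}|\,d\lambda \\&\le p\,2^p\int _0^\infty \lambda ^{p-1}\left( c\gamma \,|\{T^*f>\lambda \}|+|\{Mf>\gamma \lambda \}|\right) d\lambda \\&=2^p c\gamma \,\Vert T^*f\Vert _{L^p}^p+\left( \frac{2}{\gamma }\right) ^p\Vert Mf\Vert _{L^p}^p. \end{aligned}$$

Choosing \(\gamma \) so small that \(2^pc\gamma \le 1/2\), the first term is absorbed into the left-hand side and one obtains \(\Vert T^*f\Vert _{L^p}^p\le 2\,(2/\gamma )^p\,\Vert Mf\Vert _{L^p}^p\). The weighted statement (1) follows the same pattern, with the \(A_\infty \) property of \(w\) used to pass from Lebesgue measure to the measure \(w\,dx\).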

In this paper we focus our attention on the growth rate of \(\gamma \). In fact (3) is too rough, since the constant \(c=c_{n, p,w}\) obtained in (1) is sharp neither in the \(A_{{\infty }}\) constant of the weight nor in \(p\), as shown by Bagby and Kurtz [2].

Pursuing the sharp dependence on the \(A_p\) constant of the weight \(w\) for the operator norm of singular integrals, Buckley improved this good–\(\lambda \) inequality (3) (see [3]), obtaining a local exponential decay in \(\gamma \) in the following way:

$$\begin{aligned} |\{x\in Q: T^*f(x)>2\lambda , Mf(x)\le \gamma \lambda \}|\le c\,e^{-c/\gamma } |Q|. \end{aligned}$$
(4)

Buckley proved this estimate using as a model a more classical inequality due to Hunt for the conjugate function, which was inspired by a result of Carleson [5]. We mention here in passing that this optimal weighted dependence, called the \(A_2\) conjecture, has been proved recently and by different means by Hytönen [18] (see also [15, 17] and [16] for a further improvement, and the recent work [22] for a very interesting simplification of the proof of the \(A_2\) conjecture). On the other hand, this exponential decay (4) has been a crucial step in deriving the corresponding sharp \(A_{1}\) estimates in [25, 27].

Our point of view is different and has been motivated by an improved version of inequality (4) due to Karagulyan [21]:

$$\begin{aligned} |\{x\in Q: T^*f(x)>t Mf(x)\}|\le c e^{-\alpha t}\, |Q|, \quad t>0 \end{aligned}$$
(5)

However, it is not clear that the proof can be adapted to other situations.
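To see in what sense (5) improves (4), one can check that (5) contains (4). The short verification below uses only the two displayed inequalities; the choice \(t=2/\gamma \) is ours. If \(T^*f(x)>2\lambda \) and \(Mf(x)\le \gamma \lambda \), then \(\frac{2}{\gamma }Mf(x)\le 2\lambda <T^*f(x)\), and therefore

$$\begin{aligned} |\{x\in Q: T^*f(x)>2\lambda , Mf(x)\le \gamma \lambda \}|\le \left| \left\{ x\in Q: T^*f(x)>\tfrac{2}{\gamma }Mf(x)\right\} \right| \le c\, e^{-2\alpha /\gamma }\,|Q|. \end{aligned}$$

The right-hand side no longer depends on \(\lambda \), which is the feature of (5) exploited in the rest of the paper.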

In the present article we present a new approach flexible enough to derive corresponding estimates for other operators. Furthermore, our approach allows us to recognize and distinguish a notion of “order of singularity” for each operator. To be more precise and as a model, we consider a pair of operators \(T_1\) and \(T_2\), and consider for a fixed cube \(Q\) the level set function

$$\begin{aligned} \varphi (t):=\frac{1}{|Q|}|\{x\in Q: |T_1f(x)|> t|T_2f(x)|\}|, \quad t>0 \end{aligned}$$
(6)

where \(f\) stands for a function, an \(m\)-vector of functions or an infinite sequence, depending on the type of operators involved. In any case, all the coordinate functions are assumed to be supported on \(Q\). We will provide sharp estimates on the decay rate for \(\varphi (t)\) in different instances of \(T_1\) and \(T_2\), including the case of C–Z operators, vector-valued extensions of the maximal function or C–Z operators, commutators of singular integrals with BMO functions and higher order commutators. We also provide estimates for dyadic and continuous square functions and for multilinear C–Z operators. We summarize these different decay rates and the maximal operators involved in Table 1 below (see Sect. 3 for the precise definitions). Observe that each operator has its maximal operator acting as a control operator and, further, has its specific decay rate for the corresponding level set function \(\varphi (t)\).
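One way to read a decay estimate for \(\varphi \) is as a local exponential integrability statement for the quotient of the two operators. The following is a sketch under assumptions of ours: \(T_2f\ne 0\) a.e. on \(Q\), so that \(g:=|T_1f|/|T_2f|\) is defined a.e., and \(\varphi (t)\le c\,e^{-\alpha t}\) for all \(t>0\). Then, for \(0<\beta <\alpha \), the layer-cake formula gives

$$\begin{aligned} \frac{1}{|Q|}\int _Q e^{\beta g(x)}\,dx=1+\beta \int _0^\infty e^{\beta t}\varphi (t)\,dt\le 1+c\,\beta \int _0^\infty e^{-(\alpha -\beta )t}\,dt=1+\frac{c\,\beta }{\alpha -\beta }. \end{aligned}$$

A faster rate, such as \(e^{-\alpha t^2}\), gives in the same way integrability of \(e^{\beta g^2}\) for \(\beta <\alpha \).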

Table 1 Order of singularity for several operators

In this work we will present two different approaches, both based on the use of Lerner’s formula (see Theorem 10), which is a very powerful and successful method, as can be seen in several recent works (see [8, 9]). Roughly, the first approach allows us to derive the exponential decay whenever there is a superlinear rate, namely, in all the cases except for the commutators. This method, although it is far from being trivial, can be seen as the natural way to exploit Lerner’s formula to obtain the exponential decay. However, it fails when we consider the case of commutators. Hence, to be able to tackle this latter case, we develop a different method, which is the most original and substantial contribution of the present article: a novel approach and a different type of proof based on weighted estimates. This second approach uses Lerner’s formula to derive suitable local versions of weighted norm inequalities of Coifman–Fefferman type. This, combined with factorization arguments, gives all the results, including commutators of any order. In addition, we present here a sort of “template”, a general scheme that can be applied to any pair of operators fulfilling certain general hypotheses.

The paper is organized as follows. In Sect. 2 we present the precise statement of our results. In Sect. 3 we include some preliminary definitions and tools needed in the sequel. In Sect. 4 we present our first approach and provide the proofs of the “superlinear” results. In Sect. 5 we present our second approach and prove all the results of the paper. In this final section we also include some background on weights and, in addition, some new extensions of classical results.

2 Statement of the main results

In this section we present the precise statement of the main results of this paper. We start with a general result involving a generic function \(f\) and its local sharp maximal function \(M_{\lambda ;Q}^\#f\) in a given cube \(Q\) (see Sect. 3 for the precise definitions).

\(\bullet \) The key estimate: a John-Strömberg-Fefferman–Stein type inequality

Theorem 2

Let \(Q\) be a cube and let \(f\in L_{c}^\infty (\mathbb{R }^n)\) such that \(supp (f)\subseteq Q\). Then there are constants \(\alpha ,\ c>0\) such that

$$\begin{aligned} |\{x\in Q: |f(x)-m_f(Q)|>tM_{2^{-n-2};Q}^\#(f)(x) \}|\le c e^{-\alpha t}|Q|,\quad t>0. \end{aligned}$$
(7)

Such an estimate involving a function controlled in some sense by its sharp maximal function is surely related to Fefferman–Stein inequality, but the version we present here with the local sharp maximal function goes back to the work of Strömberg [36] and Jawerth and Torchinsky [19].

Once we have such a general theorem, we can derive the results announced in the introduction for a wide class of singular operators. The idea is to apply the theorem to a given singular operator \(\mathcal T \) and then use the key tool: a pointwise estimate of the form \(M_{2^{-n-2};Q}^\#(\mathcal T f)(x)\le c\, \mathfrak M (f)(x)\), where \(\mathfrak M \) is an appropriate maximal operator.

More precisely, we have the following theorems:

\(\bullet \) Calderón–Zygmund operators.

Theorem 3

Let \(T\) be a C–Z operator with maximal singular integral operator \(T^*\). Let \(Q\) be a cube and let \(f\in L_{c}^\infty (\mathbb{R }^n)\) such that \(supp (f)\subseteq Q\). Then there are constants \(\alpha , c >0\) such that

$$\begin{aligned} |\{x\in Q: |T^*f(x)|>tMf(x) \}|\le c e^{-\alpha t}|Q|,\quad t>0. \end{aligned}$$
(8)

\(\bullet \) Calderón–Zygmund multilinear operators.

Theorem 4

Let \(T\) be a \(m\)-linear C–Z operator. Let \(Q\) be a cube and let \(\mathbf{{f}}\,\) be vector of \(m\) functions \(f_j\in L_{c}^\infty (\mathbb{R }^n)\) such that \(supp (f_j)\subseteq Q\) for \(1\le j\le m\). Then there are constants \(\alpha , \ c>0\) such that

$$\begin{aligned} |\{x\in Q: |T\mathbf{{f}}\,(x)|>t\mathcal{M }\mathbf{{f}}\,(x) \}|\le c e^{-\alpha t}|Q|,\quad t>0. \end{aligned}$$
(9)

\(\bullet \) Vector-valued extensions.

Theorem 5

Let \(1<q<\infty \) and let \(\overline{T}_{q}\) be the vector-valued extension of \(T\), where \(T\) is a C–Z operator. Then there are constants \(\alpha , \ c>0\) such that for any cube \(Q\) and any vector-function \(f=\{f_j\}_{j=1}^{\infty }\) with \(supp \,f\subseteq Q\):

$$\begin{aligned} |\{x\in Q: \overline{T}_{q}f(x)> tM(|f|_q)(x)\}|\le c e^{-\alpha t}\, |Q|,\quad t>0. \end{aligned}$$
(10)

Theorem 6

Let \(1<q<\infty \) and let  \(\overline{M}_{q}\) be the vector-valued extension of \(M\). Then there are constants \(\alpha , \ c>0\) such that for any cube \(Q\) and any vector-function \(f=\{f_j\}_{j=1}^{\infty }\) with \(supp \,f\subseteq Q\):

$$\begin{aligned} |\{x\in Q: \overline{M}_{q}f(x)> tM(|f|_q)(x)\}|\le c e^{-\alpha t^q}\, |Q|,\quad t>0. \end{aligned}$$
(11)

\(\bullet \) Littlewood–Paley square functions.

Theorem 7

Let \(S\) be the dyadic square function and let \(g_{\mu }^*\) be the continuous Littlewood–Paley square function, with \(\mu >3\). Let \(Q\) be a cube and let \(f\in L_{c}^\infty (\mathbb{R }^n)\) such that \(supp (f)\subseteq Q\). Then there are constants \(\alpha , \ c>0\) such that

$$\begin{aligned} |\{x\in Q: Sf(x)>tMf(x) \}|\le c e^{-\alpha t^2}|Q|,\quad t>0. \end{aligned}$$
(12)

and

$$\begin{aligned} |\{x\in Q: g_{\mu }^*(f)(x)>tMf(x) \}|\le c e^{-\alpha t^2}|Q|,\quad t>0. \end{aligned}$$
(13)

We also present here the result for commutators, although it will not follow from Theorem 2. We will prove this theorem following the “weighted approach” announced in the introduction.

\(\bullet \) Commutators.

Theorem 8

Let \(T\) be a Calderón–Zygmund operator and let \(b\) be in \(BMO\). Let \(Q\) be a cube and let \(f\) be a function such that \(supp \,f\subseteq Q\). Then there are constants \(\alpha , \ c>0\) such that

$$\begin{aligned} |\{x\in Q: |[b,T]f(x)|>tM^2f(x)\}|\le c e^{-\sqrt{\alpha t\Vert b\Vert _{BMO}}}\, |Q|, \quad t>0. \end{aligned}$$
(14)

Similarly, for higher commutators we have

$$\begin{aligned} |\{x\in Q: |T_b^kf(x)|> tM^{k+1}f(x)\}|\le c e^{-(\alpha t\Vert b\Vert _{BMO})^{1/(k+1)}}\, |Q|, \end{aligned}$$
(15)

for all \(t>0\).

3 Preliminaries and notation

In this section we gather some well known definitions and properties which will be used throughout this paper. We will adopt the usual notation \(f_Q= \frac{1}{|Q|}\int _Q f(y)\, dy\) for the average over a cube \(Q\) of a function \(f\).

3.1 Maximal functions

Given a locally integrable function \(f\) on \(\mathbb{R }^n\), the Hardy–Littlewood maximal operator \(M\) is defined by

$$\begin{aligned} Mf(x)=\sup _{Q\ni x}{\frac{1}{|Q|}}\int _{Q} f(y)\, dy, \end{aligned}$$

where the supremum is taken over all cubes \(Q\) containing the point \(x\). For \(\varepsilon >0\), we define:

$$\begin{aligned} M_\varepsilon f(x)=(M (|f|^\varepsilon )(x))^{1/\varepsilon }. \end{aligned}$$

The usual sharp maximal function of Fefferman–Stein is defined as:

$$\begin{aligned} M^{\#}(f)(x)=\sup _{Q\ni x}\inf _c\frac{1}{|Q|}\int _Q |f(y)-c|\, dy. \end{aligned}$$

We will also use the following operator:

$$\begin{aligned} M^{\#}_\delta (f)(x)=\sup _{Q\ni x}\inf _c\left( \frac{1}{|Q|}\int _Q |f(y)-c|^\delta \, dy\right) ^\frac{1}{\delta }. \end{aligned}$$

If the supremum is restricted to the dyadic cubes, we will use respectively the following notation \(M^d\), \(M_\delta ^{\#,d}\) and \(M_\delta ^{d}\). We will also need to consider iterations of maximal functions. Let \(M^k\) be defined as

$$\begin{aligned} M^k:= M\circ \cdots \circ M \qquad (k \text{ times}). \end{aligned}$$

In addition, for a given cube \(Q\), we will consider local maximal functions. For a fixed cube \(Q\), we will denote by \(\mathcal D (Q)\) the family of all dyadic subcubes with respect to the cube \(Q\). The maximal function \(M^Q\) is defined by

$$\begin{aligned} M^Qf(x)=\sup _{P\in \mathcal{D }(Q), P\ni x}{\frac{1}{|P|}}\int _{P} f(y)\, dy. \end{aligned}$$

Similarly, \(M_\delta ^Q\), \(M^{\#,Q}\) and \(M^{\#,Q}_\delta \) are defined in the same way as above.

We introduce the following notation: for a given vector–valued function \({f}=(f_j)_{j=1}^{\infty }\) we denote

$$\begin{aligned} |f(x)|_q:= \left( \sum _{j=1}^{\infty } |f_j(x)|^q \right) ^{1/q}. \end{aligned}$$

Then, the classical vector-valued extension of the maximal function introduced by Fefferman and Stein [10] can be written as follows:

$$\begin{aligned} \overline{M}_qf(x)=\left( \sum _{j=1}^{\infty } (Mf_j(x))^q \right) ^{1/q}=|Mf(x)|_q, \end{aligned}$$

where \({f}=\{f_j\}_{j=1}^{\infty }\) is a vector–valued function.

Within the multilinear setting, the appropriate maximal function \(\mathcal{M }\) for a \(m\)-vector \(\mathbf{{f}}\,\) of \(m\) functions \(\mathbf{{f}}\,=(f_{1},\dots ,f_{m})\) is defined as

$$\begin{aligned} \mathcal{M }(\mathbf{{f}}\,)(x)=\sup _{Q\ni x} \prod _{i=1}^m\frac{1}{|Q|}\int _{Q}|f_{i}(y_{i})|\ dy_{i}. \end{aligned}$$
(16)

Note that this operator is pointwise smaller than the \(m\)-fold product of \(M\). This maximal operator was introduced in [28], where it is shown that it is the “correct” maximal operator controlling the multilinear C–Z operators.
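The pointwise comparison with the \(m\)-fold product of \(M\) can be verified directly from (16); here \(M\) is understood as acting on \(|f_i|\). For every cube \(Q\ni x\) each factor is bounded by the corresponding maximal function, so that

$$\begin{aligned} \prod _{i=1}^m\frac{1}{|Q|}\int _{Q}|f_{i}(y_{i})|\ dy_{i}\le \prod _{i=1}^m Mf_i(x), \end{aligned}$$

and taking the supremum over all cubes \(Q\ni x\) on the left-hand side yields \(\mathcal{M }(\mathbf{{f}}\,)(x)\le \prod _{i=1}^m Mf_i(x)\). The inequality can be strict, because in \(\mathcal{M }\) the same cube is used for all the factors.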

3.2 Calderón–Zygmund operators

We will use standard well known definitions, see for instance [12, 20]. Let \(K(x,y)\) be a locally integrable function defined off the diagonal \(x=y\) in \(\mathbb{R }^n\times \mathbb{R }^n\), which satisfies the size estimate

$$\begin{aligned} |K(x,y)|\le {\frac{c}{|x-y|^n}}, \end{aligned}$$
(17)

and for some \(\varepsilon >0\), the regularity condition

$$\begin{aligned} |K(x,y)-K(z,y)|+|K(y,x)-K(y,z)|\le c{\frac{|x-z|^{\varepsilon }}{|x-y|^{n+\varepsilon }}}, \end{aligned}$$
(18)

whenever \(2|x-z|<|x-y|\).

A linear operator \(T:C_{c}^{\infty }(\mathbb{R }^n)\longrightarrow L_{loc}^{1}(\mathbb{R }^n)\) is a Calderón–Zygmund operator if it extends to a bounded operator on \(L^2(\mathbb{R }^n)\), and there is a kernel \(K\) satisfying (17) and (18) such that

$$\begin{aligned} Tf(x)=\int _{\mathbb{R }^n}K(x,y)f(y)\, dy, \end{aligned}$$
(19)

for any \(f\in C_{c}^{\infty }(\mathbb{R }^n)\) and \(x\notin supp(f)\).

Given a C–Z operator \(T\) we define as usual the vector–valued extension \(\overline{T}_{q}\) as

$$\begin{aligned} \overline{T}_{q}f(x)= \left( \sum _{j=1}^{\infty } | Tf_{j}(x) |^{q} \right) ^{1/q}=|Tf(x)|_q, \end{aligned}$$

where \({f}=\{f_j\}_{j=1}^{\infty }\) is a vector–valued function.

We will also study the problem in the multilinear setting, considering multilinear C–Z operators acting on product Lebesgue spaces. Let \(T\) be an operator initially defined on the \(m\)-fold product of Schwartz spaces and taking values in the space of tempered distributions,

$$\begin{aligned} T:\mathcal{S }(\mathbb{R }^n)\times \dots \times \mathcal{S }(\mathbb{R }^n)\rightarrow \mathcal{S }^{\prime }(\mathbb{R }^n). \end{aligned}$$

We say that \(T\) is an \(m\)-linear C–Z operator if, for some \(1\le q_j <\infty \), it extends to a bounded multilinear operator from \(L^{q_1}\times \dots \times L^{q_m}\) to \(L^q\), where \(\frac{1}{q}=\frac{1}{q_1}+\dots + \frac{1}{q_m}\) and if there exists a function \(K\) defined off the diagonal \(x=y_1=\dots =y_m\) in \((\mathbb{R }^n)^{m+1}\), satisfying

$$\begin{aligned} T(f_1,\dots ,f_m)(x)=\int _{(\mathbb{R }^n)^m}K(x,y_1,\dots ,y_m) f_1(y_1)\dots f_m(y_m)\ dy_1\dots dy_m \end{aligned}$$

for all \(x\notin \bigcap _{j=1}^m \text{ supp }f_j\). We refer to [14] and [28] for a detailed treatment of these operators.

3.3 Commutators

Let \(T\) be any operator and let \(b\) be any locally integrable function. The commutator operator \([b, T]\) is defined by

$$\begin{aligned}{}[b,T]f=b\,T(f)-T(bf). \end{aligned}$$

When \(b\in BMO\) and \(T\) is a C–Z operator, these commutators were considered by Coifman, Rochberg and Weiss. They are more singular than a C–Z operator, a fact that can be seen from the version of the classical result of Coifman and Fefferman (1) for commutators proved by the second author in [32]. One of the main points of this paper is that there is an intimate connection between these commutators and iterations of the Hardy–Littlewood maximal operator.

An important point is that these operators are not of weak type \((1,1)\), but we do have the following substitute inequality.

Theorem 9

[31] Let \(b\) be a \(BMO\) function and let \(T\) be a C–Z operator. Define \(\phi (t)= t(1+\log ^{+}t)\). Then there exists a positive constant \(c=c_{\Vert b\Vert _{BMO}}\) such that for all compactly supported functions \(f\) and for all \(\lambda >0\),

$$\begin{aligned} \left| \{x\in \mathbb{R }^n: |[b,T]f(x)|>\lambda \}\right| \le c\, \int _{\mathbb{R }^n}\phi \left( \dfrac{|f(x)|}{\lambda }\right) \,dx. \end{aligned}$$

A natural generalization of the commutator \([b,T]\) is given by \(T^k_{b}:=[b,T_{b}^{k-1}]\), \(k\in \mathbb N \), and more explicitly by

$$\begin{aligned} T^k_b f(x)=\int _{\mathbb{R }^n} (b(x)-b(y))^k K(x,y)f(y)\,dy. \end{aligned}$$

We call them higher order commutators. The case \(k=0\) recovers the Calderón–Zygmund singular integral operator, and for \(k=1\) we get the commutator operator defined before. It is shown in [32] that for any \(0<p<\infty \) and any \(w\in A_{\infty }\) there is a constant \(C\) such that

$$\begin{aligned} \Vert T_b^kf\Vert _{L^p(w)}\le C\,\Vert M^{k+1}f\Vert _{L^p(w)}. \end{aligned}$$

Again, this inequality is sharp, since \(M^{k+1}\) cannot be replaced by the smaller operator \(M^k\).
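The explicit kernel expression for \(T_b^k\) can be checked by a formal induction; the computation below is a sketch valid, for instance, for \(x\notin supp (f)\), where the kernel representation (19) is available. For \(k=1\),

$$\begin{aligned}{}[b,T]f(x)=b(x)\int _{\mathbb{R }^n}K(x,y)f(y)\,dy-\int _{\mathbb{R }^n}K(x,y)b(y)f(y)\,dy=\int _{\mathbb{R }^n}(b(x)-b(y))K(x,y)f(y)\,dy, \end{aligned}$$

and, assuming the formula for \(k-1\),

$$\begin{aligned} T_b^kf(x)&=b(x)\,T_b^{k-1}f(x)-T_b^{k-1}(bf)(x)\\&=\int _{\mathbb{R }^n}(b(x)-b(y))^{k-1}\,(b(x)-b(y))\,K(x,y)f(y)\,dy. \end{aligned}$$

Each iteration contributes one more factor \(b(x)-b(y)\), which is consistent with the additional iteration of \(M\) needed to control \(T_b^k\).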

3.4 Littlewood-Paley square functions

Let \(\mathcal D \) denote the collection of dyadic cubes in \(\mathbb{R }^n\). Given \(Q\in \mathcal D \), let \(\widehat{Q}\) be its dyadic parent, i.e., the unique dyadic cube containing \(Q\) such that \(|\widehat{Q}|=2^n |Q|\). The dyadic square function is the operator

$$\begin{aligned} S_df(x) = \left( \sum _{Q\in \mathcal D }(f_Q-f_{\widehat{Q}})^2\chi _Q(x)\right) ^{1/2}, \end{aligned}$$

where as usual \(f_Q\) denotes the average of \(f\) over \(Q\). For the properties of the dyadic square function we refer the reader to Wilson [37].

We will also use the following continuous and more classical version of the square function:

$$\begin{aligned} g_{\lambda }^*(f)(x) = \left( \int _{0}^{\infty } \int _{\mathbb{R }^n} |\phi _t*f(y)| ^2 \left( \frac{t}{t+|x-y|}\right) ^{n\lambda } \frac{dy\,dt}{t^{n+1}} \right) ^{1/2}, \end{aligned}$$
(20)

where \(\phi \in \mathcal S \), \(\int \phi \,dx=0\), \(\phi _t(x)=\frac{1}{t^n}\phi (\frac{x}{t})\), and \(\lambda >2\) (see [35]).

3.5 Lerner’s formula

In this subsection, we will state a result from [24] which will be fundamental in our proofs. This result is known as “Lerner’s formula”, and allows us to obtain a decomposition of a function \(f\) that can be seen as a sophisticated Calderón–Zygmund decomposition of that function at all scales.

In order to state Lerner’s result, we need to introduce the main objects involved. For a given cube \(Q\), the median value \(m_f(Q)\) of \(f\) over \(Q\) is a, possibly non-unique, number such that

$$\begin{aligned} |\{x\in Q:f(x)>m_f(Q)\}|\le |Q|/2 \end{aligned}$$

and

$$\begin{aligned} |\{x\in Q:f(x)<m_f(Q)\}|\le |Q|/2. \end{aligned}$$

The mean local oscillation of a measurable function \(f\) on a cube \(Q\) is defined by the following expression

$$\begin{aligned} \omega _{\lambda }(f;Q)=\inf _{c\in \mathbb{R }}((f-c)\chi _{Q})^{*}(\lambda |Q|), \end{aligned}$$

for all \(0<\lambda <1\), and the local sharp maximal function on a fixed cube \(Q_0\) is defined as

$$\begin{aligned} M^{\#}_{\lambda ;Q_0}f(x)=\sup _{x\in Q\subset Q_0}\omega _{\lambda }(f;Q), \end{aligned}$$

where the supremum is taken over all cubes \(Q\) contained in \(Q_0\) and such that \(x\in Q\). Here \(f^*\) stands for the usual non-increasing rearrangement of \(f\). We will use several times that for any \(\delta >0\), and \(0<\lambda \le 1\),

$$\begin{aligned} (f\chi _{Q })^*(\lambda |Q|)\le \left( \frac{1}{\lambda |Q|}\int _Q|f|^{\delta }\,dx\right) ^{1/{\delta }}, \end{aligned}$$
(21)

and, as a consequence, that

$$\begin{aligned} |m_f(Q)|\le \left( \frac{2}{|Q|}\int _Q|f(x)|^{\delta }\,dx\right) ^{1/\delta }, \end{aligned}$$
(22)

for any \(\delta >0\).
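Both (21) and (22) are consequences of Chebyshev’s inequality. The sketch below assumes the usual definition \(g^*(t)=\inf \{s>0: |\{|g|>s\}|\le t\}\) of the non-increasing rearrangement, which is not written out in the text. For (21), set \(s_0=\left( \frac{1}{\lambda |Q|}\int _Q|f|^{\delta }\,dx\right) ^{1/\delta }\) and assume \(s_0>0\). Then

$$\begin{aligned} |\{x\in Q: |f(x)|>s_0\}|\le \frac{1}{s_0^{\delta }}\int _Q|f|^{\delta }\,dx=\lambda |Q|, \end{aligned}$$

so \((f\chi _Q)^*(\lambda |Q|)\le s_0\). For (22), suppose \(m=m_f(Q)>0\) (the case \(m<0\) is symmetric, using \(\{f\le m\}\)). Since \(|\{x\in Q: f(x)<m\}|\le |Q|/2\), we have

$$\begin{aligned} \frac{|Q|}{2}\le |\{x\in Q: f(x)\ge m\}|\le |\{x\in Q: |f(x)|\ge m\}|\le \frac{1}{m^{\delta }}\int _Q|f|^{\delta }\,dx, \end{aligned}$$

which rearranges to \(m\le \left( \frac{2}{|Q|}\int _Q|f|^{\delta }\,dx\right) ^{1/\delta }\).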

Recall that, for a fixed cube \(Q_0\), \(\mathcal D (Q_0)\) denotes all the dyadic subcubes with respect to the cube \(Q_0\). As before, if \(Q\in \mathcal D (Q_0)\) and \(Q\ne Q_0\), \(\widehat{Q}\) will be the ancestor dyadic cube of \(Q\), i.e., the only cube in \(\mathcal D (Q_0)\) that contains \(Q\) and such that \(|\widehat{Q}|=2^n |Q|\). We state now Lerner’s formula.

Theorem 10

[24] Let \(f\) be a measurable function on \(\mathbb{R }^n\) and let \(Q_0\) be a cube. Then there exists a (possibly empty) collection of cubes \(\{Q_j^k\}_{j,k}\subset \mathcal D (Q_0)\) such that:

  1. (i)

    For a.e. \(x\in Q_0\),

    $$\begin{aligned} |f(x)-m_f(Q_0)|\le 4\,M_{1/4; Q_0}^{\#}f(x)+4\,\sum _{k=1}^{\infty }\sum _{j}\omega _{1/2^{n+2}}(f;\hat{Q}_j^k)\chi _{Q_j^k}(x); \end{aligned}$$
    (23)
  2. (ii)

    For each fixed \(k\) the cubes \(Q_j^k\) are pairwise disjoint;

  3. (iii)

    If \(\Omega _k={\bigcup _{j}}Q_j^k\), then \(\Omega _{k+1}\subset \Omega _k\);

  4. (iv)

    \(|\Omega _{k+1}\cap Q_j^k|\le {\frac{1}{2}}|Q_j^k|\).

Let us remark that in any decomposition as in the previous theorem, if we define \(E_j^k:=Q_j^k\backslash \Omega _{k+1}\), then \(\{E_j^k\}\) is a family of pairwise disjoint subsets. Moreover,

$$\begin{aligned} |Q_j^k|\le 2|E_j^k|. \end{aligned}$$
(24)
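Both claims in this remark follow from properties (ii)–(iv) of Theorem 10; we spell out the short argument. By (iv),

$$\begin{aligned} |E_j^k|=|Q_j^k|-|Q_j^k\cap \Omega _{k+1}|\ge |Q_j^k|-\frac{1}{2}|Q_j^k|=\frac{1}{2}|Q_j^k|, \end{aligned}$$

which is (24). For the disjointness, if \(k\) is fixed then the sets \(E_j^k\subset Q_j^k\) are pairwise disjoint by (ii). If \(k<l\), then by (iii)

$$\begin{aligned} E_i^l\subset \Omega _l\subset \Omega _{k+1},\qquad E_j^k\cap \Omega _{k+1}=\emptyset , \end{aligned}$$

so \(E_i^l\cap E_j^k=\emptyset \) for all \(i\) and \(j\).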

3.6 Pointwise inequalities

In this section we will summarize some important pointwise inequalities involving sharp maximal functions. We start with the following, which is an immediate consequence of the definitions. Given a cube \(Q\), \(\delta >0\) and \(0<\lambda \le 1\), there exists a constant \(c=c_\lambda \) such that

$$\begin{aligned} M_{\lambda ;Q}^{\#}(f\chi _Q)(x)\le c\, M_\delta ^{\#} (f\chi _Q)(x), \end{aligned}$$
(25)

for all \(x\in Q\). We will also use the following result from [30]. If \(0<\delta <\varepsilon <1\), there is a constant \(c=c_{\varepsilon ,\delta }\) such that

$$\begin{aligned} M^{\#,d}_{\delta }(M^d_{\varepsilon }(f))(x)\le c \,M^{\#,d}_{\varepsilon }f(x). \end{aligned}$$
(26)

The idea behind the following list of inequalities is that a sharp maximal type operator acting on several singular operators can be controlled by suitable maximal operators.

Calderón–Zygmund operators and vector valued extensions: Let \(T\) be a Calderón–Zygmund operator with maximal singular operator \(T^*\), and \(0<\varepsilon <1\). Then there exists a constant \(c=c_{\varepsilon }\) such that

$$\begin{aligned} M_{\varepsilon }^{\#}(T^*f)(x)\le c\,Mf(x). \end{aligned}$$
(27)

This follows essentially from [1] where \(T\) is used instead of \(T^*\). Moreover, we know from [33] that if \(1<q<\infty \) and \(0<\varepsilon <1\), then there exists a constant \(c=c_{\varepsilon }>0\) such that

$$\begin{aligned} M_{\varepsilon }^{\#}(\overline{T}_qf)(x)\le c\,M\left( |f|_q\right) (x) \quad x\in \mathbb{R }^n \end{aligned}$$
(28)

for any smooth vector function \(f=\{f_j\}_{j=1}^{\infty }\).

Multilinear C–Z operators: [28] Let \(T\) be a Calderón–Zygmund \(m\)-linear operator and let \(0<\varepsilon <1/m\). Then there exists a constant \(c=c_{\varepsilon }>0\) such that

$$\begin{aligned} M^\#_{\varepsilon }(T(\mathbf{{f}}\,))(x)\le c\,\mathcal{M }(\mathbf{{f}}\,)(x)\quad x\in \mathbb{R }^n \end{aligned}$$
(29)

for any smooth vector function \(\mathbf{{f}}\).

Commutators: [31] Let \(b\in BMO\) and let \(0 < \delta < \varepsilon \). Then there exists a positive constant \(c= c_{\delta ,\varepsilon }\) such that,

$$\begin{aligned} M_{\delta }^{\#,d}(T^k_bf)(x)\le c\, \left\| b \right\| _{BMO}\sum _{j=0}^{k-1}M^d_{\varepsilon }(T^j_b f)(x)+\left\| b \right\| _{BMO}^k M^{k+1}f(x), \end{aligned}$$
(30)

for any \(k\in \mathbb N \) and for all smooth functions \(f\).

Dyadic and continuous square functions: [9, 23] Let \(S_d\) be the dyadic square function operator and let \(0<\lambda <1\). Then for any function \(f\), every dyadic cube \(Q\), and every \(x\in Q\),

$$\begin{aligned} \omega _\lambda ((S_df)^2,Q) \le \frac{c_n}{\lambda ^2}\left( \frac{1}{|Q|}\int _Q |f(x)|\ dx\right) ^2, \end{aligned}$$
(31)

and hence

$$\begin{aligned} M^{\#,d}_{\lambda }(S_{d}(f)^2) (x) \le c_{\lambda }\,Mf(x)^2. \end{aligned}$$
(32)

For the continuous square function \(g_{\mu }^{*}\) we use the following from [23]. For \(\mu >3\) and \(0<\lambda <1\), we have that

$$\begin{aligned} M^{\#}_{\lambda }(g_{\mu }^{*}(f)^2) (x) \le c_{\lambda }\,Mf(x)^2. \end{aligned}$$
(33)

The analogue for the vector-valued extension of the maximal function, from [9] is the following. Fix \(\lambda \), \(0<\lambda <1\) and \(1<q<\infty \). Then for any function \(f=\{f_j\}_{j=1}^{\infty }\), every dyadic cube \(Q\), and every \(x\in Q\),

$$\begin{aligned} \omega _\lambda \left( \left( \overline{M}^d_qf\right) ^q,Q\right) \le \frac{c_{n,q}}{\lambda ^q}\left( \frac{1}{|Q|}\int _{Q} \left\| f(x) \right\| _{l^q}\ dx\right) ^q \end{aligned}$$
(34)

Finally, we include here the well known Kolmogorov’s inequality in the following form. Let \(0<q<p<\infty \). Then there is a constant \(c=c_{p,q}\) such that for any nonnegative measurable function \(f\),

$$\begin{aligned} \left( \frac{1}{|Q|}\int _Q f(x)^q\ dx\right) ^{\frac{1}{q}}\le c \Vert f\Vert _{L^{p,\infty }(Q,\frac{dx}{|Q|})}. \end{aligned}$$
(35)

(See for instance [13, p. 91, ex. 2.1.5]).

4 First approach, proof of linear and superlinear estimates

We prove in this section Theorem 2 and its consequences. The proof is based on Lerner’s formula (23) combined with a new way of handling the sparse cubes \(\{Q_j^k\}\) by means of an exponential vector valued endpoint estimate due to Fefferman–Stein.

4.1 Proof of the key estimate

As already mentioned the proof is based on Lerner’s formula from Theorem 10. The drawback of the method is that it is not clear if this approach allows us to derive such sharp exponential decays for the case of commutators. We remark that a slightly weaker result, involving \(M^\#_\delta \) instead of the local sharp maximal function was proved by the second author in [6, Chapter 3].

Proof of Theorem 2

We consider the distribution set

$$\begin{aligned} E_Q:=\{x\in Q: |f(x)-m_f(Q)|>tM_{2^{-n-2};Q}^\#(f)(x) \}. \end{aligned}$$

Then, by (23) and for appropriate \(c\) we have that

$$\begin{aligned} \left| E_Q\right|&\le |\{x\in Q: \sum _{k,j}\chi _{Q_j^k}(x)\, \inf _{Q_j^k}M_{2^{-n-2};Q}^\#f > ctM_{2^{-n-2};Q}^\#f(x)\}|\\&\le |\{x\in Q: \sum _{k,j}\chi _{Q_j^k}(x)> ct\}|. \end{aligned}$$

Let \(\{E_j^k\}\) be the family of sets from the remark after Lerner’s formula satisfying (24). We have then

$$\begin{aligned} \sum _{j,k}\chi _{Q_j^k}(x)&= \sum _{j,k}\left( {\frac{1}{|Q_j^k|}}\,|Q_j^k|\right) ^q\chi _{Q_j^k}(x)\\&\le c_n^q\,\sum _{j,k}\left( {\frac{1}{|Q_j^k|}}\,|E_j^k|\right) ^q\chi _{Q_j^k}(x)\\&\le c_n^q\,\sum _{j,k}\left( {\frac{1}{|Q_j^k|}}\,\int _{Q_j^k}\chi _{E_j^k}(x)\,dx\right) ^q\chi _{Q_j^k}(x)\\&\le c_n^q\, \left( \overline{M}_q\left( \left\{ \chi _{E_j^k}\right\} _{j,k}\right) (x)\right) ^q\\&\le c_n^q\, \left( \overline{M}_qg(x)\right) ^q, \end{aligned}$$

where \(g=\big \{\chi _{E_j^k}\big \}_{j,k}\). Now, since \(\{E_j^k\}\) is a pairwise disjoint family of subsets, we have that

$$\begin{aligned} \Vert g(x)\Vert _{\ell ^{q}}=\left( \sum _{j,k}\left( \chi _{E_j^k}(x)\right) ^q\right) ^{1/q}\le 1. \end{aligned}$$
(36)

We finish our proof recalling that if \(|g|_{\ell ^{q}}\in L^{\infty }\), then \(\left( \overline{M}_qg(x)\right) ^q\in Exp L\) (see [10]). Therefore, we obtain the desired inequality (7):

$$\begin{aligned} |\{x\in Q: |f(x)-m_f(Q)|>tM_{2^{-n-2};Q}^\#(f)(x) \}|\le c e^{-\alpha t}|Q|,\quad t>0. \end{aligned}$$

\(\square \)
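The last step of the proof can be made explicit with Chebyshev’s inequality. The sketch below assumes that the \(Exp L\) statement is available in the following quantitative local form, which is our reading and is not written out above: there are constants \(\beta , C>0\), depending only on \(n\) and \(q\), such that \(\frac{1}{|Q|}\int _Q \exp \left( \beta \,(\overline{M}_qg(x))^q\right) dx\le C\) whenever \(g\) is supported in \(Q\) and \(\Vert g(x)\Vert _{\ell ^q}\le 1\). Under this assumption,

$$\begin{aligned} \left| \left\{ x\in Q: \sum _{k,j}\chi _{Q_j^k}(x)>ct\right\} \right|&\le \left| \left\{ x\in Q: \left( \overline{M}_qg(x)\right) ^q>\frac{ct}{c_n^q}\right\} \right| \\&\le e^{-\beta ct/c_n^q}\int _Q \exp \left( \beta \,(\overline{M}_qg(x))^q\right) dx\le C\,e^{-\beta ct/c_n^q}\,|Q|, \end{aligned}$$

which is (7) with \(\alpha =\beta c/c_n^q\).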

4.2 Proofs for Calderón–Zygmund operators, vector valued extensions and multilinear C–Z operators: first approach

We will combine Theorem 2, replacing \(f\) by the operator, with an appropriate pointwise inequality. We start by proving Theorem 3.

Proof of Theorem 3

We first note the following estimate for the median value of \(T^*f\) over a cube \(Q\). We have that

$$\begin{aligned} m_{T^*f}(Q)&\le \left( \frac{2}{|Q|}\int _{Q}(T^*f)^{\delta }\right) ^{1/\delta }\\&\le c_{\delta }\Vert T^*f\Vert _{L^{1,\infty }(Q,\frac{dx}{|Q|})}\le {\frac{c}{|Q|}}\int _{Q}|f(x)|\,dx\\ \end{aligned}$$

by Kolmogorov’s inequality (35). It follows that

$$\begin{aligned} m_{T^*f}(Q) \le cMf(x), \quad x\in Q. \end{aligned}$$
(37)

This, together with inequality (25) and (27), yields

$$\begin{aligned} \left| \left\{ x\in Q: \frac{|T^*f(x)|}{Mf(x)}>t \right\} \right| \le |\{x\in Q: |T^*f(x)|>ctM_{\lambda _n,Q}^\#(T^*(f))(x) \}| \end{aligned}$$

with \(\lambda _n=2^{-n-2}\) and some constant \(c>0\). We can now apply our general result from Theorem 2 to conclude the proof. \(\square \)

For the proof of Theorem 4 and Theorem 5, we have all the ingredients: we use (respectively) inequalities (29) and (28) instead of (27) and we control the median value by using Kolmogorov’s inequality and the weak type of both vector-valued extensions and multilinear C–Z operators.

4.3 Proof for the square functions and for the vector-valued maximal function: first approach

For the proof of Theorem 7 we start with:

$$\begin{aligned} |\{x\in Q: Sf(x)>tMf(x) \}|=|\{x\in Q: (Sf(x))^2>t^2(Mf(x))^2 \}|. \end{aligned}$$

This time we use estimates (32) and (33) for the pointwise control. The median value of the square function is also bounded by \(M\) as in the previous cases using, again, Kolmogorov’s inequality and the weak \((1,1)\) type of the operator. From Theorem 2 we obtain, in this case, a Gaussian decay rate for the level set.

Finally, for the vector-valued extension of the maximal function, we proceed as in the case of the square function but replacing the “2” by “\(q\)”. The key estimate for the oscillation is inequality (34).
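The origin of the faster decay can be traced in a simplified sketch. We ignore the median term, which is handled as in the proof of Theorem 3, and we assume a pointwise bound of the form \(M_{2^{-n-2};Q}^\#((Sf)^2)(x)\le c_0\,(Mf(x))^2\) for \(x\in Q\), in the spirit of (32); the constant \(c_0\) is notation of ours. Then

$$\begin{aligned} |\{x\in Q: Sf(x)>tMf(x)\}|&=|\{x\in Q: (Sf(x))^2>t^2(Mf(x))^2\}|\\&\le \left| \left\{ x\in Q: (Sf(x))^2>\frac{t^2}{c_0}\,M_{2^{-n-2};Q}^\#((Sf)^2)(x)\right\} \right| , \end{aligned}$$

and Theorem 2, applied to the function \((Sf)^2\) with \(t^2/c_0\) in place of \(t\), bounds the last quantity by \(c\,e^{-\alpha t^2/c_0}|Q|\). The same substitution with \(t^q\) in place of \(t^2\) accounts for the rate \(e^{-\alpha t^q}\) in (11).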

5 Second approach: weighted estimates and the proof for commutators

As already mentioned, the approach considered in the previous section cannot be used in the case of commutators. We introduce here a new approach, combining Lerner’s formula with a variant of Rubio de Francia’s algorithm. In this case, Lerner’s formula is used to derive a certain sharp local weighted estimate (see Theorem 11). This is the first key ingredient. The second key ingredient is to apply Rubio de Francia’s algorithm with a factorization argument for \(A_q\) weights and the Coifman–Rochberg theorem (see Lemma 1).

This approach will allow us to derive all the results of this paper, including those proved in the previous section, and also the results for commutators. We will present the general scheme in terms of a pair of generic operators \(T_1\) and \(T_2\) and then emphasize the different kinds of hypotheses needed and the estimates obtained in each case.

We start with some preliminaries about weights. We include some classical well known results and some new ones.

5.1 Some extra preliminaries on weights

We recall that a weight \(w\) (any nonnegative measurable function) satisfies the \(A_p\) condition for \(1<p<\infty \) if

$$\begin{aligned}{}[w]_{A_p}= \sup _{Q}\left( {\displaystyle {\frac{1}{|Q|}}}\int _{Q}w \right) \left( {\displaystyle {\frac{1}{|Q|}}}\int _{Q}w ^{1-p^{\prime }}\right) ^{p-1}<\infty , \end{aligned}$$

where the supremum is taken over all cubes \(Q\subset \mathbb{R }^n\). We also recall that \(w\) is an \(A_1\) weight if there is a finite constant \(c\) such that \(Mw\le c\,w\) a.e.; in that case \([w]_{A_1}\) denotes the smallest such \(c\). Finally, the \(A_{\infty }\) class of weights is defined by \(A_{\infty }=\bigcup _{{p\ge 1}}A_p\).

We will use that if \(w_1\)  and  \(w_2\)  are \(A_1\) weights then \(w=w_1w_2^{1-p} \in A_p\) and

$$\begin{aligned}{}[w]_{A_p}\le [w_1]_{A_1}[w_2]_{A_1}^{p-1}. \end{aligned}$$
(38)
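For the reader's convenience, here is a sketch of the standard argument behind (38), written under the assumption \(1<p<\infty \). It uses only the \(A_1\) condition in the form \(\frac{1}{|Q|}\int _Q w_i\le [w_i]_{A_1}\,w_i(x)\) for a.e. \(x\in Q\) and the identity \((p-1)(p^{\prime }-1)=1\). The notation \(\langle g\rangle _Q=\frac{1}{|Q|}\int _Q g\) is introduced only for this sketch. Since \(w^{1-p^{\prime }}=w_1^{1-p^{\prime }}w_2\), we have

$$\begin{aligned} \langle w_1w_2^{1-p}\rangle _Q&\le [w_2]_{A_1}^{p-1}\,\langle w_2\rangle _Q^{1-p}\,\langle w_1\rangle _Q,\\ \langle w_1^{1-p^{\prime }}w_2\rangle _Q&\le [w_1]_{A_1}^{p^{\prime }-1}\,\langle w_1\rangle _Q^{1-p^{\prime }}\,\langle w_2\rangle _Q. \end{aligned}$$

Raising the second inequality to the power \(p-1\) and multiplying by the first one, the averages cancel:

$$\begin{aligned} \langle w\rangle _Q\,\langle w^{1-p^{\prime }}\rangle _Q^{p-1}\le [w_2]_{A_1}^{p-1}\langle w_2\rangle _Q^{1-p}\langle w_1\rangle _Q\cdot [w_1]_{A_1}\langle w_1\rangle _Q^{-1}\langle w_2\rangle _Q^{p-1}=[w_1]_{A_1}[w_2]_{A_1}^{p-1}. \end{aligned}$$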

Another key feature of the \(A_1\) weights that we will use repeatedly is that \((M\mu )^\delta \) is an \(A_1\) weight whenever \(0<\delta <1\) and \(\mu \) is a positive Borel measure (this is due to Coifman and Rochberg, [11, Theorem 3.4]). Furthermore we have

$$\begin{aligned}{}[(M\mu )^{\delta }]_{A_1}\le \frac{c}{1-\delta }, \end{aligned}$$

where \(c=c_n\). We will need the following extension of this result to the multilinear maximal operator \(\mathcal{M }\) defined in (16), which may be of independent interest.

Lemma 1

Let \(\varvec{\mu }\) be a vector of \(m\) positive Borel measures on \(\mathbb{R }^n\) such that \(\mathcal{M }\varvec{\mu }(x)<\infty \) for a.e. \(x\in \mathbb{R }^n\). Then

$$\begin{aligned} \left( \mathcal{M }(\varvec{\mu })\right) ^\delta \in A_1 \quad \text{ for } \text{ any } \, \,\, 0<\delta <\frac{1}{m} \end{aligned}$$
(39)

Moreover,

$$\begin{aligned} \left[ \left( \mathcal{M }(\varvec{\mu })\right) ^\delta \right] _{A_1}\le \frac{c}{1-m\delta }, \end{aligned}$$
(40)

where \(c=c_n\) is some dimensional constant.

Proof

The idea is the same as in the classical Coifman–Rochberg theorem, but using this time the appropriate weak type boundedness of \(\mathcal{M }\):

$$\begin{aligned} \mathcal{M }: L^{1}(\mathbb{R }^n) \times \dots \times L^{1}(\mathbb{R }^n) \rightarrow L^{\frac{1}{m},\infty }(\mathbb{R }^n) . \end{aligned}$$

If  \(w=\left( \mathcal{M }(\varvec{\mu })\right) ^\delta \), the aim is to prove that, for a given cube \(Q\),

$$\begin{aligned} \frac{1}{|Q|}\int _Q w(x)\ dx\le \frac{c}{1-m\delta }w(y)\quad \text{ for } \text{ all } \ y\in Q. \end{aligned}$$

Consider \(\tilde{Q}:=3Q\), the dilation of \(Q\) and split the vector \(\varvec{\mu }=\varvec{\mu }^0+\varvec{\mu }^\infty \) as usual with \(\varvec{\mu }^0=(\mu ^0_1,\dots , \mu ^0_m)\) and where \(\mu ^0_j:=\mu _j\chi _{\tilde{Q}}\) for all \(1\le j \le m\). We can handle \(\mathcal{M }(\varvec{\mu }^\infty )\) as in the \(m=1\) case, since in this case the maximal function is essentially constant. For the other part, we have that

$$\begin{aligned} \frac{1}{|Q|}\int _Q \left( \mathcal{M }(\varvec{\mu }^0)(x)\right) ^\delta \ dx&= \frac{\delta }{|Q|}\int _0^\infty t^{\delta }\left| \left\{ x\in Q: \mathcal{M }(\varvec{\mu }^0)(x)>t \right\} \right| \ \frac{dt}{t}\\&\le R^\delta + \frac{\delta }{|Q|}\int _R^\infty t^{\delta }\left| \left\{ x\in Q: \mathcal{M }(\varvec{\mu }^0)(x)>t \right\} \right| \ \frac{dt}{t} \end{aligned}$$

for any \(R>0\), since we trivially have that \(\left| \left\{ x\in Q: \mathcal{M }(\varvec{\mu }^0)(x)>t \right\} \right| \le |Q|\). Now, we recall that \(\mathcal{M }\) is a bounded operator from \(L^1\times \dots \times L^1\) to \(L^{1/m,\infty }\). Therefore we can estimate the last integral as

$$\begin{aligned} \frac{\delta }{|Q|}\int _R^\infty t^{\delta }\left| \left\{ x\in Q: \mathcal{M }(\varvec{\mu }^0)(x)>t \right\} \right| \ \frac{dt}{t}&\le c \frac{\delta }{|Q|}\int _R^\infty t^{\delta -1-\frac{1}{m}}\ dt \prod _{j=1}^m \Vert \mu ^0_j\Vert ^{1/m}_{L^1}\\&\le \frac{c }{1-m\delta }\frac{R^{\delta -\frac{1}{m}}}{|Q|}\prod _{j=1}^m \Vert \mu ^0_j\Vert ^{1/m}_{L^1} \end{aligned}$$

for any \(\delta <\frac{1}{m}\). We obtain that

$$\begin{aligned} \frac{1}{|Q|}\int _Q \left( \mathcal{M }(\varvec{\mu }^0)(x)\right) ^\delta \ dx\le R^\delta \left( 1 + \frac{c }{1-m\delta }\frac{\prod _{j=1}^m \Vert \mu ^0_j\Vert ^{1/m}_{L^1}}{R^{\frac{1}{m}}|Q|}\right) \end{aligned}$$

Now we choose \(R=\displaystyle {\frac{\prod _{j=1}^m \Vert \mu ^0_j\Vert _{L^1}}{|Q|^m}}\) and we get

$$\begin{aligned} \frac{1}{|Q|}\int _Q \left( \mathcal{M }(\varvec{\mu }^0)(x)\right) ^\delta \ dx&\le \frac{c}{1-m\delta }\left( \frac{\prod _{j=1}^m \Vert \mu ^0_j\Vert _{L^1}}{|Q|^m}\right) ^\delta \\&\le \frac{c\,3^{n}}{1-m\delta }\left( \frac{\prod _{j=1}^m \mu _j(\tilde{Q})}{|\tilde{Q}|^m}\right) ^\delta \\&\le \frac{c_{n}}{1-m\delta }\left( \mathcal{M }(\varvec{\mu })(y)\right) ^\delta \quad \text{ for } \text{ all } \ y\in Q. \end{aligned}$$

\(\square \)
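The elementary computation used in the proof of Lemma 1 can be made explicit. The following is only a sketch, under the assumption \(0<\delta <\frac{1}{m}\) and with the weak type constant of \(\mathcal{M }\) absorbed into \(c\):

$$\begin{aligned} \delta \int _R^\infty t^{\delta -1-\frac{1}{m}}\ dt=\frac{\delta \,R^{\delta -\frac{1}{m}}}{\frac{1}{m}-\delta }=\frac{m\delta }{1-m\delta }\,R^{\delta -\frac{1}{m}}\le \frac{1}{1-m\delta }\,R^{\delta -\frac{1}{m}}. \end{aligned}$$

Moreover, the choice \(R=|Q|^{-m}\prod _{j=1}^m \Vert \mu ^0_j\Vert _{L^1}\) gives \(R^{\frac{1}{m}}|Q|=\prod _{j=1}^m \Vert \mu ^0_j\Vert ^{1/m}_{L^1}\), so that

$$\begin{aligned} R^\delta \left( 1 + \frac{c }{1-m\delta }\frac{\prod _{j=1}^m \Vert \mu ^0_j\Vert ^{1/m}_{L^1}}{R^{\frac{1}{m}}|Q|}\right) =R^\delta \left( 1+\frac{c}{1-m\delta }\right) \le \frac{1+c}{1-m\delta }\,R^\delta . \end{aligned}$$

This is where the factor \(\frac{1}{1-m\delta }\) in (40) comes from.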

The following Proposition can be viewed as an integral version of the main result of the previous section, namely Theorem 2. It follows from Lerner’s formula as well, but this integral version is the key to obtaining the result for commutators, which cannot be obtained by means of the first approach.

Proposition 1

Let \(f\) be a measurable function such that \(supp \,f\subset Q\), where \(Q\) is a fixed cube. Let \(0<\delta <1\) and let \(w\in A_q\). Then we have that

$$\begin{aligned} \Vert f-m_f(Q)\Vert _{L^1(w,Q)}\le c\, 2^q\, [w]_{A_q}\Vert M^{\#,d}_{\delta }(f)\Vert _{L^1(w,Q)} \end{aligned}$$
(41)

Proof

We start with a pointwise estimate, which follows from Lerner’s formula, taking into account the definition of the oscillation and (21).

$$\begin{aligned} |f(x)-m_f(Q)|\le c\, M_\delta ^\#f(x)+c\,\sum _{k,j}\inf _{Q_j^k}M_\delta ^\#(f)\, \chi _{Q_j^k}(x). \end{aligned}$$

Then, taking norms,

$$\begin{aligned} \left\| f-m_f(Q) \right\| _{L^1(w,Q)}\le c\Vert M^{\#,d}_{\delta }(f)\Vert _{L^1(w,Q)}+c\sum _{k,j} \int _{Q_j^k}\inf _{Q_j^k} M_\delta ^\#(f)w(x)dx. \end{aligned}$$

Now we recall that the family \(\{E_j^k\}\) satisfies (24) and use the following property of the \(A_q\) class of weights: if \(w\in A_q\) and \(Q\) is a cube, then for each measurable set \(E\subset Q\),

$$\begin{aligned} w(Q)\le \left( \frac{|Q|}{|E|}\right) ^q[w]_{A_q}w(E). \end{aligned}$$

Since for any index \((j,k)\) we have the property \(|Q_j^k|\le 2|E_j^k|\), it follows that

$$\begin{aligned} w(Q_j^k)\le 2^q[w]_{A_q}w(E_j^k). \end{aligned}$$

If we apply this on each term of the sum, we obtain

$$\begin{aligned} \int _{Q_j^k}\inf _{Q_j^k} M_\delta ^\#(f)w(x)dx \le c 2^q\,[w]_{A_q}\inf _{Q_j^k}M_\delta ^\#(f)\, w(E_j^k) \end{aligned}$$

Finally, we obtain that

$$\begin{aligned} \left\| f-m_f(Q) \right\| _{L^1(w,Q)} \le c \,2^q\,[w]_{A_q}\left\| M_\delta ^\#(f) \right\| _{L^1(w,Q)}, \end{aligned}$$

since \(\{E_j^k\}\) is a family of pairwise disjoint subsets of \(Q\). \(\square \)
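The property of \(A_q\) weights used in the proof of Proposition 1 is classical. The following sketch, written for \(1<q<\infty \), derives it from Hölder's inequality and the definition of \([w]_{A_q}\). For a measurable set \(E\subset Q\), since \(-q^{\prime }/q=1-q^{\prime }\),

$$\begin{aligned} |E|=\int _E w^{\frac{1}{q}}\,w^{-\frac{1}{q}}\ dx\le w(E)^{\frac{1}{q}}\left( \int _Q w^{1-q^{\prime }}\ dx\right) ^{\frac{1}{q^{\prime }}}. \end{aligned}$$

Raising to the power \(q\), dividing by \(|Q|^q\) and using \(q/q^{\prime }=q-1\), we get

$$\begin{aligned} \left( \frac{|E|}{|Q|}\right) ^q\le \frac{w(E)}{|Q|}\left( \frac{1}{|Q|}\int _Q w^{1-q^{\prime }}\ dx\right) ^{q-1}\le [w]_{A_q}\frac{w(E)}{w(Q)}, \end{aligned}$$

which is the stated inequality after rearranging.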

The following lemma gives a way to produce \(A_{1}\) weights with special control on the constant. It is based on the so called Rubio de Francia iteration scheme or algorithm.

Lemma 2

[26] Let \(M\) be the usual Hardy–Littlewood maximal operator and let \(1<r<\infty \). Define the operator \(R:L^r(\mathbb{R }^n)\rightarrow L^r(\mathbb{R }^n)\) as follows. For a given \(h\in L^{r}(\mathbb{R }^n)\), consider the sum

$$\begin{aligned} R(h)=\sum _{k=0}^{\infty }{\frac{1}{2^k}}{\frac{M^{k}h}{\left\| M \right\| _{L^r(\mathbb{R }^n)}^{k}}}. \end{aligned}$$

Then \(R\) satisfies the following properties:

  1. (i)

    \(h\le R(h)\);

  2. (ii)

    \(\left\| Rh \right\| _{L^r(\mathbb{R }^n)}\le 2\left\| h \right\| _{L^r(\mathbb{R }^n)}\);

  3. (iii)

    For any nonnegative \(h\in L^r(\mathbb{R }^n)\), we have that \(Rh \in A_1\) with

    $$\begin{aligned}{}[Rh]_{A_1} \le 2\left\| M \right\| _{L^r(\mathbb{R }^n)} \le c_n\,r^{\prime }. \end{aligned}$$
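Properties (ii) and (iii) follow from a short computation, which we sketch here for nonnegative \(h\), with the convention \(M^0h=h\) and using only the sublinearity of \(M\) and its boundedness on \(L^r(\mathbb{R }^n)\). For (ii),

$$\begin{aligned} \left\| Rh \right\| _{L^r(\mathbb{R }^n)}\le \sum _{k=0}^{\infty }\frac{1}{2^k}\frac{\left\| M^kh \right\| _{L^r(\mathbb{R }^n)}}{\left\| M \right\| ^k_{L^r(\mathbb{R }^n)}}\le \sum _{k=0}^{\infty }\frac{1}{2^k}\left\| h \right\| _{L^r(\mathbb{R }^n)}=2\left\| h \right\| _{L^r(\mathbb{R }^n)}. \end{aligned}$$

For (iii), applying \(M\) term by term and shifting the index,

$$\begin{aligned} M(Rh)\le \sum _{k=0}^{\infty }\frac{1}{2^k}\frac{M^{k+1}h}{\left\| M \right\| ^k_{L^r(\mathbb{R }^n)}}=2\left\| M \right\| _{L^r(\mathbb{R }^n)}\sum _{k=1}^{\infty }\frac{1}{2^{k}}\frac{M^{k}h}{\left\| M \right\| ^{k}_{L^r(\mathbb{R }^n)}}\le 2\left\| M \right\| _{L^r(\mathbb{R }^n)}\,Rh. \end{aligned}$$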

5.2 The model case

Consider two nonnegative operators \(T_1\) and \(T_2\), where typically \(T_1\) is the absolute value of a singular operator and \(T_2\) is an appropriate maximal operator that will act as a control operator. As in the introduction we will be slightly vague on the use of the notation, since here \(f\) will stand for a single function, a vector or an infinite sequence of functions, depending on the operators. Assume that, for any cube \(Q\), we have a weighted \(L^1\) local Coifman–Fefferman type inequality. To be more precise we will assume the following:

  1. 1.

    There is a special positive parameter \(\beta \) and an index \(1\le q\le \infty \) for which we can find a constant \(c\) such that for any \(w\in A_q\) and any cube \(Q\),

    $$\begin{aligned} \left\| T_1 f \right\| _{L^1(w,Q)}\le c [w]^\beta _{A_q}\left\| T_2 f \right\| _{L^1(w,Q)}, \end{aligned}$$
    (42)

    for appropriate functions \(f\). The parameter \(\beta \) is key in the sequel.

  2. 2.

    Suppose that the (maximal type) operator \(T_2\) is so that \((T_2f)^\frac{1}{q-1}\in A_1\) with

    $$\begin{aligned}{}[(T_2f)^\frac{1}{q-1}]^{q-1}_{A_1}\le a \end{aligned}$$
    (43)

    where \(a\) is a constant independent of \(f\).

The general purpose is to estimate the level set function \(\varphi \) as in (6)

$$\begin{aligned} \varphi (t):=\frac{1}{|Q|} |\{x\in Q: |T_1f(x)|> t|T_2f(x)|\}|. \end{aligned}$$

We start by applying Chebyshev’s inequality for some \(p>1\) that will be chosen later:

$$\begin{aligned} |\{x\in Q: |T_1(f)(x)|> t\,|T_2(f)(x)|\}|&\le {\frac{1}{t^p}}\int _{Q} \left| {\frac{T_1f(x)}{T_2f(x)}}\right| ^p\, dx\\&= {\frac{1}{t^p}}\left\| \displaystyle {\frac{T_1f(x)}{T_2f(x)}} \right\| _{L^p(Q)}^p\\&\le {\frac{1}{t^p}}\left( \int _{Q} {\frac{T_1f(x)}{T_2f(x)}} \,h(x)\,dx\right) ^p \end{aligned}$$

for some nonnegative \(h\) such that \(\left\| h \right\| _{L^{p^{\prime }}(Q)}=1\). Now we apply Rubio de Francia’s algorithm (Lemma 2) and the key hypothesis (42) to obtain that

$$\begin{aligned} \int _{Q} {\frac{T_1(f)(x)}{T_2(f)(x)}} \,h(x)\,\,dx&\le \int _{Q} T_1(f)(x)\,T_2(f)(x)^{-1}R(h)(x)dx\\&\le c\,[R(h)(T_2f)^{-1}]^\beta _{A_q}\int _{Q}T_2(f)(x)\frac{R(h)(x)}{T_2(f)(x)}\, dx\\&= c\, [R(h)(T_2f)^{-1}]^\beta _{A_q}\int _{Q} R(h)(x)\, dx\\&\le c\, [R(h)(T_2f)^{-1}]^\beta _{A_q}2\,\left\| h \right\| _{L^{p^{\prime }}(Q)}|Q|^{1/p}\\&= 2c\, [R(h)(T_2f)^{-1}]^\beta _{A_q}\,|Q|^{1/p}. \end{aligned}$$

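Since \(R(h)\in A_1\), we can use formula (38) with \(w_1=R(h)\) and \(w_2=(T_2f)^{\frac{1}{q-1}}\):

$$\begin{aligned}{}[R(h)(T_2f)^{-1}]_{A_q} \le [R(h)]_{A_1}[(T_2f)^\frac{1}{q-1}]^{q-1}_{A_1} \le p\,a, \end{aligned}$$

since by Lemma 2 (iii), applied with \(r=p^{\prime }\), we have \([R(h)]_{A_1}\le c_n\,p\) (we absorb the dimensional constant \(c_n\) into \(a\)) and the constant in (43) is uniform in \(f\).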

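Raising the previous estimate to the power \(p\) and inserting it in the Chebyshev bound, we get

$$\begin{aligned} |\{x\in Q: T_1f(x)> t\, T_2f(x)\}|\le \left( {\frac{2c\,(a\,p)^\beta }{t}}\right) ^p |Q|. \end{aligned}$$

Then, if we choose \(p\) such that \(\displaystyle {e^{-1}=\frac{2c\,(ap)^\beta }{t}}\) (which is admissible, that is \(p>1\), for \(t\) large enough), we get

$$\begin{aligned} |\{x\in Q: T_1f(x)> t\, T_2f(x)\}|\le e^{-p}\,|Q| = e^{-\alpha t^\frac{1}{\beta }}|Q|, \end{aligned}$$

where \(\alpha \) depends on \(\beta \), \(c\) and \(a\) (and hence on \(q\)). Therefore \(\varphi (t)\le e^{-\alpha t^\frac{1}{\beta }}\) for such \(t\).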
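The choice of \(p\) can be made explicit. In the following sketch, \(A>0\) is a placeholder (not used elsewhere in the paper) for the constant multiplying \(p^\beta \) in the bound, for instance \(A=a^\beta \), or \(A=2c\,a^\beta \) if the factor \(2c\) is kept inside the \(p\)-th power. Solving \(A\,p^\beta /t=e^{-1}\) for \(p\) gives

$$\begin{aligned} p=\left( \frac{t}{e\,A}\right) ^{\frac{1}{\beta }},\qquad \left( \frac{A\,p^\beta }{t}\right) ^p=e^{-p}=\exp \left( -(eA)^{-\frac{1}{\beta }}\,t^{\frac{1}{\beta }}\right) , \end{aligned}$$

so one may take \(\alpha =(eA)^{-\frac{1}{\beta }}\). The requirement \(p>1\) holds exactly when \(t>eA\); for smaller \(t\) the trivial bound \(\varphi (t)\le 1\) can be used instead, at the cost of a multiplicative constant in front of the exponential.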

Note that this model example reveals the two hypotheses that we need to fulfill:

  1. (H1)

    A local Coifman–Fefferman inequality like (42) with the sharpest exponent \(\beta \) on the constant of the weight which controls the decay rate of the level set function \(\varphi (t)\).

  2. (H2)

    An appropriate power of the maximal operator \(T_2\) should be an \(A_1\) weight. This is the case for all the operators we consider in this paper and follows essentially from suitable variations of Coifman–Rochberg’s theorem (Lemma 1).

This scheme will be followed in the proof of the main results. In each case, we will show how to derive the appropriate local Coifman–Fefferman inequality with the correct exponent and will check that the \(A_1\) constant of the control operator is uniformly bounded.

5.3 Proofs for C–Z operators, vector valued extensions and multilinear C–Z operators: second approach

We start by proving local C–F inequalities. The central tool is Proposition 1.

Theorem 11

Let \(w\in A_{q}\), with \(1\le q<\infty \).

  1. 1.

    Let \(T\) be a Calderón–Zygmund integral operator. Let \(f\) be a function such that \(supp \,{f}\subseteq Q\). Then, there exists a constant \(c=c_{n}\) such that

    $$\begin{aligned} \left\| T^*f \right\| _{L^1(w,Q)}\le c\,2^q\,[w]_{A_q}\left\| Mf \right\| _{L^1(w,Q)}. \end{aligned}$$
  2. 2.

    Let \(\overline{T}_{q}\) be the vector-valued extension of Calderón–Zygmund integral operators. Let \(f=\{f_j\}_{j=1}^{\infty }\) be a vector-valued function such that \(supp \,{f}\subseteq Q\). Then, there exists a constant \(c=c_{n}\) such that

    $$\begin{aligned} \left\| \overline{T}_{q}f \right\| _{L^1(w,Q)}\le c\,2^q\,[w]_{A_q}\left\| M(|f|_q) \right\| _{L^1(w,Q)}. \end{aligned}$$
  3. 3.

    Let \(T\) be an \(m\)-linear C–Z operator. Let \(\mathbf{{f}}\,\) be a vector of \(m\) functions such that \(supp \,{f_j}\subseteq Q\). Then, there exists a constant \(c=c_{n}\) such that

    $$\begin{aligned} \left\| T(\mathbf{{f}}\,) \right\| _{L^1(w,Q)}\le c\,2^q\,[w]_{A_q}\left\| \mathcal{M }(\mathbf{{f}}\,) \right\| _{L^1(w,Q)}. \end{aligned}$$

Proof

In all three cases we start with Proposition 1 and use (27), (28) and (29) to control the sharp maximal function. It remains to prove that we can control the median value in each case. For \(T^*\), we have already done it in (37). There we proved that

$$\begin{aligned} m_{T^*f}(Q) \le {\frac{c}{|Q|}}\int _{Q}|f(x)|\,dx\le cMf(y),\quad \text{ for } \text{ all }\ y\in Q, \end{aligned}$$

and hence

$$\begin{aligned} w(Q)\,m_{T^*f}(Q)\le c\,\left\| Mf \right\| _{L^1(w,Q)}. \end{aligned}$$

The case of the vector-valued operators follows the same steps. The multilinear case also follows from Kolmogorov’s inequality and the weak type boundedness of multilinear C–Z operators. \(\square \)

At this point, we have proved Coifman–Fefferman inequalities like (42) with \(\beta =1\) in all the cases. It remains to check that the factorization argument can be performed. For C–Z operators and their vector-valued extensions, we can use (42) with \(q=3\) (or any larger \(q\)). Therefore, (43) holds in both cases by Lemma 1 for \(m=1\). Note that what we have to control in both cases is \( [M(\mu )^\frac{1}{2}]^2_{A_1}\). For the multilinear case, we have that, for \(q=m+2\),

$$\begin{aligned}{}[\mathcal{M }(\mathbf{{f}}\,)^{\frac{1}{q-1}}]^{q-1}_{A_1}\le \left( \frac{C_n}{1-\frac{m}{m+1}}\right) ^{m+1}. \end{aligned}$$

So we finish the proof of Theorem 3 and Theorem 4.
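To make the constant \(a\) in (43) concrete, here is the arithmetic, assuming the bound (40) of Lemma 1 with a dimensional constant \(c_n\). In the linear case \(m=1\), \(q=3\), the exponent is \(\delta =\frac{1}{q-1}=\frac{1}{2}\) and

$$\begin{aligned}{}[M(\mu )^{\frac{1}{2}}]^{2}_{A_1}\le \left( \frac{c_n}{1-\frac{1}{2}}\right) ^2=4\,c_n^2. \end{aligned}$$

In the multilinear case \(q=m+2\), the exponent is \(\delta =\frac{1}{m+1}<\frac{1}{m}\), so that \(1-m\delta =\frac{1}{m+1}\) and

$$\begin{aligned} \left( \frac{C_n}{1-\frac{m}{m+1}}\right) ^{m+1}=\left( C_n\,(m+1)\right) ^{m+1}. \end{aligned}$$

In both cases the bound depends only on \(n\) and \(m\), and not on the functions involved, as required in (43).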

5.4 Results for commutators

As before, we need to prove an appropriate Coifman–Fefferman inequality.

Theorem 12

Let \(w\in A_{q}\), with \(1\le q<\infty \). Let \(T\) be a C–Z operator and \(b\in BMO\). Let \(f\) be a function such that \(supp \,{f}\subseteq Q\). Then, there exists a dimensional constant \(c=c_{n}\) such that

$$\begin{aligned} \left\| [b,T]f \right\| _{L^1(w,Q)}\le c\,\left\| b \right\| _{BMO}2^{2q}[w]_{A_q}^2\left\| M^2f \right\| _{L^1(w,Q)}. \end{aligned}$$
(44)

In the higher order commutator case we get that

$$\begin{aligned} \left\| T_{b}^kf \right\| _{L^1(w,Q)}\le c\,\left\| b \right\| _{BMO}^k2^{(k+1)q}[w]_{A_q}^{k+1}\left\| M^{k+1}f \right\| _{L^1(w,Q)}. \end{aligned}$$
(45)

Proof

Using Proposition 1 we get that

$$\begin{aligned} \left\| T_bf \right\| _{L^1(w,Q)}&\le c2^q [w]_{A_q}\int _{Q}M^{\#,Q}_{\delta }(T_bf)\, w(x)dx+w(Q)m_{[b,T]f}(Q)\\&= I + II \end{aligned}$$

For the first term, by (30) we have that

$$\begin{aligned} I \le c 2^q[w]_{A_q}\Vert b\Vert _{BMO}\left( \left\| M^Q_{\varepsilon }(Tf) \right\| _{L^1(w,Q)} +\left\| M^2f \right\| _{L^1(w,Q)}\right) \end{aligned}$$

Now we write \(L(Q):=w(Q)m_{M^Q_{\varepsilon }(Tf)}(Q) \) and we apply Proposition 1 with some \(0<\delta <\varepsilon \) to the first norm to obtain that

$$\begin{aligned} \left\| M^Q_{\varepsilon }(Tf) \right\| _{L^1(w,Q)}&\le c2^q[w]_{A_q}\left\| M^{\#,Q}_\delta \left( M^Q_{\varepsilon }(Tf)\right) \right\| _{L^1(w,Q)}+ L(Q) \\&\le c2^q[w]_{A_q}\left\| M^{\#,Q}_\varepsilon (Tf) \right\| _{L^1(w,Q)}+ L(Q)\\&\le c2^q[w]_{A_q}\left\| Mf \right\| _{L^1(w,Q)}+ L(Q) \end{aligned}$$

by (26) and (27). Now we have to bound \(L(Q)\). We apply property (22) for the median value with some \(0<\delta <\varepsilon \), Kolmogorov’s inequality twice and the weak type of both \(M\) and \(T\):

$$\begin{aligned} m_{M_{\varepsilon }(\chi _QTf)}(Q)&\le c\left( \frac{1}{|Q|}\int _Q M^Q_{\varepsilon }(Tf)^\delta \ dx\right) ^{\frac{1}{\delta }} \\&\le \left( \frac{1}{|Q|}\int _Q M^Q(|Tf|^\varepsilon )^\frac{\delta }{\varepsilon }\ dx\right) ^{\frac{\varepsilon }{\delta }\frac{1}{\varepsilon }} \\&\le \left\| M^Q(|Tf|^\varepsilon ) \right\| ^\frac{1}{\varepsilon }_{L^{1,\infty }(\frac{dx}{|Q|},Q)}\\&\le \left( \frac{1}{|Q|}\int _Q |Tf|^\varepsilon \ dx\right) ^\frac{1}{\varepsilon }\le Mf(x) \quad \text{ for } \text{ all } x\in Q.\\ \end{aligned}$$

Therefore, we have that \(L(Q)\le c\left\| Mf \right\| _{L^1(w,Q)}\). Combining all previous estimates, we get

$$\begin{aligned} I \le c 2^{2q}[w]^2_{A_q}\Vert b\Vert _{BMO}\left\| Mf \right\| _{L^1(w,Q)} +c\, 2^q[w]_{A_q}\Vert b\Vert _{BMO}\left\| M^2f \right\| _{L^1(w,Q)}. \end{aligned}$$

From this estimate, using that \([w]_{A_{q}}\ge 1\) and by dominating \(M\) by \(M^2\), we obtain the desired result if we can control \(II\), involving the median value of the commutator. To that end, we will use the weak estimate from Theorem 9. For \(\phi (t)=t(1+\log ^+(t))\),

$$\begin{aligned} m_{[b,T]f}(Q)&\le \left( \frac{1}{|Q|}\int _Q |T_bf|^\delta \ dx\right) ^\frac{1}{\delta }\\&\le \left( \frac{1}{|Q|}\int _0^\infty \delta t^{\delta -1} |\{x\in Q: |T_bf|>t\}|\ dt\right) ^\frac{1}{\delta }\\&\le \left( R^\delta + \frac{1}{|Q|}\int _R^\infty \delta t^{\delta -1} \int _Q \phi \left( \frac{|f(x)|}{t}\right) \ dx dt\right) ^\frac{1}{\delta }\\ \end{aligned}$$

for any \(R>0\) (to be chosen). By the submultiplicativity of \(\phi \), we have that

$$\begin{aligned} m_{[b,T]f}(Q)&\le \left( R^\delta + \frac{1}{|Q|}\int _Q \phi \left( |f(x)|\right) \ dx \int _R^\infty \delta t^{\delta -1}\phi (1/t)\ dt \right) ^\frac{1}{\delta }\\&\le \left( R^\delta + \frac{R^{\delta -1} }{|Q|}\int _Q |f(x)|(1+\log ^+ \frac{|f(x)|}{|f|_Q} )\ dx \right) ^\frac{1}{\delta }\\&\le R \left( 1+ \frac{1}{R|Q|}\int _Q Mf(x)\ dx \right) ^\frac{1}{\delta }. \end{aligned}$$

where we have used the following well known estimate essentially due to Stein [34],

$$\begin{aligned} \int _Q w\, \log \left( e + \frac{w}{w_Q}\right) \, dx \le c_n\, \int _{Q} M(w\chi _Q) \, dx \quad w\ge 0. \end{aligned}$$

If we now choose \(R=\frac{1}{|Q|}\int _Q Mf(x)\ dx\), then we obtain that

$$\begin{aligned} m_{[b,T]f}(Q)\le c\,M^2f(x) \quad \text{ for } \text{ all } x\in Q. \end{aligned}$$

This clearly implies that \(w(Q)m_{[b,T]f}(Q)\le c\left\| M^2f \right\| _{L^1(w,Q)}\) and the proof of inequality (44) is complete. The higher order commutator bound (45) is technically more complicated, but it follows from the same ideas by an induction argument. The details can be found in [29]. \(\square \)
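The last step of the proof can be spelled out as follows. This is a sketch, assuming \(f\not \equiv 0\) so that \(R>0\), and keeping track of the constant \(2^{1/\delta }\), which depends only on the fixed parameter \(\delta \). With \(R=\frac{1}{|Q|}\int _Q Mf\), the bracket in the estimate for the median equals \(2\), and hence

$$\begin{aligned} m_{[b,T]f}(Q)\le R \left( 1+ \frac{1}{R|Q|}\int _Q Mf(y)\ dy \right) ^\frac{1}{\delta }=2^{\frac{1}{\delta }}\,\frac{1}{|Q|}\int _Q Mf(y)\ dy\le 2^{\frac{1}{\delta }}\,M(Mf)(x)=2^{\frac{1}{\delta }}\,M^2f(x) \end{aligned}$$

for all \(x\in Q\), since the average of \(Mf\) over \(Q\) is one of the averages in the supremum defining \(M(Mf)(x)\). Multiplying by \(w(x)\) and integrating over \(Q\) gives \(w(Q)\,m_{[b,T]f}(Q)\le 2^{\frac{1}{\delta }}\left\| M^2f \right\| _{L^1(w,Q)}\).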

Hence, we have the hypothesis (H1) of our model. We finish the proof of Theorem 8 by proving that we have also the second hypothesis (H2). Namely, we have \(M^2\) acting as a control operator, so we have to prove that, for some \(q>1\),

$$\begin{aligned}{}[(M^2f)^\frac{1}{q-1}]^{2(q-1)}_{A_1}\le C. \end{aligned}$$

But since \(M^2(f)=M(M(f))\), this is, once again, the Coifman–Rochberg theorem. We only have to pick, for instance, \(q=3\).

5.5 Proof for the square functions and for the vector-valued maximal function: second approach

For the dyadic square function \(S\), the statement of Theorem 7 and the discussion about the “model” of proof suggest that we need a C-F inequality with \(\beta =\frac{1}{2}\) as exponent on the weight. This is essentially what we borrow from [9] in (31) for the dyadic case and from (33) for the continuous case. From those estimates for the oscillation and using Lerner’s formula, we can derive the following Coifman–Fefferman type inequality.

Lemma 3

Let \(S\) be the dyadic square function operator \(S_d\) or the continuous square function \(g_{\mu }^{*}\) and let \(w\in A_q\). Then for any function \(f\) and every cube \(Q\)

$$\begin{aligned} \int _Q(Sf(x))^2 w(x)\ dx \le c\,2^q\, [w]_{A_q}\int _Q(Mf(x))^2 w(x)\ dx \end{aligned}$$
(46)

Proof

It follows directly from Lerner’s formula in both cases. For the median value, we can use Kolmogorov and the weak \((1,1)\) type of the operator as in the previous cases. \(\square \)

Now we prove Theorem 7 by using our model, but with a twist.

Proof of Theorem 7

Motivated by Lemma 3, we write the level set function \(\varphi (t)\) as follows.

$$\begin{aligned} \left| \left\{ x\in Q: \frac{Sf(x)^2}{Mf(x)^2}>t^2\right\} \right|&\le \frac{1}{t^{2p}}\int _{Q} {\frac{Sf(x)^{2p}}{Mf(x)^{2p}}}\, dx\\&= \frac{1}{t^{2p}}\left\| \displaystyle {\left( {\frac{Sf}{Mf}}\right) ^2} \right\| _{L^p(Q)}^p\\&\le \frac{1}{t^{2p}}\left( \int _{Q}\frac{Sf(x)^2}{Mf(x)^2}\,h(x)\,dx\right) ^p \end{aligned}$$

for some \(h\) such that \(\left\| h \right\| _{L^{p^{\prime }}(Q)}=1\). Now we apply Rubio de Francia’s algorithm and (46) to obtain that

$$\begin{aligned} \int _{Q}\frac{Sf(x)^2}{Mf(x)^2}\,h(x)\,dx&\le \int _{Q} Sf(x)^2Mf(x)^{-2}\,R(h)(x)\,dx \\&\le [R(h)\,(Mf)^{-2}]_{A_q}\int _{Q}R(h)(x)\, dx\\&\le [R(h)\,(Mf)^{-2}]_{A_q}2\,\left\| h \right\| _{L^{p^{\prime }}(Q)}|Q|^{1/p}. \end{aligned}$$

The same factorization argument yields

$$\begin{aligned}{}[Rh(Mf)^{-2}]_{A_q} \le [(Mf)^\frac{2}{q-1}]^{(q-1)}_{A_1}p\le c\,p \end{aligned}$$

for \(q=5\) by Lemma 1.

We now choose \(\displaystyle {e^{-1}=\frac{c_n p}{t^2}}\) to finally obtain that

$$\begin{aligned} |\{x\in Q: Sf(x)> tMf(x)\}|\le \left( {\frac{c_n p}{t^2}}\right) ^p |Q| \le e^{-\alpha t^2}|Q| \end{aligned}$$

\(\square \)
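The numerology in the proof of Theorem 7 can be checked directly. The following is a sketch, assuming the bound of Lemma 1 with \(m=1\) and a dimensional constant \(c\). The choice \(q=5\) gives the exponent \(\frac{2}{q-1}=\frac{1}{2}<1\), so that

$$\begin{aligned}{}[(Mf)^\frac{2}{q-1}]^{q-1}_{A_1}=[(Mf)^{\frac{1}{2}}]^{4}_{A_1}\le \left( \frac{c}{1-\frac{1}{2}}\right) ^4=(2c)^4. \end{aligned}$$

Solving \(e^{-1}=c_n\,p/t^2\) for \(p\) gives

$$\begin{aligned} p=\frac{t^2}{e\,c_n},\qquad \left( \frac{c_n\,p}{t^2}\right) ^p=e^{-p}=e^{-\alpha t^2}\quad \text{ with } \ \alpha =\frac{1}{e\,c_n}, \end{aligned}$$

which is the Gaussian decay in the statement. The condition \(p>1\) holds when \(t^2>e\,c_n\).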

Finally, the proof for the vector valued extension of the maximal function follows the same steps.

Proof of Theorem 6

The proof can be carried out by replacing the “2” by “\(q\)” in the proof for the square function. The key estimate for the oscillation is inequality (34). \(\square \)