1 Introduction

Let \(L=-{{\,\mathrm{div}\,}}(A\nabla )\) be a divergence form elliptic operator on the upper half-space \({\mathbb {R}}^{d+1}_+\). In the present paper we show that if L is reasonably well-behaved then the Green function for L is well approximated by multiples of the distance to \({\mathbb {R}}^d\). There are many predecessors of these results, which we will discuss below ([7, 11, 12, 14] to mention only the closest ones). At this point, however, let us underline two important points. First, the class of operators that we consider is essentially the best possible, as shown by the counterexamples in Section 6. The estimates themselves are sharp, and in fact a weak version of them is equivalent to uniform rectifiability [6]. We hope to ultimately show that the much stronger estimate proved here is also true for domains with a uniformly rectifiable boundary, thus giving a strong and a weak characterization of uniform rectifiability in terms of approximation of the Green function (or more generally of solutions) by a distance function, but this will have to be the subject of another paper. Secondly, the method of the proof itself is quite unusual for this kind of bounds. A typical approach is through integration by parts, which, however, does not allow one to access the optimal class of coefficients. Roughly speaking, we are working with the square of the second derivatives of the Green function and, given the roughness of the coefficients, there are too many derivatives to control to take advantage of the equation while integrating by parts. Here, instead, we make intricate comparisons with solutions of constant-coefficient operators, carefully adjusting them from scale to scale. We feel that the method itself is a novelty for this circle of questions and that it illuminates the nature of the Carleson estimates in a completely different way, hopefully opening a door to many other problems.

More generally, we are interested in the relations between an elliptic operator L on a domain \(\varOmega \), the geometry of \(\varOmega \), and the boundary behavior of the Green function. It is easy to see that the Green function with a pole at infinity for the Laplacian on the upper half-space \({\mathbb {R}}^{d+1}_+:=\left\{ (x,t): x\in {\mathbb {R}}^d, t\in {\mathbb {R}}_+\right\} \) is a multiple of t, the distance to the boundary, and more generally the Green function with a pole that is relatively far away is close to the distance function. There have been many efforts to generalize this to more general settings. For instance, in [2] the authors obtain flatness of the boundary from local small oscillations of the gradient of the Green function with a pole sufficiently far away. Philosophically, similar considerations underpin the celebrated results of Kenig and Toro connecting the flatness of the boundary to the property that the logarithm of the Poisson kernel lies in VMO [15]. Much closer to our setting is the study of the so-called Dahlberg-Kenig-Pipher operators (defined in (1.7)–(1.8)) pioneered by Kenig and Pipher [7, 14], in combination with the study of the harmonic measure on uniformly rectifiable sets by Hofmann, Martell, Toro, Tolsa, and others (see [3, 11] and many of their predecessors). Undoubtedly, the behavior of the harmonic measure is connected to the regularity of the Green function G, yet the latter is different and, surprisingly, has been much less studied. In part, this is due to the fact that the harmonic measure is related to the gradient of G at the boundary, while the estimates we target in this paper reach out to the second derivatives of G. One could say that the two are related by an integration by parts, but in the world of rough coefficients this is not so.
Indeed, relying on these ideas, [12] establishes second derivative estimates for the Green function somewhat similar to ours, under the much stronger condition that the gradient of the coefficients, rather than its square, satisfies a Carleson condition. It was clear already then that the optimal condition must be a control of the square-Carleson norm, but their methods, using the aforementioned integration by parts, did not make it possible to overcome this restriction. In this paper we achieve the optimal results and, indeed, demonstrate using counterexamples that they are the best possible.
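Returning to the model computation mentioned above, the statement that the Green function with pole at infinity for the Laplacian on \({\mathbb {R}}^{d+1}_+\) is a multiple of t can be checked directly; the following sketch is only an illustration and is not used in the sequel.

```latex
% The function u(x,t)=t is a positive harmonic function on the upper
% half-space that vanishes on the boundary:
\begin{aligned}
  -\Delta u &= -\sum_{i=1}^{d}\partial_{x_i}^2 t - \partial_t^2 t = 0
    &&\text{in } \mathbb{R}^{d+1}_+,\\
  u(x,0) &= 0 &&\text{for } x\in\mathbb{R}^d,\qquad
  u>0 \ \text{in } \mathbb{R}^{d+1}_+.
\end{aligned}
% Conversely, a classical Liouville-type argument (odd reflection across
% \{t=0\}, then the Liouville theorem for entire harmonic functions of
% linear growth) shows that any such u equals ct for some c>0.
```

This uniqueness is what makes \((x,t)\mapsto \lambda t\) the natural model in the approximation results below.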

In the present paper, we focus on \(\varOmega = {\mathbb {R}}^{d+1}_+\), and show that for operators satisfying a slightly weaker version of the Dahlberg-Kenig-Pipher condition described below, the Green function is well approximated by multiples of t, in the sense that the gradient of normalized differences satisfies a square Carleson measure estimate. Notice that the class of coefficients allowed below is sufficient to treat the case when \(\varOmega \) is a Lipschitz graph domain, by a change of variables. As we mentioned above, we plan to pursue more general uniformly rectifiable sets in upcoming work, which would give a much stronger version of our previous results in [6] and would show that our estimates are equivalent to the uniform rectifiability of the boundary. At this point, restricting to the simple domain \(\varOmega = {\mathbb {R}}^{d+1}_+\) has the advantage of making the geometry cleaner and of focusing on one of the tools of this paper, concerning the dependence of G (or of solutions) on the coefficients. Even in the “simple” case of the half-space, the question of good approximation of G by multiples of t seems, to our surprise, to be widely open, and the traditional methods of analysis break down completely when trying to achieve such results. Perhaps one could also say that this setting is more classical. Let us pass to the details.

Consider an operator in divergence form \(L=-{{\,\mathrm{div}\,}}(A\nabla )\), where \(A=\begin{bmatrix} a_{ij}(X) \end{bmatrix}\) is a \((d+1)\times (d+1)\) matrix of real-valued, bounded and measurable functions on \({\mathbb {R}}^{d+1}_+\). We say that L is elliptic if there is some \(\mu _0>1\) such that

$$\begin{aligned} \langle A(X)\xi ,\zeta \rangle \le \mu _0\left| \xi \right| \left| \zeta \right| \text{ and } \langle A(X)\xi ,\xi \rangle \ge \mu _0^{-1}\left| \xi \right| ^2 \text { for }X \in {\mathbb {R}}^{d+1}_+ \text { and }\xi , \zeta \in {\mathbb {R}}^{d+1}. \end{aligned}$$
(1.1)

We use lower case letters for points in \({\mathbb {R}}^d\), for example \(x\in {\mathbb {R}}^d\), and capital letters for points in \({\mathbb {R}}^{d+1}\), for example \(X=(x,t)\in {\mathbb {R}}^{d+1}\). We identify \({\mathbb {R}}^d\) with \({\mathbb {R}}^d \times \{ 0 \} \subset {\mathbb {R}}^{d+1}\) so, when \(t=0\), we may write x instead of \((x,0)\in {\mathbb {R}}^{d+1}\).

For \(x\in {\mathbb {R}}^d\) and \(r > 0\), we denote by \(\varDelta (x,r)\) the surface ball \(B_r(x) \cap \left\{ t=0\right\} \subset {\mathbb {R}}^d\). Thus \(\varDelta (x,r)\) is a ball in \({\mathbb {R}}^d\) while \(B(x,r)\) is the ball of radius r in \({\mathbb {R}}^{d+1}\). We denote by

$$\begin{aligned} T(x,r):=B_r(x)\cap {\mathbb {R}}^{d+1}_+ \quad \text {and} \quad W(x,r):=\varDelta (x,r)\times \Bigl (\frac{r}{2},r\Bigr ] \subset {\mathbb {R}}^{d+1}_+ \end{aligned}$$
(1.2)

the corresponding Carleson box and Whitney cube. Note that \(T(x,r)\) is a half ball in \({\mathbb {R}}^{d+1}_+\) over \(\varDelta (x,r)\). We may simply write \(T_\varDelta \) for a half ball over \(\varDelta \subset {\mathbb {R}}^d\).

Definition 1.3

(Carleson measure) We say that a nonnegative Borel measure \(\mu \) is a Carleson measure in \({\mathbb {R}}_+^{d+1}\), if its Carleson norm

$$\begin{aligned} \left\| \mu \right\| _{{\mathcal {C}}}:=\underset{\varDelta \subset {\mathbb {R}}^d}{\sup }\frac{\mu (T_\varDelta )}{\left| \varDelta \right| } \end{aligned}$$

is finite, where the supremum is over all the surface balls \(\varDelta \) and \(\left| \varDelta \right| \) is the Lebesgue measure of \(\varDelta \) in \({\mathbb {R}}^d\). We use \({\mathcal {C}}\) to denote the set of Carleson measures on \({\mathbb {R}}^{d+1}_+\).

For any surface ball \(\varDelta _0\subset {\mathbb {R}}^d\), we use \({\mathcal {C}}(\varDelta _0)\) to denote the set of Borel measures satisfying the Carleson condition restricted to \(\varDelta _0\), that is, such that

$$\begin{aligned} \left\| \mu \right\| _{{\mathcal {C}}(\varDelta _0)}:=\underset{\varDelta \subset \varDelta _0}{\sup }\frac{\mu (T_\varDelta )}{\left| \varDelta \right| } < +\infty .\end{aligned}$$
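As a quick illustration of the definition (a side remark, not used later), the Lebesgue measure restricted to a bounded strip is a Carleson measure, while the full Lebesgue measure is not:

```latex
% For d\mu(y,s) = \mathbf{1}_{\{0<s<1\}}\,dy\,ds and any surface ball
% \varDelta = \varDelta(x,\rho), the box T_\varDelta sits inside
% \varDelta \times (0,\rho), so
\mu(T_\varDelta) \le |\varDelta|\,\min(\rho,1) \le |\varDelta|,
\qquad\text{whence}\quad \|\mu\|_{\mathcal C}\le 1,
% while for d\mu = dy\,ds one has \mu(T_\varDelta)\approx \rho\,|\varDelta|,
% so the supremum over all surface balls is infinite.
```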

Next we want to define a (weaker) version of the Dahlberg-Kenig-Pipher condition, in a form which is convenient for the point of view taken in this paper. We would like to say that the matrix \(A = A(X)\) is often close to a constant coefficient matrix. The simplest way to measure this is to use the numbers

$$\begin{aligned} \alpha _\infty (x,r) = \inf _{A_0 \in {\mathfrak {A}}_0(\mu _0)} \, \sup _{(y,s) \in W(x,r)} |A(y,s)-A_0|, \end{aligned}$$
(1.4)

where the infimum is taken over the class \({\mathfrak {A}}_0(\mu _0)\) of (constant!) matrices \(A_0\) that satisfy the ellipticity condition (1.1). Notice that the matrix \(A_0\) is allowed to depend on \((x,r)\), so \(\alpha _\infty (x,r)\) is a measure of the oscillation of A in \(W(x,r)\), similarly to [7]. We require \(A_0\) to satisfy (1.1) for convenience, but if we did not, we could easily replace \(A_0\) by one of the \(A(y,s)\), \((y,s)\in W(x,r)\), which satisfies (1.1) by definition, at the price of multiplying \(\alpha _\infty (x,r)\) by at most 2. The same remark is valid for the slightly more general numbers

$$\begin{aligned} \alpha _q(x,r) = \inf _{A_0 \in {\mathfrak {A}}_0(\mu _0)} \bigg \{\fint _{(y,s) \in W(x,r)} |A(y,s)-A_0|^q \bigg \}^{1/q}, \end{aligned}$$
(1.5)

where, in fact, q will be chosen equal to 2.

Definition 1.6

(Weak DKP condition) We say that the coefficient matrix A satisfies the weak DKP condition with constant \(M > 0\), when \(\alpha _2(x,r)^2 \frac{\mathrm{d}x\,\mathrm{d}r}{r}\) is a Carleson measure on \({\mathbb {R}}^{d+1}_+\), with norm

$$\begin{aligned} {\mathfrak {N}}_2(A) : = \left\| \alpha _2(x,r)^2 \frac{\mathrm{d}x\,\mathrm{d}r}{r}\right\| _{{\mathcal {C}}} \le M. \end{aligned}$$
(1.7)

We may also say that \(\alpha _2(x,r)^2\) satisfies a Carleson measure estimate. Recall that this implies that \(\alpha _2(x,r)^2\) is small most of the time (to the point of being locally integrable against the infinite, scale-invariant measure \(\frac{\mathrm{d}x\,\mathrm{d}r}{r}\)), but does not vanish at any specific speed given in advance.
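For instance (a hypothetical assumption, much stronger than Definition 1.6 requires), if the coefficients oscillate in a Hölder fashion, say \(\alpha _2(x,r)\le \varepsilon (r/R)^\gamma \) for some \(\gamma >0\) and all \(0<r\le R\), then for every surface ball \(\varDelta \) of radius \(\rho \le R\),

```latex
\int_{\varDelta}\int_0^{\rho}\alpha_2(x,r)^2\,\frac{dr}{r}\,dx
  \;\le\; \varepsilon^2\,|\varDelta|\int_0^{\rho}\Big(\frac{r}{R}\Big)^{2\gamma}\,\frac{dr}{r}
  \;=\; \frac{\varepsilon^2}{2\gamma}\Big(\frac{\rho}{R}\Big)^{2\gamma}|\varDelta|,
```

so the Carleson norm at scales below R is at most \(\varepsilon ^2/2\gamma \). The point of the weak DKP condition is precisely that no such pointwise decay is demanded.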

The name comes from a condition introduced by Dahlberg, Kenig, and Pipher, which instead demands that \(\widetilde{\alpha }(x,r)^2\) satisfy a Carleson estimate, where

$$\begin{aligned} \widetilde{\alpha }(x,r) = r \sup _{(y,s) \in W(x,r)} |\nabla A(y,s)|. \end{aligned}$$
(1.8)

In 1984, Dahlberg first introduced this condition, and conjectured that such a Carleson condition guarantees the absolute continuity of the elliptic measure with respect to the Lebesgue measure in the upper half-space. In 2001, Kenig and Pipher [14] proved Dahlberg’s conjecture. Since it is obvious that \(\alpha _2(x,r)\le \alpha _\infty (x,r) \le 2\widetilde{\alpha }(x,r)\), we see that our condition is weaker than the classical DKP condition, but importantly they have the same homogeneity. A similar weakening of the DKP condition, pertaining to the oscillations of the coefficients, has been considered, for example in [7]. We could also have chosen an exponent \(q\in (2,\infty ]\) for \(\alpha _q\) in Definition 1.6, but there is no point in doing so, as Hölder's inequality implies that the current condition is the weakest. Surprisingly, our theorem is easier to prove under this weaker condition.
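The chain \(\alpha _2\le \alpha _\infty \le 2\widetilde{\alpha }\) invoked above can be sketched as follows (assuming, for the second inequality, that A is Lipschitz so that \(\widetilde{\alpha }\) makes sense):

```latex
% \alpha_2 \le \alpha_\infty: an L^2 average is dominated by the sup,
\Big(\fint_{W(x,r)}|A-A_0|^2\Big)^{1/2}\le \sup_{W(x,r)}|A-A_0|
  \qquad\text{for each competitor } A_0.
% \alpha_\infty \le 2\widetilde{\alpha}: take A_0=A(X_0), with X_0 the center
% of the convex box W(x,r)=\varDelta(x,r)\times(r/2,r]; the mean value
% theorem along the segment [X_0,Y]\subset W(x,r) gives, for Y\in W(x,r),
|A(Y)-A(X_0)|\le |Y-X_0|\,\sup_{W(x,r)}|\nabla A|
  \le 2r\sup_{W(x,r)}|\nabla A| = 2\,\widetilde{\alpha}(x,r).
```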

We now say what we mean by good approximation by affine functions. On domains other than \({\mathbb {R}}^{d+1}_+\), we would use other models than the function \((y,t) \mapsto t\), such as (functions of) the distance to the boundary, but here we are interested in (approximation by) the affine function \((y,t) \mapsto \lambda t\), with \(\lambda > 0\).

We said earlier that we wanted to study the approximation of the Green functions (and we did not mention the poles too explicitly), but in fact our estimates will also be valid for positive solutions u of \(Lu = 0\) that vanish at the boundary.

In addition, given such a solution u, when we are considering a given Carleson box \(T(x,r)\), we do not want to assume any a priori knowledge of the average size of u in \(T(x,r)\), so we just want to measure the approximation of u, in \(T(x,r)\), by the best affine function \(a_{x,r}\) that we can think of, and it is reasonable to pick

$$\begin{aligned} a_{x,r}(z,t) = \lambda _{x,r} t, \, \text { where } \lambda _{x,r} = \lambda _{x,r}(u) = \fint _{T(x,r)}\partial _t u(z,t)\,\mathrm{d}z\,\mathrm{d}t \end{aligned}$$
(1.9)

is the average on \(T(x,r)\) of the vertical derivative; see the beginning of Section 3 for more details about this choice of \(\lambda _{x,r}\). We measure the proximity of the two functions by the \(L^2\) average of the difference of the gradients (we seem to forget u, but after all it is easy to recover the functions from their gradients because they both vanish on the boundary), which we divide by the local energy of u because we do want the same result for u as for \(\lambda u\). That is, we set

$$\begin{aligned} J_u(x,r)&= \fint _{T(x,r)} |\nabla _{z,t} (u(z,t)-a_{x,r}(z,t))|^2\,\mathrm{d}z\,\mathrm{d}t\\&= \fint _{T(x,r)} |\nabla _{z,t} u(z,t) - \lambda _{x,r}(u) {\mathbf {e}}_{d+1}|^2\,\mathrm{d}z\,\mathrm{d}t, \end{aligned}$$
(1.10)

where \({\mathbf {e}}_{d+1} = (0,\ldots , 1)\) is the vertical unit vector, and then divide by

$$\begin{aligned} E_u(x,r)=\fint _{T(x,r)} |\nabla u|^2 \end{aligned}$$
(1.11)

to get the number

$$\begin{aligned} \beta _u(x,r) = \frac{J_u(x,r)}{E_u(x,r)}. \end{aligned}$$
(1.12)

This number measures the normalized non-affine part of the energy of u in \(T(x,r)\). We want to say that u is often close to \(a_{x,r}\), that is, that \(\beta _u(x,r)\) is often small, and this will be quantified by a Carleson measure condition on \(\beta _u\). We won’t need to square \(\beta _u\), because \(J_u\) is already quadratic.
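A simple (sign-changing, hence outside the scope of the theorems below) example helps calibrate \(\beta _u\): for the harmonic function \(u(z,t)=z_1t\), which vanishes on \({\mathbb {R}}^d\), no multiple of t is a good approximation in \(T(0,r)\):

```latex
% Here \nabla u = (t,0,\dots,0,z_1) and \partial_t u = z_1, so by the odd
% symmetry of the half ball T(0,r) in the z_1 variable,
\lambda_{0,r}(u)=\fint_{T(0,r)}z_1\,dz\,dt=0,
\qquad\text{hence}\quad
J_u(0,r)=\fint_{T(0,r)}|\nabla u|^2=E_u(0,r)
\quad\text{and}\quad \beta_u(0,r)=1.
```

This is consistent with the theorems, which concern positive solutions and only assert that \(\beta _u\) is small at most scales and locations, in the Carleson sense.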

The simplest version of our main result is the following:

Theorem 1

Let A be a \((d+1)\times (d+1)\) matrix of real-valued functions on \({\mathbb {R}}^{d+1}_+\) satisfying the ellipticity condition (1.1). If A satisfies the weak DKP condition with some constant \(M\in (0,\infty )\), and if we are given \(x_0 \in {\mathbb {R}}^d\), \(R>0\), and a positive solution u of \(Lu=-{{\,\mathrm{div}\,}}\left( A\nabla u\right) =0\) in \(T(x_0,R)\), with \(u=0\) on \(\varDelta (x_0,R)\), then the function \(\beta _u\) defined by (1.12) satisfies a Carleson condition in \(T(x_0,R/2)\), and more precisely

$$\begin{aligned} \left\| \beta _u(x,r) \frac{\mathrm{d}x\,\mathrm{d}r}{r}\right\| _{{\mathcal {C}}(\varDelta (x_0,R/2))} \le C+C\,M, \end{aligned}$$
(1.13)

where C depends only on d and \(\mu _0\).

That is, u is locally well approximated by affine functions in \(T(x_0,R/2)\), with essentially uniform Carleson bounds. Here “solution” means “weak solution”, and the values of u on \({\mathbb {R}}^d\) are well defined because solutions are locally Hölder continuous up to the boundary; this will be explained better in the next section.

Notice that the constant \(M>0\) can take any value, and we have explicitly underlined the dependence on the norm. The result applies when u is the Green function for L, with a pole anywhere in \({\mathbb {R}}^{d+1}_+ \setminus {\overline{T}}(x_0,R)\). Even in the case of the Laplacian, the smallness of M does not guarantee the smallness of (1.13), that is, u is not necessarily so close to an affine function at the scale R. This is natural (the impact of what happens outside of \(T(x_0,R)\) could be substantial), and this effect will be ameliorated in the next statement, at the price of some additional quantifiers; the point is that the Green function with a pole at \(\infty \), or even a positive solution in a much larger box than \(T(x_0,R)\), behaves better and has a better approximation. The next theorem says that we can have Carleson norms for \(\beta _u\) that are as small as we want, provided that we take a small DKP constant and a large security box where u is a positive solution that vanishes on the boundary.

Theorem 2

Let d, \(\mu _0\) be given, let u and \(\varDelta (x_0,R)\) satisfy the assumptions of Theorem 1, and let A satisfy the weak DKP condition in \(\varDelta (x_0,R)\). Then for \(\tau \le 1/2\) we have the more precise estimate

$$\begin{aligned} \left\| \beta _u(x,r) \frac{\mathrm{d}x\,\mathrm{d}r}{r}\right\| _{{\mathcal {C}}(\varDelta (x_0, \tau R))} \le C \tau ^a + C \left\| \alpha _2(x,r)^2 \frac{\mathrm{d}x\,\mathrm{d}r}{r}\right\| _{{\mathcal {C}}(\varDelta (x_0,R))}, \end{aligned}$$
(1.14)

where C and \(a > 0\) depend only on d and \(\mu _0\).

This way the right-hand side can be made as small as we want. Notice that we only need A to satisfy the weak DKP condition in \(\varDelta (x_0,R)\); the values of A outside of \(T(x_0,R)\) should be irrelevant anyway, because we do not know anything about u there.

We observed earlier that this result applies to the Green function with a pole at \(\infty \) (see Lemma 6.1 for the precise definition), and to operators that satisfy the classical Dahlberg-Kenig-Pipher condition where the square of the function \(\widetilde{\alpha }\) of (1.8) satisfies a Carleson measure estimate. Notice that when u is the Green function with pole at \(\infty \) for L, Theorem 2 implies that the Carleson norm of \(\beta \) is simply less than \(C {\mathfrak {N}}_2(A)\), with \({\mathfrak {N}}_2(A)\) as in (1.7).

A rather direct consequence of our results is a Carleson measure estimate on the second derivatives of the Green function for DKP operators.

Corollary 1.15

Let A be a \((d+1)\times (d+1)\) matrix of real-valued functions on \({\mathbb {R}}^{d+1}_+\) satisfying the ellipticity condition (1.1). Suppose A satisfies the classical DKP condition with constant \(C_0\in (0,\infty )\), that is,

$$\begin{aligned} \left\| \widetilde{\alpha }(x,r)^2\frac{\mathrm{d}x\,\mathrm{d}r}{r}\right\| _{{\mathcal {C}}}\le C_0, \end{aligned}$$
(1.16)

where \(\widetilde{\alpha }(x,r)\) is defined in (1.8). If we are given \(x_0 \in {\mathbb {R}}^d\), \(R>0\), and a positive solution u of \(Lu=-{{\,\mathrm{div}\,}}\left( A\nabla u\right) =0\) in \(T(x_0,R)\), with \(u=0\) on \(\varDelta (x_0,R)\), then there exists some constant C depending only on d, \(\mu _0\) and \(C_0\) such that

$$\begin{aligned} \int _{T_\varDelta }\frac{\left| \nabla ^2u(y,t)\right| ^2}{u(y,t)^2}\,t^3\,\mathrm{d}y\,\mathrm{d}t\le C\left| \varDelta \right| \end{aligned}$$
(1.17)

for any \(\varDelta \subset \varDelta (x_0,R/2)\).

We state this corollary on the upper half-space for simplicity, but it can be generalized to Lipschitz domains by a change of variables that preserves the class of DKP operators. In fact, the change of variables will be a bi-Lipschitz mapping whose second derivatives satisfy a Carleson measure estimate. With such regularity of the change of variables, as well as our estimates for \(\beta _u\) in the main theorems, the corollary reduces to the case of the upper half-space.

In Section 6, we construct an operator that does not satisfy the DKP condition, for which the precise approximation estimates of Theorems 1 and 2 fail.

In conclusion, let us point out that we extend the results above to domains with lower dimensional boundaries in [5]. In that case, there are currently no known free boundary results; in particular, it is not known whether the absolute continuity of elliptic measure with respect to the Hausdorff measure, or square function estimates, or the well-posedness of the Dirichlet problem imply the rectifiability of the boundary, and we hope that the correct condition is, in fact, an analogue of the property that the Green function is almost affine. The first and the third authors of the paper started such a study in [6], but if we want precise approximation results for the Green functions, the first significant step in the positive direction should be a version of the main results of the present paper in the higher co-dimensional context, and their extension to uniformly rectifiable sets.

The rest of this paper is organized as follows: in the next section we recall some notation and the general properties of solutions that we need later. In Section 3 we comment on the definition of \(J_u\) and \(\beta _u\), prove some decay estimates for \(\beta _u\) when u is a weak solution of a constant coefficient operator, and extend this to the general case with a variational argument. The rest of the proof of our main theorems, which consists in Carleson measure estimates with no special relation to solutions, is done in Section 4. We prove Corollary 1.15 in Section 5 using Theorem 1 and a Caccioppoli type argument. In Section 6, we discuss the optimality of our results.

2 Preliminaries and Properties of the Weak Solutions

In this section we recall some classical results for solutions of elliptic operators in divergence form.

Recall the notation \(B(X,r)\) for open balls centered at \(X\in {\mathbb {R}}^{d+1}\), \(\varDelta (x,r)\) for surface balls, \(T(x,r)\) for Carleson boxes, and \(W(x,r)\) for Whitney cubes (see near (1.2)). Also denote by \(\fint _Bf(x)\,\mathrm{d}x:=\frac{1}{\left| B\right| }\int _Bf(x)\,\mathrm{d}x\) the average of f on a set B.

Let us collect some well-known estimates for solutions of \(L=-{{\,\mathrm{div}\,}}(A\nabla )\), where A is a matrix of real-valued, measurable and bounded functions, satisfying the ellipticity condition (1.1).

Definition 2.1

(Weak solutions) Let \(\varOmega \) be a domain in \({\mathbb {R}}^n\). A function \(u\in W^{1,2}(\varOmega )\) is a weak solution to \(Lu=0\) in \(\varOmega \) if for any \(\varphi \in W^{1,2}_0(\varOmega )\),

$$\begin{aligned} \int _{\varOmega }A(X)\nabla u(X)\cdot \nabla \varphi (X)\,\mathrm{d}X=0. \end{aligned}$$

We will only be interested in the simple domains \(\varOmega = {\mathbb {R}}^{d+1}_+\) and \(\varOmega = {\mathbb {R}}^{d+1}_+ \cap B(x,r)\), with \(x\in {\mathbb {R}}^d\) and \(r>0\). The space \(W^{1,2}_0(\varOmega )\) is the closure in \(W^{1,2}(\varOmega )\) of the compactly supported smooth functions in \(\varOmega \). Conventional or strong solutions are obviously weak solutions as well. In this paper, our solutions are always taken in the sense of Definition 2.1.

From now on, u is a (weak) solution in \(\varOmega \). When we say that \(u=0\) on some surface ball \(\varDelta = \varDelta (x,r) \subset \varOmega \), we mean this in the sense of \(W^{1,2}(T_{\varDelta })\). This means that u is a limit in \(W^{1,2}(T_{\varDelta })\) of a sequence of functions in \(C_0^1(\overline{T_{\varDelta }}{\setminus }\varDelta )\). We could also say that the trace of u, which is defined and lies in \(H^{1/2}(\varDelta )\), is equal to 0 on \(\varDelta \). Ultimately, the De Giorgi-Nash-Moser theory (cf. Lemma 2.3) shows that under this assumption, the weak solution u is in fact continuous in \(T_{2r}\cup \varDelta _{2r}\), and, in particular, u vanishes on \(\varDelta \). Hence, in the rest of this paper the distinction is immaterial, but for now we will try to be precise.

We refer the reader to [13] for proofs and references for the following lemmas:

Lemma 2.2

(Boundary Caccioppoli Inequality) Let \(u\in W^{1,2}(T(x,2r))\) be a solution of L in T(x, 2r), with \(u=0\) on \(\varDelta (x,2r)\). There exists some constant C depending only on the dimension and the ellipticity constant of L, such that

$$\begin{aligned} \fint _{T(x,r)}\left| \nabla u(X)\right| ^2\,\mathrm{d}X\le \frac{C}{r^2}\fint _{T(x,2r)}\left| u(X)\right| ^2\,\mathrm{d}X. \end{aligned}$$

Lemma 2.3

(Boundary De Giorgi-Nash-Moser inequalities) Let u be as in Lemma 2.2. Then

$$\begin{aligned} \sup _{T(x,r)}\left| u\right| \le C \left( \fint _{T(x,2r)}u(X)^2\,\mathrm{d}X\right) ^{1/2}, \end{aligned}$$

where \(C=C(d,\mu _0)\). Moreover, for any \(0<\rho <r\), we have, for some \(\alpha =\alpha (d,\mu _0)\in (0,1]\),

$$\begin{aligned} \underset{T(x,\rho )}{{{\,\mathrm{osc\ }\,}}}u \le C\left( \frac{\rho }{r}\right) ^\alpha \left( \fint _{T(x,2r)}u(X)^2\,\mathrm{d}X\right) ^{1/2}, \end{aligned}$$

where \(\underset{\varOmega }{{{\,\mathrm{osc\ }\,}}}u:=\underset{\varOmega }{\sup \,}u-\underset{\varOmega }{\inf \,}u\).

Lemma 2.4

(Boundary Harnack Inequality) Let \(u\in W^{1,2}(T(x,2r))\) be a nonnegative solution of L in T(x, 2r) with \(u=0\) on \(\varDelta (x,2r)\). Then

$$\begin{aligned} u(X)\le Cu(X_r) \qquad \forall \, X\in T(x,r), \end{aligned}$$

where \(C>0\) depends only on the dimension and \(\mu _0\).

Of course, each of these statements has an interior analogue where we would replace \(T(x,r)\) by a ball \(B(X,r)\) such that \(B(X,2r) \subset \varOmega \) and we would not have to specify the boundary conditions. The interior Harnack inequality reads as follows:

Lemma 2.5

(Harnack Inequality) There is some constant C, depending only on the dimension and the ellipticity constant for A, such that if \(u\in W^{1,2}(\varOmega )\) is a nonnegative solution of \(Lu=0\) in \(B(X,2r)\subset \varOmega \), then

$$\begin{aligned} \sup _{B(X,r)} u\le C\inf _{B(X,r)} u. \end{aligned}$$

We will also use the Comparison Principle.

Lemma 2.6

(Comparison Principle) Let \(u,v\in W^{1,2}(T(x,2r))\) be two nonnegative solutions of L in T(x, 2r), such that \(u=v=0\) on \(\varDelta (x,2r)\) and v is not identically null. Set \(X_{x,r} = (x,r)\) (a corkscrew point for T(x, 2r)). Then

$$\begin{aligned} C^{-1}\frac{u(X_{x,r})}{v(X_{x,r})} \le \frac{u(X)}{v(X)}\le C \frac{u(X_{x,r})}{v(X_{x,r})} \quad \text { for all } X\in T(x,r), \end{aligned}$$

where \(C=C(n,\mu _0)\ge 1\).

Lemma 2.7

(Reverse Hölder Inequality on the boundary) We can find an exponent \(p > 2\) and a constant \(C \ge 1\), that depend only on d and the ellipticity constant \(\mu _0\) for A, such that if u and T(x, 2r) are as in Lemma 2.2, then

$$\begin{aligned} \left( \fint _{T(x,r)}\left| \nabla u(X)\right| ^p\,\mathrm{d}X\right) ^{1/p} \le C\left( \fint _{T(x,2r)}\left| \nabla u(X)\right| ^2\,\mathrm{d}X\right) ^{1/2}. \end{aligned}$$

See [9], Chapter V for the proof of this lemma.

We prove the following simple consequence of the above for the reader’s convenience:

Lemma 2.8

Let \(u\in W^{1,2}(T(x,R))\) be a nonnegative solution of L in \(T(x,R)\), with \(u=0\) on \(\varDelta (x,R)\). Then for all \(0<r<R/2\),

$$\begin{aligned} \fint _{T(x,r)}\left| \nabla u(X)\right| ^2\,\mathrm{d}X \approx \frac{u^2(X_{x,r})}{r^2}, \end{aligned}$$
(2.9)

where \(X_{x,r} = (x,r)\) as above and the implicit constant depends only on d and \(\mu _0\).

Proof

By translation invariance, we may assume that x is the origin.

To prove the \(\gtrsim \) inequality in (2.9), we apply Lemmas 2.3, 2.4, and the Poincaré inequality, and get

$$\begin{aligned} u(X_{x,r})^2 \le C\,\sup _{T(x, r/2)} u^2\le C\fint _{T(x,r)}u^2(X)\,\mathrm{d}X \le C r^2 \fint _{T(x,r)} \left| \nabla u\right| ^2. \end{aligned}$$

For the \(\lesssim \) inequality in (2.9), simply combine the boundary Caccioppoli and boundary Harnack inequalities. \(\square \)
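For the reader's convenience, the \(\lesssim \) direction just invoked can be displayed as a chain (a sketch; the routine adjustments of radii, legitimate since \(0<r<R/2\), are left aside):

```latex
\fint_{T(x,r)}|\nabla u|^2
  \;\lesssim\; \frac{1}{r^2}\fint_{T(x,2r)}u^2
  \;\le\; \frac{1}{r^2}\sup_{T(x,2r)}u^2
  \;\lesssim\; \frac{u(X_{x,2r})^2}{r^2}
  \;\lesssim\; \frac{u(X_{x,r})^2}{r^2},
```

by the boundary Caccioppoli inequality (Lemma 2.2), the boundary Harnack inequality (Lemma 2.4), and finally the interior Harnack inequality (Lemma 2.5) along a short chain of balls joining \(X_{x,2r}\) to \(X_{x,r}\).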

We now record a basic regularity estimate for constant coefficient operators. This will be used in the next section to get decay estimates for \(J_u\), and then extended partially to our more general operators L, with comparison arguments. We shall systematically use \(A_0\) to denote a constant real \((d+1)\times (d+1)\) matrix, which we always assume to satisfy the ellipticity condition (1.1), and write \(L_0:= -{{\,\mathrm{div}\,}}\left( A_0\nabla \right) \). Solutions to such operators enjoy additional regularity and in particular, we will use the following result. We state it in \(T_1 = T(0,1)\) to simplify the notation. More generally, set \(T_r = T(0,r)\) for \(r > 0\).

Lemma 2.10

Let \(u\in W^{1,2}(T_1)\) be a solution to \(L_0u=0\) in \(T_1\) with \(u=0\) on \(\varDelta _1\). Then for any multiindex \(\alpha \), \(\left| \alpha \right| \in {\mathbb {Z}}_+\),

$$\begin{aligned} \underset{T_{\frac{1}{2}}}{\sup }\left| D^{\alpha }u\right| \le C\left( \fint _{T_1}\left| \nabla u(X)\right| ^2\,\mathrm{d}X\right) ^{1/2}, \end{aligned}$$
(2.11)

where \(C=C(d,\mu _0,\left| \alpha \right| )\). In particular, for any \(T(x,r)\subset T_{1/2}\),

$$\begin{aligned} \underset{T(x,r)}{{{\,\mathrm{osc\ }\,}}}\partial _iu\le Cr\left( \fint _{T_{1}} \left| \nabla u(X)\right| ^2\,\mathrm{d}X\right) ^{1/2}, \quad i=1,2,\dots , d+1, \end{aligned}$$
(2.12)

where the constant C depends only on the dimension and \(\mu _0\).

Proof

First we claim that the standard local estimates on solutions for constant-coefficient operators in \({\mathbb {R}}^{d+1}_+\) ensure that

$$\begin{aligned} \Vert D^\alpha u\Vert _{L^2 (T_{1/2})}\lesssim \Vert \nabla u\Vert _{L^2 (T_1)}+ \Vert u\Vert _{L^2 (T_1)}. \end{aligned}$$
(2.13)

This is due to the fact that any weak solution to \(Lu=f\) on a smooth bounded domain \(\varOmega \) and with zero Dirichlet boundary data satisfies

$$\begin{aligned} \Vert u\Vert _{W^{m+2,2}(\varOmega )} \lesssim \Vert f\Vert _{W^{m,2}(\varOmega )} +\Vert u\Vert _{L^2(\varOmega )}, \quad m=0,1,2,...; \end{aligned}$$

see, for example [8], § 6.3, Theorems 4, 5. Here, \(W^{m,2}(\varOmega )\) is the Sobolev space of functions whose derivatives up to the order m lie in \(L^2(\varOmega )\). With this at hand, we observe that for any smooth cutoff function \(\eta \) equal to 1 on \(B_{1/2}\) and supported in \(B_{3/4}\) we have

$$\begin{aligned} L_0(u\eta )=-A_0 \nabla \eta \cdot \nabla u-A_0\nabla u\cdot \nabla \eta +u\,L_0 \eta , \end{aligned}$$

and hence the estimate above applied consecutively with \(m=0,1,2...\) in some smooth domain \(T_{3/4}\subset \varOmega \subset T_1\) gives (2.13). Applying Poincaré’s inequality, we conclude that

$$\begin{aligned} \Vert D^\alpha u\Vert _{L^2 (T_{1/2})} \lesssim \Vert \nabla u\Vert _{L^2 (T_1)} \end{aligned}$$
(2.14)

for any multiindex \(\alpha \) with \(|\alpha |\in {\mathbb {Z}}_+\). On the other hand, by the Sobolev embedding theorem ([1], Theorem 4.12), for any multiindex \(\alpha \),

$$\begin{aligned} \sup _{T_{1/2}}\left| D^{\alpha }u\right| \le C \left\| u\right\| _{W^{|\alpha |+n,2}(T_{1/2})} , \end{aligned}$$

where C depends on n and \(|\alpha |\). We combine this with (2.14) and get (2.11).

The estimate (2.12) is an immediate consequence of (2.11), since

$$\begin{aligned} \underset{T(x,r)}{{{\,\mathrm{osc\ }\,}}} \partial _i u \le r \sup _{T(x,r)} \left| \nabla \partial _i u\right| \le r\,\sup _{T_{1/2}}\left| \nabla \partial _i u\right| \le Cr \left( \fint _{T_{1}}\left| \nabla u\right| ^2\right) ^{1/2}, \end{aligned}$$

as desired. \(\square \)

Remark 2.15

Lemma 2.10 is more than enough to prove Theorems 1 and 2 in the special case of constant-coefficient operators. Indeed it says that \(\nabla u\) is Lipschitz in \(T_{1/2}\), so in particular \(\nabla u - \nabla u(0)\) is small near the origin. Notice that \(\nabla u(0) = (0,\partial _t u(0))\) because u vanishes on the boundary; with this and similar statements for other surface balls, it would be rather easy to control \(\beta _u\) and prove the theorems in the case of constant-coefficient operators. We don’t do this here because we need more general estimates anyway.

3 Approximations and the Main Conditional Decay Estimate

We observed in Remark 2.15 that our theorems should be easy to prove when L is a constant coefficient operator. In this section, we use the results of the previous section, together with an approximation argument, to prove some decay estimate for \(\beta _u\) in regions where A is nearly constant. See Corollary 3.45.

At the center of the proof is an estimate for \(||\nabla u - \nabla u_0||_2\), where u is a solution for L in some Carleson box T(xr), and \(u_0\) is a solution for a close enough constant coefficient operator \(L_0\), with the same boundary values on \(\partial T(x,r)\). See Lemma 3.11.

3.1 A little more about orthogonality, \(J_u\), and \(\beta _u\)

We first return to the approximation of a solution u by the affine function \(a_{x,r}(z,t) = \lambda _{x,r} t\) of (1.9), and check what we said earlier, namely that \(a_{x,r}\) is the best affine approximation of this type in T(x,r). Recall from (1.10) that

$$\begin{aligned} \begin{aligned} J_u(x,r)&= \fint _{T(x,r)} |\nabla (u(z,t)-a_{x,r}(z,t))|^2\,\mathrm{d}z\,\mathrm{d}t = \fint _{T(x,r)} |\nabla u - \lambda _{x,r}(u) {\mathbf {e}}_{d+1}|^2\,\mathrm{d}z\,\mathrm{d}t \\&= \fint _{T(x,r)} |\nabla _z u(z,t)|^2\,\mathrm{d}z\,\mathrm{d}t + \fint _{T(x,r)} |\partial _t u(z,t) - \lambda _{x,r}(u)|^2\,\mathrm{d}z\,\mathrm{d}t \end{aligned} \end{aligned}$$
(3.1)

where \({\mathbf {e}}_{d+1} = (0,\ldots , 1)\) is the vertical unit vector, and we split the full gradient \(\nabla u\) into the horizontal gradient \(\nabla _z u\) and the vertical part \(\partial _t u\). Now \(\lambda _{x,r}(u) = \fint _{T(x,r)} \partial _t u\) by (1.9), so \(\partial _t u - \lambda _{x,r}(u)\) is orthogonal to constants in \(L^2(T(x,r))\), hence for any other \(\lambda \),

$$\begin{aligned} \fint _{T(x,r)} |\partial _t u - \lambda |^2 = |\lambda - \lambda _{x,r}(u)|^2 + \fint _{T(x,r)} |\partial _t u - \lambda _{x,r}(u)|^2, \end{aligned}$$

and, by the same computation as above,

$$\begin{aligned} \begin{aligned} \fint _{T(x,r)} |\nabla (u- \lambda t)|^2&= |\lambda - \lambda _{x,r}(u)|^2 + \fint _{T(x,r)} |\nabla u - \lambda _{x,r}(u) {\mathbf {e}}_{d+1}|^2 \\&= |\lambda - \lambda _{x,r}(u)|^2 + J_u(x,r). \end{aligned} \end{aligned}$$
(3.2)

We may find it convenient to use the fact that, as a consequence,

$$\begin{aligned} \beta _u(x,r) = \inf _{\lambda \in {\mathbb {R}}} \, \frac{\fint _{T(x,r)} |\nabla (u- \lambda t)|^2}{\fint _{T(x,r)} |\nabla u|^2} \le 1 \end{aligned}$$
(3.3)

(compare with (1.12), and for the second part try \(\lambda = 0\)).
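For completeness, the orthogonality computation behind (3.2) can be spelled out: expanding the square and using that \(\partial _t u - \lambda _{x,r}(u)\) has mean value zero on T(x,r),

$$\begin{aligned} \fint _{T(x,r)} |\partial _t u - \lambda |^2&= \fint _{T(x,r)} \big |(\partial _t u - \lambda _{x,r}(u)) + (\lambda _{x,r}(u)-\lambda )\big |^2 \\&= \fint _{T(x,r)} |\partial _t u - \lambda _{x,r}(u)|^2 + |\lambda -\lambda _{x,r}(u)|^2, \end{aligned}$$

because the cross term \(2(\lambda _{x,r}(u)-\lambda )\fint _{T(x,r)}(\partial _t u - \lambda _{x,r}(u))\) vanishes.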

For most of the rest of this section, we concentrate on balls centered at the origin; to save notation, we set \(B_r = B(0,r)\), \(T_r = T(0,r) = B_r \cap {\mathbb {R}}^{d+1}_+\), and \(W_r = W(0,r)\) (see (1.2)). Similarly, it will be convenient to use the notation

$$\begin{aligned} J_u(r) = J_u(0,r)= \fint _{T_r}\left| \nabla \left( u(x,t)-\lambda _r(u)\, t\right) \right| ^2\,\mathrm{d}x\,\mathrm{d}t, \end{aligned}$$

where

$$\begin{aligned} \lambda _r(u) = \lambda _{0,r}(u) =\fint _{T_r} \partial _s u(y,s)\,\mathrm{d}y\,\mathrm{d}s \end{aligned}$$

(see (1.9) and (1.10)), and we set \(E_u(r) = E_u(0,r)\), \(\beta _u(r) = \beta _u(0,r)\) (see (1.11) and (1.12)).

3.2 Decay estimates for constant-coefficient operators

We shall now prove a few estimates on solutions of constant-coefficient equation, which will be useful when we try to replace L by a constant-coefficient operator. We start with a consequence of Lemma 2.10.

Lemma 3.4

Let \(A_0\) be a constant matrix that satisfies the ellipticity condition (1.1), set \(L_0 = -{{\,\mathrm{div}\,}}\left( A_0\nabla \right) \), and let u be a solution to \(L_0 u = 0\) in \(T_1\) such that \(u=0\) on \(\varDelta _1\). There exists some constant C, depending only on the dimension and \(\mu _0\), such that for \(0<r<1/2\),

$$\begin{aligned} J_u(r)\le Cr^2J_u(1)\le Cr^2 E_u(1). \end{aligned}$$
(3.5)

Proof

The second inequality follows at once from (3.2) (with \(\lambda = 0\)) for u. Next let \(v(x,t)=u(x,t)-\lambda _r(u)\, t\). Since t is a solution for the constant-coefficient operator \(L_0\), v is a solution for \(L_0\) in \(T_1\) as well, with \(v(x,0)=0\) for all \(x\in \varDelta _1\). We claim that

$$\begin{aligned} \text {there exists some } (x',t')\in T_r \text { for which } \partial _tv(x',t')=0. \end{aligned}$$
(3.6)

To see this, we observe first that \(\partial _tv(x,t)=\partial _tu(x,t)-\fint _{T_r}\partial _tu(y,s)\,\mathrm{d}y\,\mathrm{d}s\) has mean value 0 on \(T_r\). Since u is a solution of the constant-coefficient equation \(L_0u=0\), \(\partial _tu\) is also a solution of the same equation. Therefore, by the De Giorgi-Nash-Moser theory, \(\partial _tu\) is continuous in \(T_r\), and thus so is \(\partial _tv\). Then (3.6) follows from the connectedness of \(T_r\) and the intermediate value theorem. Thanks to (3.6), \(\underset{T_r}{\sup }\left| \partial _tv\right| \le \underset{T_r}{{{\,\mathrm{osc\ }\,}}}\partial _tv\), and thus by (2.12) and because adding a constant does not change the oscillation,

$$\begin{aligned} \fint _{T_r}\left| \partial _tv\right| ^2\le & {} \Bigl (\underset{T_r}{{{\,\mathrm{osc\ }\,}}}\partial _tv\Bigr )^2=\Bigl (\underset{T_r}{{{\,\mathrm{osc\ }\,}}}(\partial _tv+\lambda _r(u)-\lambda _1(u))\Bigr )^2\\= & {} \Bigl (\underset{T_r}{{{\,\mathrm{osc\ }\,}}}\partial _t(u-\lambda _1(u)\, t)\Bigr )^2 \le C r^2\fint _{T_1}\left| \nabla (u(x,t)-\lambda _1(u)\, t)\right| ^2\,\mathrm{d}x\,\mathrm{d}t. \end{aligned}$$

For the rest of the gradient, notice that for \(1\le j\le d\),

$$\begin{aligned} \partial _jv(x,t)=\partial _j\left( v(x,t)+\lambda _r(u)\, t -\lambda _1(u)\, t\right) , \end{aligned}$$

and \(\partial _jv(x,0)=0\). Therefore,

$$\begin{aligned} \fint _{T_r}\left| \partial _jv\right| ^2\le & {} \Bigl (\underset{T_r}{{{\,\mathrm{osc\ }\,}}}\partial _j\left( v(x,t)+\lambda _r(u)\, t -\lambda _1(u)\, t\right) \Bigr )^2\\\le & {} C r^2\fint _{T_1}\left| \nabla (u(x,t)-\lambda _1(u)\, t)\right| ^2\,\mathrm{d}x\,\mathrm{d}t =Cr^2J_u(1). \end{aligned}$$

Now (3.5) follows from the two estimates above. \(\square \)

Remark 3.7

The proof of Lemma 3.4 also works when we replace \(J_u(r)\) in (3.5) with \(\fint _{T_r}\left| \nabla _{x,t}\left( u(x,t)-\lambda _s(u)\, t\right) \right| ^2\), for any \(0<s\le r\). That is, we also get that

$$\begin{aligned} \fint _{T_r}\left| \nabla _{x,t}\left( u(x,t)-\lambda _s(u)\, t\right) \right| ^2\,\mathrm{d}x\,\mathrm{d}t \le C r^2J_u(1). \end{aligned}$$
(3.8)

This may be a better estimate, since (3.2) says that for any \(\lambda \),

$$\begin{aligned} J_u(r) \le \fint _{T_r}\left| \nabla \left( u(x,t)-\lambda \, t\right) \right| ^2\,\mathrm{d}x\,\mathrm{d}t. \end{aligned}$$

We will need a lower bound for the ratio \(\frac{E_u(r)}{E_u(1)}\) for positive solutions of \(L_0 u = 0\).

Lemma 3.9

Let the matrix \(A_0\) be constant and satisfy the ellipticity condition (1.1), set \(L_0 = -{{\,\mathrm{div}\,}}\left( A_0\nabla \right) \), and let u be a positive solution to \(L_0 u = 0\) in \(T_1\) such that \(u=0\) on \(\varDelta _1\). Then

$$\begin{aligned} E_u(r) \ge C(1-C' r^2)E_u(1) \qquad \text { for } 0<r<1/2, \end{aligned}$$
(3.10)

where C and \(C'\) are positive constants depending only on the dimension and \(\mu _0\).

Notice that when r is small, the lower bound (3.10) does not depend much on r. This is better than what we would get by simply applying Lemma 2.8 and the Harnack inequality to the positive solution u. The proof exploits the fact that t is a solution for the constant-coefficient operator \(L_0\) and the comparison principle.

Proof

Define \(\lambda _0 =\partial _tu(0,0)\). Then by (2.12),

$$\begin{aligned} \left| \lambda _r(u)-\lambda _0\right| \le \underset{T_r}{{{\,\mathrm{osc\ }\,}}}\partial _tu \le Cr\left( \fint _{T_1}\left| \nabla u\right| ^2\right) ^{1/2}. \end{aligned}$$

Since t is a solution for \(L_0\) that vanishes on \(\varDelta _1\), the comparison principle and Lemma 2.8 give (with the corkscrew point \(X_{x,t} = (x,t)\))

$$\begin{aligned} \frac{u(x,t)}{t} \ge C^{-1}u(X_{0,1}) \ge C^{-1}\left( \fint _{T_1}\left| \nabla u\right| ^2\right) ^{1/2} \qquad \text { for } (x,t)\in T_{1/2}, \end{aligned}$$

which implies, by taking a limit and using the existence of \(\nabla u\) at 0, that

$$\begin{aligned} \lambda _0=\partial _tu(0,0)\ge C^{-1}\left( \fint _{T_1}\left| \nabla u\right| ^2\right) ^{1/2}. \end{aligned}$$

Then

$$\begin{aligned} E_u(r)\ge \lambda _r(u)^2 \ge \frac{\lambda _0^2}{2}-(\lambda _r(u)-\lambda _0)^2 \ge ((2C)^{-1}-C'r^2)\fint _{T_1}\left| \nabla u\right| ^2 \end{aligned}$$

(use the fact that \(a^2 \ge \frac{b^2}{2} - (a-b)^2\)). This completes the proof of Lemma 3.9. \(\square \)
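The elementary inequality invoked in the last display follows from \((a+b)^2 \le 2a^2+2b^2\):

$$\begin{aligned} b^2 = \big (a+(b-a)\big )^2 \le 2a^2 + 2(b-a)^2, \qquad \text {hence}\quad a^2 \ge \frac{b^2}{2}-(a-b)^2, \end{aligned}$$

applied here with \(a = \lambda _r(u)\) and \(b = \lambda _0\).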

3.3 Extension to general elliptic operators L

We now return to a solution of our original equation \(Lu=0\), and compare it with solutions \(u^0\) of \(L_0 u^0=0\) for a constant-coefficient operator \(L_0 = -{{\,\mathrm{div}\,}}\left( A_0\nabla \right) \), with the same boundary data. For the moment we do not specify the constant matrix \(A_0\) (except that we require it to satisfy the ellipticity condition (1.1)), but of course our estimates will be better if we choose a good approximation of A in \(T_1\).

Even though it does not look like much, the next lemma is probably the central estimate of this paper. We do not need \(A_0\) to have constant coefficients here.

Lemma 3.11

Let \(L = -{{\,\mathrm{div}\,}}\left( A\nabla \right) \) and \(L_0 = -{{\,\mathrm{div}\,}}\left( A_0\nabla \right) \) be two elliptic operators, and assume that A and \(A_0\) satisfy the ellipticity condition (1.1) with constant \(\mu _0\). Let u be a solution to \(Lu=0\) in \(T_1\), with \(u=0\) on \(\varDelta _1\), and let \(u^0\) be a solution of \(L_0 u^0=0\) in \(T_1\) with \(u^0=u\) on \(\partial T_1\). Then

$$\begin{aligned} \int _{T_1}\left| \nabla u- \nabla u^0\right| ^2 \le \mu _0^2 \min \left\{ \int _{T_1}\left| A-A_0\right| ^2\left| \nabla u\right| ^2\,\mathrm{d}X, \int _{T_1}\left| A-A_0\right| ^2\left| \nabla u^0\right| ^2\,\mathrm{d}X\right\} .\nonumber \\ \end{aligned}$$
(3.12)

Proof

The solutions are in the space \(W^{1,2}(T_1)\) by definition, and \(u^0=u\) on the boundary should be interpreted as \(u^0-u=0\) in the sense of \(W^{1,2}(T_1)\), or equivalently, \(u^0-u\in W_0^{1,2}(T_1)\). So the existence of \(u^0\in W^{1,2}(T_1)\) as above is guaranteed by the Lax-Milgram Theorem. Alternatively, it is possible to find \(u^0\) because the trace of u lies in \(H^{1/2}(\partial T_1)\). In addition, if u is nonnegative, then \(u^0\) is nonnegative by the maximum principle.

Since \(u-u^0\) lies in the set \(W^{1,2}_0\) of test functions allowed in Definition 2.1,

$$\begin{aligned} \frac{1}{\mu _0} \int _{T_1}\left| \nabla (u-u^0)\right| ^2&\le \int _{T_1} A\nabla (u-u^0)\cdot \nabla (u-u^0)=-\int _{T_1}A\nabla u^0\cdot \nabla (u-u^0)\\&=\int _{T_1}(A_0-A)\nabla u^0\cdot \nabla (u-u^0)\\&\le \frac{\mu _0}{2} \int _{T_1}\left| A-A_0\right| ^2\left| \nabla u^0\right| ^2 +\frac{1}{2\mu _0}\int _{T_1}\left| \nabla (u-u^0)\right| ^2, \end{aligned}$$

where we use (1.1), the fact that u is a solution of \({{\,\mathrm{div}\,}}(A\nabla )u=0\) in \(T_1\) (and \(u-u^0\) vanishes on the boundary), then the fact that \(u^0\) is a solution of \({{\,\mathrm{div}\,}}(A_0\nabla )u^0=0\) in \(T_1\), followed by the inequality \(2ab \le \mu _0 a^2 + \mu _0^{-1} b^2\). Then

$$\begin{aligned} \int _{T_1}\left| \nabla (u-u^0)\right| ^2\le \mu _0^2 \int _{T_1}\left| A-A_0\right| ^2\left| \nabla u^0\right| ^2. \end{aligned}$$

This gives the bound by one of the expressions in the minimum in (3.12). Interchanging the roles of u and \(u^0\), and A and \(A_0\), we also obtain the other bound. \(\square \)
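For the record, the absorption step between the last two displays reads as follows: subtracting \(\frac{1}{2\mu _0}\int _{T_1}\left| \nabla (u-u^0)\right| ^2\) from both sides of the chain of inequalities and multiplying by \(2\mu _0\) gives

$$\begin{aligned} \int _{T_1}\left| \nabla (u-u^0)\right| ^2 \le 2\mu _0\cdot \frac{\mu _0}{2} \int _{T_1}\left| A-A_0\right| ^2\left| \nabla u^0\right| ^2, \end{aligned}$$

which is exactly the announced bound with the constant \(\mu _0^2\).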

A similar proof also gives the following (which can be applied even if \(A-A_0\) is not small):

Lemma 3.13

Let A, \(A_0\), u, and \(u^0\) be as in Lemma 3.11. Then

$$\begin{aligned} \mu _0^{-4}\int _{T_1}\left| \nabla u^0(X)\right| ^2\,\mathrm{d}X \le \int _{T_1}\left| \nabla u(X)\right| ^2\,\mathrm{d}X\le \mu _0^4\int _{T_1}\left| \nabla u^0(X)\right| ^2\,\mathrm{d}X,\nonumber \\ \end{aligned}$$
(3.14)

where \(\mu _0\) still denotes the ellipticity constant.

We shall immediately see that u being a solution is not necessary for the first inequality to hold, and similarly, \(u^0\) being a solution is not necessary for the second inequality. But the condition \(u-u^0\in W_0^{1,2}(T_1)\) is essential.

Proof

We estimate

$$\begin{aligned} \mu _0^{-1}\int _{T_1}\left| \nabla u\right| ^2&\le \int _{T_1}A\nabla u\cdot \nabla u=\int _{T_1}A\nabla u \cdot \nabla (u-u^0) +\int _{T_1}A\nabla u \cdot \nabla u^0\\&=\int _{T_1}A\nabla u \cdot \nabla u^0\le \mu _0\left( \int _{T_1}\left| \nabla u\right| ^2\right) ^{1/2}\left( \int _{T_1}\left| \nabla u^0\right| ^2\right) ^{1/2}. \end{aligned}$$

Hence,

$$\begin{aligned} \int _{T_1}\left| \nabla u\right| ^2\le \mu _0^4\int _{T_1}\left| \nabla u^0\right| ^2. \end{aligned}$$

The left-hand side of (3.14) follows from the same argument, interchanging the roles of u and \(u^0\), A and \(A_0\), respectively. \(\square \)
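In detail, dividing the chain of inequalities in the proof by \(\left( \int _{T_1}\left| \nabla u\right| ^2\right) ^{1/2}\) (which we may assume to be positive, since otherwise there is nothing to prove) yields

$$\begin{aligned} \mu _0^{-1}\left( \int _{T_1}\left| \nabla u\right| ^2\right) ^{1/2} \le \mu _0\left( \int _{T_1}\left| \nabla u^0\right| ^2\right) ^{1/2}, \end{aligned}$$

and squaring gives the right-hand side of (3.14).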

Let us announce how we intend to estimate the right-hand side of (3.12). The simplest would be to estimate \(\left| A-A_0\right| ^2\) in \(L^\infty \) norm and use the \(L^2\) norm of \(\nabla u\), but if we do this we will get quantities that do not seem to be controlled even by the \(\alpha _\infty \) of (1.4). So instead we decide to use the quantity

$$\begin{aligned} \gamma (x,r) = \inf _{A_0 \in {\mathfrak {A}}_0(\mu _0)} \bigg \{\fint _{(y,s) \in T(x,r)} |A(y,s)-A_0|^2\,\mathrm{d}y\,\mathrm{d}s \bigg \}^{1/2}, \end{aligned}$$
(3.15)

where as before the infimum is taken over the class \({\mathfrak {A}}_0(\mu _0)\) of constant matrices \(A_0\) that satisfy the ellipticity condition (1.1). Notice that the domain of integration matches the domain of integration of (3.12), but it is larger than what we have in (1.5). Nonetheless, the next lemma, to be proved in the next section, will allow us to use \(\gamma (x,r)\).

Lemma 3.16

If the matrix-valued function A satisfies the weak DKP condition of Definition 1.6, with constant \(\varepsilon > 0\), then \(\gamma (x,r)^2 \frac{\mathrm{d}x\,\mathrm{d}r}{r}\) is a Carleson measure on \({\mathbb {R}}^{d+1}_+\), with norm

$$\begin{aligned} \left\| \gamma (x,r)^2 \frac{\mathrm{d}x\,\mathrm{d}r}{r}\right\| _{{\mathcal {C}}} \le C {\mathfrak {N}}_2(A) \le C \varepsilon , \end{aligned}$$
(3.17)

where \({\mathfrak {N}}_2(A) = \left\| \alpha _2(x,r)^2 \frac{\mathrm{d}x\,\mathrm{d}r}{r}\right\| _{{\mathcal {C}}}\) as in (1.7), and

$$\begin{aligned} \gamma (x,r)^2 \le C {\mathfrak {N}}_2(A) \le C \varepsilon \quad \text { for } (x,r) \in {\mathbb {R}}^{d+1}_+. \end{aligned}$$
(3.18)

Here C depends only on d and \(\mu _0\).

See the next section for the proof.

Since we do not have a small \(L^\infty \) control on A, we need a better estimate on \(\nabla u\). This will be achieved by reverse Hölder estimates (for example Lemma 2.7), which give us an exponent \(p>2\) that depends only on d and \(\mu _0\). We first state the needed estimate for the unit box \(T_1\).

Lemma 3.19

Let u be a positive solution to \(Lu=0\) in \(T_5\), with \(u=0\) on \(\varDelta _5\), choose a constant matrix \(A_0 \in {\mathfrak {A}}_0(\mu _0)\) that attains the infimum in the definition (3.15) of \(\gamma (0,1)\), and let \(u^0\) be as in Lemma 3.11 (with this choice of \(A_0\)). Then for any \(\delta >0\),

$$\begin{aligned} \int _{T_1} \left| \nabla u- \nabla u^0\right| ^2 \,\mathrm{d}X \le \left( \delta +C_\delta \gamma (0,1)^2\right) E_u(1), \end{aligned}$$
(3.20)

where \(C_\delta \) depends on d, \(\mu _0\), and \(\delta \).

Proof

We discussed the existence of \(u^0\) when we proved Lemma 3.11. We start from (3.12), which reads

$$\begin{aligned} \int _{T_1} \left| \nabla u- \nabla u^0\right| ^2 \le C \int _{T_1}\left| A-A_0\right| ^2\left| \nabla u\right| ^2. \end{aligned}$$
(3.21)

Let us cut off and consider first the set

$$\begin{aligned} Z:=\left\{ X\in T_1: \left| \nabla u(X)\right| ^2 \le KE_u(1)\right\} , \end{aligned}$$

with \(K>0\) to be determined soon. We pull out the gradient and get a contribution

$$\begin{aligned} \int _Z\left| A-A_0\right| ^2\left| \nabla u\right| ^2 \le KE_u(1)\int _{Z}\left| A-A_0\right| ^2 \le K \gamma (0,1)^2 E_u(1). \end{aligned}$$
(3.22)

In the region \(T_1\setminus Z\) where \(\left| \nabla u\right| ^2> K E_u(1)\), we see that

$$\begin{aligned} \left| \nabla u\right| ^2=\left| \nabla u\right| ^p \left| \nabla u\right| ^{2-p} \le \left| \nabla u\right| ^p (K E_u(1))^{\frac{2-p}{2}}, \end{aligned}$$

where \(p>2\) will be chosen as in Lemma 2.7. Then

$$\begin{aligned} \int _{T_1\setminus Z} \left| A-A_0\right| ^2\left| \nabla u\right| ^2 \le 2\mu _0^2 \int _{T_1\setminus Z} \left| \nabla u\right| ^2 \le 2\mu _0^2 (K E_u(1))^{\frac{2-p}{2}} \int _{T_1} \left| \nabla u\right| ^p\,\mathrm{d}X.\nonumber \\ \end{aligned}$$
(3.23)

We required u to be a nice solution in the larger set \(T_5\), so that we can use the following estimates from Section 2. First,

$$\begin{aligned} \big \{\fint _{T_1} \left| \nabla u\right| ^p\,\mathrm{d}X\big \}^{\frac{2}{p}} \le C \fint _{T_2} \left| \nabla u\right| ^2\,\mathrm{d}X \end{aligned}$$

by Lemma 2.7. Now we apply Lemma 2.8 to \(T_2\) (with \(X_2 = (0,2)\)) and later \(T_1\) (with \(X_1 = (0,1)\)), to find that

$$\begin{aligned} \fint _{T_2} \left| \nabla u\right| ^2 \le C u^2(X_2) \le C u^2(X_1) \le C\fint _{T_1} \left| \nabla u\right| ^2, \end{aligned}$$

where the intermediate inequality follows from Harnack’s inequality. From these estimates and (3.23), the contribution from \(T_1\setminus Z\) is

$$\begin{aligned} \int _{T_1\setminus Z} \left| A-A_0\right| ^2\left| \nabla u\right| ^2\le CK^{\frac{2-p}{2}}E_u(1). \end{aligned}$$

Now we choose K so that \(CK^{\frac{2-p}{2}}=\delta \), and the desired estimate (3.20) follows at once. \(\square \)
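Explicitly, since \(p>2\), the equation \(CK^{\frac{2-p}{2}}=\delta \) can always be solved for K, with

$$\begin{aligned} K = \Big (\frac{C}{\delta }\Big )^{\frac{2}{p-2}}, \end{aligned}$$

and then the contribution (3.22) of Z becomes \(K\gamma (0,1)^2E_u(1) = C_\delta \gamma (0,1)^2 E_u(1)\); this is where the constant \(C_\delta \) of (3.20) comes from, and it explains why \(C_\delta \) blows up as \(\delta \rightarrow 0\).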

We now have enough information to derive the same sort of decay estimates for the non-affine part of our solution u that we proved, at the beginning of this section, for solutions \(u^0\) of constant coefficient operators. We start with an analogue of Lemma 3.4.

Lemma 3.24

Let u be a solution to \(Lu=0\) in \(T_1\) with \(u=0\) on \(\varDelta _1\). Then for \(0<r<1/4\),

$$\begin{aligned} J_u(r) \le C \left( r^2+K^{\frac{2-p}{2}}r^{-d-1}\right) J_u(1) + \frac{C_K}{r^{d+1}}\gamma (0,1)^2E_u(1), \end{aligned}$$
(3.25)

where \(K>0\) is arbitrary, \(p=p(d,\mu _0)>2\), C depends only on d, \(\mu _0\) and p, and \(C_K\) depends additionally on K.

Notice that we do not require the positivity of u yet, which is why we don’t use Lemma 3.19 for the moment.

Proof

We write u as affine plus orthogonal on \(T_1\), that is

$$\begin{aligned} u(x,t)=v(x,t)+\lambda _1(u)t. \end{aligned}$$

Note that \(\lambda _1(u)^2\le E_u(1)\), and \(E_v(1)=J_u(1)\).

Choose a constant matrix \(A_0 \in {\mathfrak {A}}_0(\mu _0)\) that attains the infimum in the definition (3.15) of \(\gamma (0,1)\), and let \(L_0=-{{\,\mathrm{div}\,}}\left( A_0\nabla \right) \) as usual. Now consider the \(L_0\)-harmonic extension to \(T_{1/2}\) of the restriction of u to \(\partial T_{1/2}\), which can be written as

$$\begin{aligned} u_0(x,t) = v_0(x,t) + \lambda _1(u) t, \end{aligned}$$
(3.26)

where we use the fact that t is a solution of the constant-coefficient equation, and \(v_0\) is the \(L_0\)-harmonic extension of \(v_{\vert \partial T_{1/2}}\). These extensions are well-defined since u is Hölder continuous on \(\overline{T_{1/2}}\), and the Lax-Milgram Theorem guarantees the existence and uniqueness of the \(W^{1,2}(T_{1/2})\) solution. In particular, \(L_0 u_0=0\) in \(T_{1/2}\), with \(u_0=u\) on \(\partial T_{1/2}\).

We claim that for any fixed \(0<r<1/4\),

$$\begin{aligned} J_u(r) \le C r^2 J_u(1) + \frac{C}{r^{d+1}}\fint _{T_{1/2}}\left| A(x,t)-A_0\right| ^2\left| \nabla u_0(x,t)\right| ^2\,\mathrm{d}x\,\mathrm{d}t. \end{aligned}$$
(3.27)

To see this, we use the inequality \((a+b+c)^2 \le 3(a^2 + b^2 + c^2)\) to write

$$\begin{aligned} J_u(r)= & {} \fint _{T_r}\left| \nabla \left( u-\lambda _r(u)\, t\right) \right| ^2 \le 3\fint _{T_r}\left| \nabla (u_0 -\lambda _r(u_0)\, t)\right| ^2 \nonumber \\&+ 3\fint _{T_r}\left| \nabla (u-u_0)\right| ^2+3\fint _{T_r}\left| \nabla (\lambda _r(u_0)\, t-\lambda _r(u)\, t)\right| ^2, \end{aligned}$$
(3.28)

where \(\lambda _r(u_0) = \fint _{T_r} \partial _t u_0\) is defined as for u. Notice that

$$\begin{aligned} \fint _{T_r}\left| \nabla (\lambda _r(u_0)\, t-\lambda _r(u)\, t)\right| ^2= & {} (\lambda _r(u_0)-\lambda _r(u))^2 =\left( \fint _{T_r}\left( \partial _tu- \partial _tu_0\right) \,\mathrm{d}x\,\mathrm{d}t\right) ^2 \nonumber \\\le & {} \fint _{T_r}\left| \nabla (u-u_0)\right| ^2 \le \frac{C}{r^{d+1}}\fint _{T_{1/2}}\left| \nabla (u-u_0)\right| ^2,\nonumber \\ \end{aligned}$$
(3.29)

simply enlarging the domain of integration. So by (3.28), Lemma 3.4 and Lemma 3.11,

$$\begin{aligned} J_u(r)\le & {} 3 \fint _{T_r}\left| \nabla (u_0-\lambda _r(u_0)\, t)\right| ^2 + \frac{C}{r^{d+1}}\fint _{T_{\frac{1}{2}}}\left| \nabla (u-u_0)\right| ^2\nonumber \\= & {} 3 J_{u_0}(r) + \frac{C}{r^{d+1}}\fint _{T_{\frac{1}{2}}}\left| \nabla (u-u_0)\right| ^2 \le C r^2 J_{u_0}(1/2) + \frac{C}{r^{d+1}}\fint _{T_{\frac{1}{2}}}\left| \nabla (u-u_0)\right| ^2\nonumber \\\le & {} C r^2 J_{u_0}(1/2) + \frac{C}{r^{d+1}}\fint _{T_{\frac{1}{2}}} \left| A-A_0\right| ^2\left| \nabla u_0\right| ^2. \end{aligned}$$
(3.30)

However, the same sort of computation as above yields

$$\begin{aligned} J_{u_0}(1/2)= & {} \fint _{T_{\frac{1}{2}}}\left| \nabla (u_0-\lambda _{1/2}(u_0)t)\right| ^2\\\le & {} 3\fint _{T_{\frac{1}{2}}}\left| \nabla (u-u_0)\right| ^2+3\fint _{T_{\frac{1}{2}}}\left| \nabla (u-\lambda _{1/2}(u)t)\right| ^2+3(\lambda _{1/2}(u)-\lambda _{1/2}(u_0))^2\\\le & {} C\fint _{T_{\frac{1}{2}}}\left| \nabla (u-u_0)\right| ^2+3\fint _{T_{\frac{1}{2}}}\left| \nabla (u-\lambda _{1/2}(u)t)\right| ^2\\= & {} C\fint _{T_{\frac{1}{2}}}\left| \nabla (u-u_0)\right| ^2+3 J_u(1/2). \end{aligned}$$

We plug this into (3.30), use the last part of (3.29), and get

$$\begin{aligned} J_u(r) \le C r^2 J_u(1/2) + \frac{C}{r^{d+1}}\fint _{T_{1/2}}\left| A(x,t)-A_0\right| ^2\left| \nabla u_0(x,t)\right| ^2\,\mathrm{d}x\,\mathrm{d}t. \end{aligned}$$

Now the claim (3.27) follows because

$$\begin{aligned} J_u(1/2)\le \fint _{T_{1/2}}\left| \nabla (u(x,t)-\lambda _1(u)t)\right| ^2\,\mathrm{d}x\,\mathrm{d}t\le CJ_u(1), \end{aligned}$$

where in the first inequality we have used that \(\lambda _{1/2}(u)\,t\) is the best affine approximation in \(T_{1/2}\) (see the discussion in Section 3.1).

Recall that \(u_0\) is decomposed as in (3.26), and thus

$$\begin{aligned}&\fint _{T_{1/2}}\left| A-A_0\right| ^2\left| \nabla u_0\right| ^2\nonumber \\&\quad \le 2\fint _{T_{1/2}}\left| A-A_0\right| ^2\left| \nabla v_0\right| ^2+2\lambda _1(u)^2\fint _{T_{1/2}}\left| A-A_0\right| ^2\left| \nabla t\right| ^2\nonumber \\&\quad \le 2\fint _{T_{1/2}}\left| A-A_0\right| ^2\left| \nabla v_0\right| ^2+2E_u(1)\gamma (0,1)^2. \end{aligned}$$
(3.31)

We now estimate the first term on the right-hand side of (3.31). For \(K>0\), consider the set

$$\begin{aligned} Z_K:=\left\{ X\in T_{1/2}: \left| \nabla v_0(X)\right| ^2\le KE_u(1)\right\} . \end{aligned}$$

The contribution of \(Z_K\) to the integral is

$$\begin{aligned}&\int _{Z_K}\left| A-A_0\right| ^2\left| \nabla v_0\right| ^2\le KE_u(1)\int _{Z_K}\left| A-A_0\right| ^2 \le CK\gamma (0,1)^2E_u(1). \end{aligned}$$

We are left with the complement of \(Z_K\). As in (3.23) in the proof of Lemma 3.19, we get that

$$\begin{aligned} \int _{T_{1/2}\setminus Z_K}\left| A-A_0\right| ^2\left| \nabla v_0\right| ^2\le C(KE_u(1))^{\frac{2-p}{2}}\int _{T_{1/2}}\left| \nabla v_0\right| ^p \end{aligned}$$
(3.32)

where \(p>2\) will be chosen close to 2. To control the term \(\int _{T_{1/2}}\left| \nabla v_0\right| ^p\), we use the following two reverse Hölder type estimates: for some \(p=p(d,\mu _0)>2\) sufficiently close to 2,

$$\begin{aligned} \left( \int _{T_{1/2}}\left| \nabla v_0\right| ^p\right) ^{1/p}&\lesssim \left( \int _{T_{1/2}}\left| \nabla v_0\right| ^2\right) ^{1/2} +\left( \int _{T_{1/2}}\left| \nabla v\right| ^p\right) ^{1/p}, \end{aligned}$$
(3.33)
$$\begin{aligned} \left( \int _{T_{1/2}}\left| \nabla v\right| ^p\right) ^{1/p}&\lesssim \left( \int _{T_1}\left| \nabla v\right| ^2\right) ^{1/2}+\left| \lambda _1(u)\right| \left( \fint _{T_1}\left| A-A_0\right| ^p\right) ^{1/p}, \end{aligned}$$
(3.34)

where the implicit constants depend on d, \(\mu _0\) and p. We postpone the proof of these two inequalities to the end of the proof of this lemma.

Now by (3.33) and (3.34), we obtain

$$\begin{aligned} \int _{T_{1/2}}\left| \nabla v_0\right| ^p\lesssim E_{v_0}(1/2)^{p/2}+E_{v}(1)^{p/2}+\left| \lambda _1(u)\right| ^p\fint _{T_1}\left| A-A_0\right| ^p. \end{aligned}$$

Since \(v-v_0\in W_0^{1,2}(T_{1/2})\) and \(v_0\) is \(L_0\)-harmonic, we have

$$\begin{aligned} E_{v_0}(1/2)\le C_{\mu _0}E_v(1/2)\le CE_v(1)=CJ_u(1), \end{aligned}$$

where the first inequality comes from Lemma 3.13. Notice also that

$$\begin{aligned} \fint _{T_1}\left| A-A_0\right| ^p\le C_{\mu _0,p}\fint _{T_1}\left| A-A_0\right| ^2=C\gamma (0,1)^2. \end{aligned}$$

Thus our estimate on \(\int _{T_{1/2}}\left| \nabla v_0\right| ^p\) can be simplified as

$$\begin{aligned} \int _{T_{1/2}}\left| \nabla v_0\right| ^p\lesssim J_{u}(1)^{p/2}+E_u(1)^{p/2}\gamma (0,1)^2. \end{aligned}$$

Plugging this into (3.32), we get

$$\begin{aligned} \int _{T_{1/2}\setminus Z_K}\left| A-A_0\right| ^2\left| \nabla v_0\right| ^2\le & {} CK^{\frac{2-p}{2}}E_u(1)^{\frac{2-p}{2}}J_u(1)^{p/2}+CK^{\frac{2-p}{2}}\gamma (0,1)^2E_u(1)\\\le & {} CK^{\frac{2-p}{2}}J_u(1)+CK^{\frac{2-p}{2}}\gamma (0,1)^2E_u(1), \end{aligned}$$

where in the last inequality we have used \(E_u(1)\ge J_u(1)\), and thus \(E_u(1)^{\frac{2-p}{2}}\le J_u(1)^{\frac{2-p}{2}}\). Combining this with the contribution on \(Z_K\), we get

$$\begin{aligned} \int _{T_{1/2}}\left| A-A_0\right| ^2\left| \nabla v_0\right| ^2 \le CK^{\frac{2-p}{2}}J_u(1)+C\left( K+K^{\frac{2-p}{2}}\right) \gamma (0,1)^2E_u(1). \end{aligned}$$

From this and (3.27), the desired estimate (3.25) follows. \(\square \)

We now prove the two Hölder type estimates. Let us first prove (3.34).

Proof of (3.34)

Set \(R_0=10^{-2}n^{-1/2}\) as before. For any \(X_0=(x_0,t_0)\in T_{1/2}\), any \(0<R\le R_0\), choose \(\eta \in C_0^1(Q_{R}(X_0))\), with \(\eta \equiv 1\) in \(Q_{2R/3}(X_0)\), \(\left| \nabla \eta \right| \lesssim 1/R\). Here, \(Q_R(X_0)\) is a cube centered at \(X_0\) with side length R, and we shall write \(Q_R\) for \(Q_R(X_0)\) when this does not cause confusion. Using \(Lu=0\) in \(T_1\), \(v(x,t)=u(x,t)-\lambda _1(u)t\), and \(L_0 t=0\), we have for any \(w\in W_0^{1,2}(T_1)\),

$$\begin{aligned} 0= & {} \int _{T_1}A\nabla u\cdot \nabla w \,\mathrm{d}x\,\mathrm{d}t=\int _{T_1}A\nabla v\cdot \nabla w\,\mathrm{d}x\,\mathrm{d}t + \int _{T_1}A\nabla (\lambda t)\cdot \nabla w \,\mathrm{d}x\,\mathrm{d}t\nonumber \\= & {} \int _{T_1}A\nabla v\cdot \nabla w + \int _{T_1}(A-A_0)\nabla (\lambda t)\cdot \nabla w, \end{aligned}$$
(3.35)

where \(\lambda =\lambda _1(u)\).

Now we choose \(w(X)=v(X)\eta ^2(X)\) when \(t_0\le \frac{R}{2}\), and \(w=\Big (v-\fint _{Q_R}v(Y)\,\mathrm{d}Y\Big )\eta ^2\) when \(t_0>\frac{R}{2}\). Notice that \(v(x,0)=0\), and thus \(w\in W_0^{1,2}(T_1)\) (because \(Q_R \subset B_1\)) as required. We plug w into (3.35), compute the derivatives, estimate some terms brutally, and finally use Cauchy–Schwarz, and get the following estimates.

Case 1: \(t_0\le \frac{R}{2}\). Here we obtain

$$\begin{aligned}&\frac{1}{\mu _0}\int _{T_1}\left| \nabla v\right| ^2\eta ^2\,\mathrm{d}X\\&\quad \le \frac{1}{2\mu _0}\int _{T_1}\left| \nabla v\right| ^2\eta ^2\,\mathrm{d}X +C_{\mu _0}\int _{T_1}v^2\left| \nabla \eta \right| ^2\,\mathrm{d}X + C_{\mu _0}\left| \lambda \right| ^2\int _{T_1}\left| A-A_0\right| ^2\eta ^2\,\mathrm{d}X. \end{aligned}$$

Extending v by zero below \(t=0\), this yields

$$\begin{aligned} \int _{Q_{2R/3}}\left| \nabla v\right| ^2\,\mathrm{d}X \le \frac{C_{\mu _0}}{R^2}\int _{Q_R}v^2\,\mathrm{d}X+ C_{\mu _0}\left| \lambda \right| ^2\int _{Q_R}\left| A-A_0\right| ^2\,\mathrm{d}X. \end{aligned}$$

We apply the Poincaré-Sobolev inequality to control \(\int _{Q_R}v^2\,\mathrm{d}X\) and deduce from the above that

$$\begin{aligned} \fint _{Q_{2R/3}}\left| \nabla v\right| ^2\,\mathrm{d}X \le C\left( \fint _{Q_R}\left| \nabla v\right| ^{\frac{2n}{n+2}}\,\mathrm{d}X\right) ^{\frac{n+2}{n}}+ C\left| \lambda \right| ^2\fint _{Q_R}\left| A-A_0\right| ^2\,\mathrm{d}X.\nonumber \\ \end{aligned}$$
(3.36)

Case 2: \(t_0> \frac{R}{2}\). The same computation as in Case 1 gives

$$\begin{aligned} \int _{Q_{2R/3}}\left| \nabla v\right| ^2\,\mathrm{d}X \le \frac{C}{R^2}\int _{Q_R}\Big |v(X)-\fint _{Q_R}v(Y)\,\mathrm{d}Y\Big |^2\,\mathrm{d}X+ C\left| \lambda \right| ^2\int _{Q_R}\left| A-A_0\right| ^2\,\mathrm{d}X. \end{aligned}$$

Then by the Poincaré-Sobolev inequality, (3.36) holds again in this case.

Now we apply [9] V. Proposition 1.1 to obtain

$$\begin{aligned} \fint _{Q_{R_0/2}}\left| \nabla v\right| ^p\,\mathrm{d}X \le C\left( \fint _{Q_{R_0}}\left| \nabla v\right| ^2\,\mathrm{d}X\right) ^{\frac{p}{2}} + C\left| \lambda \right| ^p\fint _{Q_{R_0}}\left| A-A_0\right| ^p\,\mathrm{d}X \end{aligned}$$

for some \(p=p(d,\mu _0)>2\).

The desired estimate (3.34) follows as \(T_{1/2}\) can be covered by finitely many \(Q_{R_0/2}\). \(\square \)

Now we turn to (3.33).

Proof of (3.33)

We will use \(L^p\) boundary estimates for solutions. Recall that \(L_0 v_0=0\) in \(T_{1/2}\), with \(v_0-v\in W_0^{1,2}(T_{1/2})\). Set \(R_0=10^{-2}n^{-1/2}\). Then by the boundary estimates in [9] p.154, we have for any \(X_0\in T_{1/2}\),

$$\begin{aligned} \fint _{Q_{R_0/2}(X_0)\cap T_{1/2}}\left| \nabla v_0\right| ^p\lesssim & {} \left( \fint _{Q_{R_0}(X_0)\cap T_{1/2}}\left| \nabla v_0\right| ^2\right) ^{p/2}+\fint _{Q_{R_0}(X_0)\cap T_{1/2}}\left| \nabla v\right| ^p\\\lesssim & {} \left( \fint _{T_{1/2}}\left| \nabla v_0\right| ^2\right) ^{p/2}+\fint _{T_{1/2}}\left| \nabla v\right| ^p \end{aligned}$$

for some \(p>2\). Since \(T_{1/2}\) can be covered by finitely many cubes \(Q_{R_0/2}(X_0)\), we obtain (3.33). \(\square \)

We now prove an analogue of Lemma 3.9 for positive solutions to \(Lu=0\).

Lemma 3.37

Let u be a positive solution of \(Lu=-{{\,\mathrm{div}\,}}(A\nabla )u=0\) in \(T_5\), with \(u=0\) on \(\varDelta _5\). Then for any \(\delta >0\), \(0<r<1/2\),

$$\begin{aligned} E_u(r) \ge \left( \frac{1-C' r^2}{C} -\frac{C''\left( \delta +C_\delta \gamma (0,1)^2\right) }{r^{d+1}}\right) E_u(1) \end{aligned}$$
(3.38)

where C, \(C'\), \(C''\) are positive constants depending only on d and \(\mu _0\).

Proof

As before, we will only find this useful when the expression in parentheses is under control. Let \(A_0\) and \(u^0\) be as in Lemma 3.19. By (3.20),

$$\begin{aligned} \fint _{T_r}\left| \nabla u\right| ^2\ge & {} \frac{1}{2}\fint _{T_r}\left| \nabla u^0\right| ^2-\fint _{T_r}\left| \nabla (u-u^0)\right| ^2 \nonumber \\\ge & {} \frac{1}{2}\fint _{T_r}\left| \nabla u^0\right| ^2 - \frac{1}{r^{d+1}}\fint _{T_1}\left| \nabla (u-u^0)\right| ^2 \nonumber \\\ge & {} \frac{1}{2}\fint _{T_r}\left| \nabla u^0\right| ^2 - \frac{C\left( \delta +C_\delta \gamma (0,1)^2\right) }{r^{d+1}}\fint _{T_1}\left| \nabla u\right| ^2. \end{aligned}$$
(3.39)

Divide both sides of (3.39) by \(\fint _{T_1}\left| \nabla u(X)\right| ^2\), and then observe that

$$\begin{aligned} \fint _{T_1}\left| \nabla u^0(X)\right| ^2\approx \fint _{T_1}\left| \nabla u(X)\right| ^2 \end{aligned}$$

by Lemma 3.13; this yields

$$\begin{aligned} \frac{\fint _{T_r}\left| \nabla u\right| ^2}{\fint _{T_1}\left| \nabla u\right| ^2}\ge & {} \frac{1}{2}\frac{\fint _{T_r}\left| \nabla u^0\right| ^2}{\fint _{T_1}\left| \nabla u\right| ^2} -\frac{C\left( \delta +C_\delta \gamma (0,1)^2\right) }{r^{d+1}}\\\ge & {} C^{-1}\frac{\fint _{T_r}\left| \nabla u^0\right| ^2}{\fint _{T_1}\left| \nabla u^0\right| ^2} -\frac{C\left( \delta +C_\delta \gamma (0,1)^2\right) }{r^{d+1}}. \end{aligned}$$

Since \(u^0>0\) in \(T_1\) (by the maximum principle), we can apply Lemma 3.9 to \(u^0\) and obtain the desired estimate. \(\square \)

We are finally ready to prove the announced decay estimate for the quantity

$$\begin{aligned} \beta _u(x,r) = \frac{J_u(x,r)}{E_u(x,r)} \end{aligned}$$
(3.40)

(the proportion of non-affine energy) defined in (1.12). We just need to organize ourselves with the constants.

We intend to apply the estimates above, with a single value of \(r = \tau _0\) which will be chosen small enough, depending on d and \(\mu _0\), and then we will require that

$$\begin{aligned} \gamma (0,1) \le \varepsilon _0, \end{aligned}$$
(3.41)

for some \(\varepsilon _0 > 0\) that we shall choose momentarily, depending on \(r = \tau _0\), d, and \(\mu _0\).

Our first requirement for \(r = \tau _0\) is that \(C' r^2 < \frac{1}{2}\) in (3.38) (there will be another one of this type soon), and we choose \(\varepsilon _0\) and \(\delta \) so small (depending on \(\tau _0\)) that if (3.41) holds, then

$$\begin{aligned} \frac{C''\left( \delta +C_\delta \gamma (0,1)^2\right) }{r^{d+1}} < \frac{1}{4C} \end{aligned}$$

in (3.38). This way, (3.38) implies that

$$\begin{aligned} E_u(r) \ge \frac{1}{4C} E_u(1). \end{aligned}$$
(3.42)

Let u be as in Lemma 3.37. We divide both sides of (3.25) by \(E_u(r)\) and get that

$$\begin{aligned} \beta _u(0,r) \le C \left( r^2+K^{\frac{2-p}{2}}r^{-d-1}\right) \frac{J_u(1)}{E_u(r)} +\frac{C_K}{r^{d+1}}\gamma (0,1)^2\frac{E_u(1)}{E_u(r)} \end{aligned}$$
(3.43)

Then we choose K to satisfy \(K^{\frac{2-p}{2}}=r^{d+3}=\tau _0^{d+3}\), assume that (3.41) holds, apply (3.42), and deduce from (3.43) that (maybe with a larger constant C)

$$\begin{aligned} \beta _u(0,\tau _0) \le C \tau _0^2\beta _u(1) + C_{\tau _0}\gamma (0,1)^2. \end{aligned}$$
(3.44)

Finally we choose \(\tau _0\) so small that (in addition to our earlier constraint) \(C\tau _0^2 < \frac{1}{2}\) in (3.44), and then choose \(\varepsilon _0\) as above.

We recapitulate what we obtained so far in the next corollary. Of course, by translation and dilation invariance, what was done with the unit box \(T_1\) can also be done with any other \(T(x,R)\), \((x,R) \in {\mathbb {R}}^{d+1}_+\). We use the opportunity to state the general case, which of course can easily be deduced from the case of \(T_1\) by homogeneity (or we could copy the proof).

Corollary 3.45

We can find constants \(\tau _0 \in (0,10^{-1})\) and \(C > 0\) which depend only on d and \(\mu _0\), such that if u is a positive solution of \(Lu=-{{\,\mathrm{div}\,}}(A\nabla )u=0\) in T(x, 5R), with \(u=0\) on \(\varDelta (x,5R)\), then

$$\begin{aligned} \beta _u(x,\tau _0 R) \le \frac{1}{2} \beta _u(x,R) + C \gamma (x,R)^2. \end{aligned}$$
(3.46)

See (1.12) and (3.15) for the definitions of \(\beta _u(x,\tau _0 R)\) and \(\gamma (x,R)\).

Proof

The discussion above gives the result under the additional condition that \(\gamma (x,R) \le \varepsilon _0\). But we now have chosen \(\tau _0\) and \(\varepsilon _0\), and if \(\gamma (x,R) > \varepsilon _0\), (3.46) holds trivially (maybe with a larger constant), because \(\beta _u(x,\tau _0 R) \le 1\) by (3.3). \(\square \)

Remark 3.47

As we remarked before, the complication of the decay estimate for \(J_u(r)\) comes mainly from the lack of smallness of \(\left\| A-A_0\right\| _{L^\infty }\). If we knew that \(\gamma _\infty (x,R) \le \varepsilon _1\), where

$$\begin{aligned} \gamma _\infty (x,r)=\inf _{A_0\in {\mathfrak {A}}_0(\mu _0)}\sup _{T(x,r)}\left| A-A_0\right| , \end{aligned}$$

then we could simplify the proof of Corollary 3.45 significantly.

To see this, we start with an estimate similar to (3.27)

$$\begin{aligned} J_u(r) \le C r^2 J_u(1) + \frac{C}{r^{d+1}}\fint _{T_{1/2}}\left| A(x,t)-A_0\right| ^2\left| \nabla u(x,t)\right| ^2\,\mathrm{d}x\,\mathrm{d}t, \end{aligned}$$
(3.48)

which can be obtained in the same way as (3.27). Our estimate for \(\fint _{T_1}\left| A-A_0\right| ^2\left| \nabla u\right| ^2\) now becomes rather simple. We still choose \(A_0\) so as to minimize in the definition of \(\gamma (0,1)\), but observe that by Chebyshev's inequality, we can find \((x,t) \in T_1\) such that

$$\begin{aligned} |A(x,t) - A_0| \le C \gamma (0,1) \le C\gamma _\infty (0,1). \end{aligned}$$

Since \(|A(y,s) - A(x,t)| \le 2 \gamma _\infty (0,1)\) for \((y,s) \in T_1\), we see that \(|A-A_0| \le C \gamma _\infty (0,1)\le C\varepsilon _1\) on \(T_1\). Then

$$\begin{aligned} \fint _{T_1}\left| A-A_0\right| ^2\left| \nabla u\right| ^2&\le 2\fint _{T_1}\left| A-A_0\right| ^2\left| \nabla ( u-\lambda _1(u)t)\right| ^2+2\lambda _1(u)^2\fint _{T_1}\left| A-A_0\right| ^2\\&\le 2\fint _{T_1}\left| A-A_0\right| ^2\left| \nabla ( u-\lambda _1(u)t)\right| ^2 +2E_u(1)\fint _{T_1}\left| A-A_0\right| ^2 \\&\le 2\varepsilon _1 J_u(1) + 2 \gamma (0,1)^2 E_u(1) \end{aligned}$$

and by (3.48),

$$\begin{aligned} J_u(r) \le C \left( r^2 + \frac{C \varepsilon _1}{r^{d+1}}\right) J_u(1) + \frac{C\gamma (0,1)^2}{r^{d+1}} E_u(1). \end{aligned}$$

This is our analogue of (3.25); the rest of the proof is the same.

4 Carleson Measure Estimates

In this section we complete the proof of our two theorems. We already have our main decay estimate (3.46), which says that \(\beta _u(x,r)\) tends to get smaller and smaller, unless \(\gamma (x,r)^2\) is large. This is a way of saying that \(\gamma ^2\) dominates \(\beta _u\), and it is not surprising that a Carleson measure estimate on the first function implies a similar estimate on the second one. The fact that \(\beta _u\) comes from a solution u will not play any role in this argument (see the second part of this section).

4.1 Proof of Lemma 3.16

Before we deal with decay, let us prove Lemma 3.16, which is another fact about Carleson measures where u plays no role.

Let A be as in the statement. We want to show that \(\gamma (x,r)^2 \frac{\mathrm{d}x\,\mathrm{d}r}{r}\) is a Carleson measure on \({\mathbb {R}}^{d+1}_+\), and our first move is to estimate \(\gamma (x,r)\) in terms of the \(\alpha _2(y,s)\).

For each pair \((x,r)\), we choose a constant matrix \(A_{x,r}\) such that

$$\begin{aligned} \fint _{W(x,r)} |A-A_{x,r}|^2 = \alpha _2(x,r)^2 . \end{aligned}$$
(4.1)

The interested reader may check that we can choose the \(A_{x,r}\) so that they depend on \((x,r)\) in a measurable way, and in fact are constant on pieces of a measurable partition of \({\mathbb {R}}^{d+1}_+\), maybe at the price of replacing \(\alpha _2(x,r)^2\) in (4.1) with \(2\alpha _2(x,r)^2\), and making the \(W(x,r)\) a little larger first to allow extra room to move x and r.

Let \(\varDelta _0 = \varDelta (x_0,r_0)\) be given; we want to estimate \(\gamma (x_0,r_0)\), and we try the constant matrix \(A_0 = A_{x_0,r_0}\). Thus

$$\begin{aligned} \gamma (x_0,r_0)^2 \le \fint _{T_0} |A-A_0|^2 \le C \fint _{Q_0} |A-A_0|^2, \end{aligned}$$
(4.2)

where we set \(T_0 = T(x_0,r_0)\) and \(Q_0 = \varDelta (x_0,r_0) \times (0,r_0]\). We will cut this integral into horizontal slices, using the radii \(r_m = \rho ^m r_0\), \(m \ge 0\). Let us choose \(\rho = \frac{4}{5}\), rather close to 1, to simplify the communication between slices.

We first estimate how fast the \(A_{x,r}\) change. We claim that

$$\begin{aligned} |A_{x,r} - A_{y,s}| \le C \alpha _2(x,r) + C \alpha _2(y,s) \quad \text { when }|x-y| \le \frac{3}{2} r\text { and }\frac{2}{3} r \le s \le r. \end{aligned}$$
(4.3)

Indeed, with these constraints there is a box R in \(W(x,r) \cap W(y,s)\) such that \(|R| \ge C^{-1} r^{d+1}\), and then

$$\begin{aligned} |A_{x,r} - A_{y,s}|&= \fint _R |A_{x,r} - A_{y,s}| \le \fint _R |A_{x,r} - A| + \fint _R |A - A_{y,s}| \\&\le C \fint _{W(x,r)} |A_{x,r} - A| + C \fint _{W(y,s)} |A - A_{y,s}| \le C \alpha _2(x,r) + C \alpha _2(y,s) \end{aligned}$$

by the triangle inequality, the fact that \(|R| \simeq |W(x,r)| \simeq |W(y,s)|\), and Hölder’s inequality. We can iterate this and get that for \(y\in {\mathbb {R}}^d\) and \(m \ge 0\),

$$\begin{aligned} |A_{y,r_m} - A_{y,r_0}| \le C \sum _{j=0}^m \alpha _2(y,r_j). \end{aligned}$$
(4.4)
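The telescoping behind (4.4), i.e. successive applications of (4.3) along the chain of radii \(r_0, r_1, \dots , r_m\), can be illustrated by a toy computation; this is only a numerical illustration, not part of the argument, and the scalars \(A_j\) and numbers \(\alpha _j\) below are made-up stand-ins for the matrices \(A_{y,r_j}\) and the numbers \(\alpha _2(y,r_j)\).

```python
import random

random.seed(1)
m = 12
alpha = [random.random() for _ in range(m + 1)]  # stand-ins for alpha_2(y, r_j)

# Build scalars A_j (toy stand-ins for the matrices A_{y,r_j}) whose
# consecutive increments obey the analogue of (4.3) with C = 1:
#   |A_{j+1} - A_j| <= alpha_j + alpha_{j+1}.
A = [0.0]
for j in range(m):
    step = random.uniform(-1, 1) * (alpha[j] + alpha[j + 1])
    A.append(A[-1] + step)

# Telescoping gives |A_m - A_0| <= sum_j (alpha_j + alpha_{j+1})
#                               <= 2 sum_j alpha_j,
# which is (4.4) with C = 2 in this toy model.
assert abs(A[m] - A[0]) <= 2 * sum(alpha)
print("telescoped bound verified")
```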

Now consider \(y\in \varDelta '_0 = \varDelta (x_0,3r_0/2)\) and notice that by (4.3), \(|A_{y,r_0}-A_0| \le C \alpha _2(y,r_0) + C \alpha _2(x_0,r_0)\), so (4.4) also yields

$$\begin{aligned} |A_{y,r_m} - A_0| \le C \alpha _2(x_0,r_0) + C \sum _{j=0}^m \alpha _2(y,r_j). \end{aligned}$$
(4.5)

Set \(H_m = \varDelta _0 \times (r_{m+1},r_m]\) for \( m \ge 0\); thus \(Q_0\) is the disjoint union of the \(H_m\). We claim that

$$\begin{aligned} \int _{H_m} |A-A_0|^2 \le C r_m \alpha _2(x_0,r_0)^2 |\varDelta _0| + C r_m \int _{\varDelta '_0} \Big \{\sum _{j=0}^m \alpha _2(y,r_j) \Big \}^2 \,\mathrm{d}y. \end{aligned}$$
(4.6)

We tried to discretize our estimates as late as possible, but this has to happen at some point. Cover \(\varDelta _0\) with disjoint cubes \(R_i\) of sidelength \((10\sqrt{d})^{-1}r_m\) that meet \(\varDelta _0\), and for each one choose a point \(x_i \in R_i\) such that \(\alpha _2(x_i,r_m)\) is minimal. Then set \(A^i = A_{x_i, r_m}\) and \(W_i = R_i \times (r_{m+1},r_m]\); notice that the \(W_i\) cover \(H_m\).

The contribution of \(R_i\) to the integral in (4.6) is

$$\begin{aligned}&\int _{W_i} |A(y,t)-A_0|^2\,\mathrm{d}y\,\mathrm{d}t \nonumber \\&\quad \le C \int _{W_i} |A(y,t)-A^i|^2 + |A^i-A_{y,r_m}|^2 + |A_{y,r_m}-A_0|^2 \,\mathrm{d}y\,\mathrm{d}t. \end{aligned}$$
(4.7)

For the first term,

$$\begin{aligned} \int _{W_i}|A(y,t)-A^i|^2 \,\mathrm{d}y\,\mathrm{d}t \le C |W(x_i,r_m)| \alpha _2(x_i,r_m)^2 \end{aligned}$$
(4.8)

because \(W_i \subset W(x_i,r_m)\) and by definition of \(\alpha _2\). Next

$$\begin{aligned} \int _{W_i}|A^i-A_{y,r_m}|^2 \,\mathrm{d}y\,\mathrm{d}t \le C \int _{W_i} (\alpha _2(x_i,r_m) + \alpha _2(y,r_m))^2 \,\mathrm{d}y\,\mathrm{d}t \le C r_m \int _{R_i} \alpha _2(y,r_m)^2 \,\mathrm{d}y \end{aligned}$$

by (4.3) and because \(\alpha _2(x_i,r_m)\), by the choice of \(x_i\), is smaller. This integral is at least as large as the previous one, again because \(\alpha _2(x_i,r_m)\) is smaller. When we sum all these terms over i, we get a contribution bounded by \(C r_m \int _{\varDelta '_0} \alpha _2(y,r_m)^2\), which is dominated by the right hand side of (4.6) (just keep the last term in the sum). We are left with the third integral in (4.7). But \(|A_{y,r_m}-A_0|\) is majorized in (4.5), and the corresponding contribution, when we sum over i, is also dominated by the right-hand side of (4.6). Our claim (4.6) follows.

Because of (4.6) and the fact that the \(H_m\) cover \(Q_0\), we see that (4.2) yields

$$\begin{aligned} \gamma (x_0,r_0)^2 \le C \fint _{Q_0} |A-A_0|^2 \le C |Q_0|^{-1} \sum _m \int _{H_m} |A-A_0|^2 \le C(S_1 + S_2), \end{aligned}$$
(4.9)

where

$$\begin{aligned} S_1 = |Q_0|^{-1} \sum _m r_m \alpha _2(x_0,r_0)^2 |\varDelta _0| \le C \alpha _2(x_0,r_0)^2, \end{aligned}$$
(4.10)

and

$$\begin{aligned} S_2 = |Q_0|^{-1} \sum _m r_m \int _{\varDelta '_0} \Big \{\sum _{j=0}^m \alpha _2(y,r_j) \Big \}^2\,\mathrm{d}y \le C \fint _{\varDelta '_0} \sum _m \rho ^m \Big \{\sum _{j=0}^m \alpha _2(y,r_j) \Big \}^2\,\mathrm{d}y\nonumber \\ \end{aligned}$$
(4.11)

because \(r_m = \rho ^m r_0\) and \(|Q_0| \simeq r_0 |\varDelta '_0|\). We are about to apply Hardy’s inequality, which says that for \(1< q < +\infty \),

$$\begin{aligned} \sum _{m=0}^\infty \Big \{\frac{1}{m+1}\sum _{j=0}^m a_j \Big \}^q \le C_q \sum _{m} a_m^q \end{aligned}$$
(4.12)

for any infinite sequence \(\{ a_m \}\) of nonnegative numbers. Here we take \(q=2\) and \(a_j = a_j(y) = \rho ^{\frac{j}{4}}\alpha _2(y,r_j)\). Then

$$\begin{aligned} \begin{aligned} \sum _m \rho ^m \Big \{\sum _{j=0}^m \alpha _2(y,r_j) \Big \}^2&\le \sum _m \rho ^{m/2} \Big \{\sum _{j=0}^m \rho ^{m/4} \alpha _2(y,r_j) \Big \}^2 \\&\le \sum _m \rho ^{m/2} \Big \{\sum _{j=0}^m \rho ^{j/4} \alpha _2(y,r_j) \Big \}^2 \\&= \sum _m (m+1)^2 \rho ^{m/2} \Big \{\frac{1}{m+1}\sum _{j=0}^m a_j \Big \}^2 \le C \sum _m a_m^2 \end{aligned} \end{aligned}$$
(4.13)

so that

$$\begin{aligned} S_2 \le C \fint _{\varDelta '_0}\sum _m a_m^2(y) \,\mathrm{d}y=C\sum _{m} \rho ^{\frac{m}{2}} \fint _{\varDelta '_0} \alpha _2(y,r_m)^2 \,\mathrm{d}y. \end{aligned}$$
(4.14)

We return to (4.9), use (4.10), and see that

$$\begin{aligned} \gamma (x_0,r_0)^2 \le C \alpha _2(x_0,r_0)^2 + C \sum _{m} \rho ^{\frac{m}{2}} \fint _{\varDelta '_0} \alpha _2(y,\rho ^m r_0)^2 \,\mathrm{d}y \end{aligned}$$
(4.15)

We kept the squares because our Carleson measure condition is in terms of squares. Recall that by assumption, \(\alpha _2^2\) satisfies a Carleson measure condition, with norm \({\mathfrak {N}}_2(A)\). At this stage, deducing that the same thing holds for \(\gamma ^2\) will only be a matter of applying the triangle inequality. We spell this out because of the varying average in the second term of (4.15), but not much will happen. Pick a surface ball \(\varDelta = \varDelta (x_1,r_1)\). It is enough to bound

$$\begin{aligned} I = \int _\varDelta \int _0^{r_1} \gamma (x,r)^2 \frac{\mathrm{d}x\,\mathrm{d}r}{r} \le C \int _\varDelta \int _0^{r_1} \alpha _2(x,r)^2 \frac{\mathrm{d}x\,\mathrm{d}r}{r} + C \sum _m \rho ^{\frac{m}{2}} I_m, \end{aligned}$$
(4.16)

where

$$\begin{aligned} I_m = \int _{x\in \varDelta } \int _{r=0}^{r_1} \fint _{y \in \varDelta (x,3r/2)} \alpha _2(y,\rho ^m r)^2 \,\mathrm{d}y \frac{\mathrm{d}x\,\mathrm{d}r}{r}. \end{aligned}$$
(4.17)

Since

$$\begin{aligned} \int _\varDelta \int _0^{r_1} \alpha _2(x,r)^2 \frac{\mathrm{d}x\,\mathrm{d}r}{r} \le C {\mathfrak {N}}_2(A) r_1^d \end{aligned}$$
(4.18)

by definition, we may concentrate on \(I_m\). Of course we apply Fubini. First notice that \(y \in \varDelta ' = \varDelta (x_1,5r_1/2)\) when \(y \in \varDelta (x,3r/2)\) and \(x\in \varDelta \); since \(x\in \varDelta (y,3r/2)\), the integral in the dummy variable x cancels with the normalization in the average, and we get that

$$\begin{aligned} I_m = \int _{y\in \varDelta '} \int _{r=0}^{r_1} \alpha _2(y,\rho ^m r)^2 \frac{\mathrm{d}y\,\mathrm{d}r}{r} = \int _{y\in \varDelta '} \int _{t=0}^{\rho ^m r_1} \alpha _2(y,t)^2 \frac{\mathrm{d}y\,\mathrm{d}t}{t}, \end{aligned}$$
(4.19)

where the second identity is a change of variable (and we used the invariance of \(\frac{\mathrm{d}t}{t}\) under dilations). The definition also yields \(I_m \le C {\mathfrak {N}}_2(A) r_1^d\), so we can sum the series, and we get that \(I \le C {\mathfrak {N}}_2(A) r_1^d\). This completes our proof of (3.17).

We still need to check the second statement (3.18) (the pointwise estimate), and this will follow from the fact that \(\gamma \) is not expected to vary too much. Indeed, we claim that

$$\begin{aligned} \gamma (x,r) \le C \gamma (y,s) \quad \text {whenever }|x-y| \le r\text { and }2r \le s \le 3r. \end{aligned}$$
(4.20)

This is simply because \(T(x,r) \subset T(y,s)\), so if A is well approximated by a constant coefficient matrix \(A_0\) in T(ys), this is also true in T(xr). Now we square, average, and get that

$$\begin{aligned} \begin{aligned} \gamma ^2(x,r)&\le C \fint _{y\in \varDelta (x,r)}\fint _{s\in (2r,3r)} \gamma ^2(y,s) \,\mathrm{d}y\,\mathrm{d}s \\&\le C r^{-d} \int _{y\in \varDelta (x,r)}\int _{s\in (2r,3r)} \gamma ^2(y,s) \frac{\mathrm{d}y\,\mathrm{d}s}{s} \le C ||\gamma ^2(y,s) \frac{\mathrm{d}y\,\mathrm{d}s}{s}||_{{\mathcal {C}}} \le C {\mathfrak {N}}_2(A). \end{aligned} \end{aligned}$$
(4.21)

This completes our proof of Lemma 3.16. \(\square \)

Remark 4.22

There is also a local version of Lemma 3.16, with the same proof. It says that if \(\alpha _2(x,r)^2 \frac{\mathrm{d}x\,\mathrm{d}r}{r}\) is a Carleson measure relative to some surface ball \(3\varDelta _0\) (see Definition 1.3), then \(\gamma (x,r)^2 \frac{\mathrm{d}x\,\mathrm{d}r}{r}\) is a Carleson measure on \(T_{\varDelta _0}\), with norm

$$\begin{aligned} \left\| \gamma (x,r)^2 \frac{\mathrm{d}x\,\mathrm{d}r}{r}\right\| _{{\mathcal {C}}(\varDelta _0)} \le C \left\| \alpha _2(x,r)^2 \frac{\mathrm{d}x\,\mathrm{d}r}{r}\right\| _{{\mathcal {C}}(3\varDelta _0)}. \end{aligned}$$
(4.23)

As usual, C depends only on d. The simplest way to see this is to observe that since we use nothing more than the estimate (4.15), and for (3.17) we only care about \((x_0,r_0) \in T_{\varDelta _0}\), we may replace \(\alpha _2(y,t)\) with 0 when \((y,t) \notin T_{3\varDelta _0}\). Then the truncated function \(\alpha _2\) satisfies a global Carleson measure estimate (for its square) and we can conclude as above.

The fact that

$$\begin{aligned} \gamma (x,r)^2 \le C \left\| \alpha _2(x,r)^2 \frac{\mathrm{d}x\,\mathrm{d}r}{r}\right\| _{{\mathcal {C}}(3\varDelta _0)} \end{aligned}$$
(4.24)

for \((x,r)\in T_{\varDelta _0}\) can be proved as (3.18) above, using the fact that (4.23) also holds for a slightly larger ball \(\frac{11}{10} \varDelta _0\).

4.2 Proof of Theorems 1 and 2

We will just need to prove Theorem 2, which is more general. Let the matrix A be as in the statement of both theorems.

We have just completed the proof of Corollary 3.45, which says that

$$\begin{aligned} \beta _u(x,\tau _0 r) \le \frac{1}{2} \beta _u(x,r) + C \gamma (x,r)^2 \end{aligned}$$
(4.25)

whenever u is a positive solution of \(Lu=-{{\,\mathrm{div}\,}}(A\nabla )u=0\) in T(x, 5r), with \(u=0\) on \(\varDelta (x,5r)\).

In the statement of our theorems, u is assumed to be a positive solution of \(Lu=0\) in \(T(x_0,R)\), with \(u=0\) on \(\varDelta (x_0,R)\), so (4.25) holds as soon as \(\varDelta (x,5r) \subset \varDelta (x_0,R)\). We pick such a pair (xr) and iterate (4.25); this yields

$$\begin{aligned} \beta _u(x,\tau _0^k r) \le 2^{-k} \beta _u(x,r) + C \sum _{j=0}^{k-1} 2^{-j} \gamma (x, \tau _0^{k-j-1}r)^2. \end{aligned}$$
(4.26)

Hence (writing r in place of \(\tau _0^{-k} r\))

$$\begin{aligned} \beta _u(x, r) \le 2^{-k} \beta _u(x,\tau _0^{-k} r) + C \sum _{j=0}^{k-1} 2^{-j} \gamma (x, \tau _0^{-j-1}r)^2 \end{aligned}$$
(4.27)

as soon as \(\varDelta (x,5 \tau _0^{-k} r) \subset \varDelta (x_0,R)\).
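The iteration leading to (4.26) is a simple linear recursion, and it can be checked numerically: running the one-step inequality (4.25) with equality for k steps reproduces the claimed closed form exactly. The values of \(C\), \(\beta \) and \(\gamma ^2\) below are made up for illustration.

```python
import random

# Check of the iterated bound (4.26): if
#   beta(tau0^{i+1} r) = (1/2) beta(tau0^i r) + C gamma(x, tau0^i r)^2
# holds with equality at every step, then after k steps
#   beta(tau0^k r) = 2^{-k} beta(r)
#                    + C sum_{j=0}^{k-1} 2^{-j} gamma(x, tau0^{k-j-1} r)^2.
random.seed(2)
C, k = 3.0, 10
beta0 = 1.0
gamma2 = [random.random() ** 2 for _ in range(k)]  # gamma(x, tau0^i r)^2

beta = beta0
for i in range(k):                 # step i uses the scale tau0^i r
    beta = 0.5 * beta + C * gamma2[i]

closed_form = 2 ** (-k) * beta0 + C * sum(
    2 ** (-j) * gamma2[k - j - 1] for j in range(k)
)
assert abs(beta - closed_form) < 1e-12
print("iteration matches closed form")
```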

We want to prove the Carleson bound (1.14) on \(\beta _u\) in \(\varDelta (x_0,\tau R)\), so we give ourselves a surface ball \(\varDelta = \varDelta (y,r) \subset \varDelta (x_0,\tau R)\). We want to show that

$$\begin{aligned} \int _{T_\varDelta }\beta _u(x,s)\frac{\mathrm{d}x\,\mathrm{d}s}{s} \le C \tau ^a r^d + C {\mathfrak {N}}r^d, \end{aligned}$$
(4.28)

where we set \({\mathfrak {N}}= \left\| \alpha _2(x,r)^2 \frac{\mathrm{d}x\,\mathrm{d}r}{r}\right\| _{{\mathcal {C}}(\varDelta (x_0,R))}\).

Let us first check that

$$\begin{aligned} \beta _u(x,s) \le C \tau ^a + C {\mathfrak {N}}\quad \text { when }x\in \varDelta \text { and }0<s \le r. \end{aligned}$$
(4.29)

When \(\tau \ge 10^{-1}\), this is true just because \((x,s) \in T(x_0,\tau R)\) and (3.3) says that \(\beta _u(x,s) \le 1\). Otherwise, let k be the largest integer such that \(\tau _0^{-k} r < 10^{-1} R\) (notice that \(k \ge 0\)); then \(\varDelta (x,5 \tau _0^{-k} r) \subset \varDelta (x_0,R)\), so (4.27) holds. In addition, all the intermediate radii \(\tau _0^{-j-1}r\) are also smaller than \(10^{-1} R\), so \(\gamma (x, \tau _0^{-j-1}r)^2\le C {\mathfrak {N}}\) by (3.18) or (4.24) in Remark 4.22. Then (4.27) says that \(\beta _u(x,s) \le 2^{-k} + C {\mathfrak {N}}\), and (4.29) follows, with a constant a that depends only on \(\tau _0\) (which itself depends only on d and \(\mu _0\)). This is because our choice of k gives \(\tau _0^{k+1}\le 10r/R\le 10\tau \).
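The last step, that \(2^{-k} \le C\tau ^a\) when k is the largest integer with \(\tau _0^{-k} r < 10^{-1}R\), is elementary arithmetic, and it can be checked numerically with \(a = \log 2/\log (1/\tau _0)\). The values of \(\tau _0\), r and R below are sample values chosen for illustration only.

```python
import math

tau0 = 0.05                          # sample value of tau_0 in (0, 1/10)
a = math.log(2) / math.log(1 / tau0) # decay exponent from tau0^{k+1} <= 10 tau
C = 2 * 10 ** a                      # 2^{-k} = 2 * 2^{-(k+1)} <= 2 (10 tau)^a

for tau in (1e-4, 1e-3, 1e-2, 0.09):
    R = 1.0
    r = tau * R                      # radius of Delta inside Delta(x0, tau R)
    # largest k with tau0^{-k} r < R/10, i.e. k < log(R/(10r)) / log(1/tau0)
    k = math.floor(math.log(R / (10 * r)) / math.log(1 / tau0))
    if k < 0:
        continue
    # our choice of k gives tau0^{k+1} <= 10 r / R <= 10 tau ...
    assert tau0 ** (k + 1) <= 10 * r / R + 1e-12
    # ... and hence the geometric decay 2^{-k} <= C tau^a
    assert 2.0 ** (-k) <= C * tau ** a + 1e-12
print("decay exponent check passed")
```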

Call I the integral in (4.28), and write \(I = \sum _{k= -1}^\infty I_k\), with

$$\begin{aligned} I_k = \int _{T_\varDelta } \mathbb {1}_{\tau _0^{k+2} r < s \le \tau _0^{k+1} r }(s) \beta _u(x,s)\frac{\mathrm{d}x\,\mathrm{d}s}{s}. \end{aligned}$$
(4.30)

We single out \(I_{-1}\) because we do not have enough room for the argument below when \(\tau \) is large, but anyway we just need to observe that

$$\begin{aligned} I_{-1} \le C (\tau ^a + {\mathfrak {N}}) |\varDelta | \int _{\tau _0 r}^r \frac{\mathrm{d}s}{s} \le C (\tau ^a + {\mathfrak {N}}) r^d \end{aligned}$$
(4.31)

by (4.29), which is enough for (4.28). We are left with \(k \ge 0\) and

$$\begin{aligned} I_k \le \int _{x\in \varDelta } \int _{s = \tau _0^{k+2} r}^{\tau _0^{k+1} r } \beta _u(x,s)\frac{\mathrm{d}x\,\mathrm{d}s}{s}. \end{aligned}$$
(4.32)

Because of our small precaution, we now have that for (xs) in the domain of integration, \(\tau _0^{-k}s \le \tau _0 r \le 10^{-1}r \le 10^{-1} \tau R\) (because we took \(\tau _0 \le 10^{-1}\)), so \(\varDelta (x,5\tau _0^{-k}s) \subset \varDelta (x_0,R)\) and we can apply (4.27). In addition, all the surface balls \(5\varDelta (x, \tau _0^{-j-1}s)\) that arise from (4.27) are contained in \(\varDelta (x_0,R)\), so we will be able to use Remark 4.22 to estimate them as in Lemma 3.16. Thus

$$\begin{aligned} \begin{aligned} I_k&\le \int _{x\in \varDelta } \int _{s = \tau _0^{k+2} r}^{\tau _0^{k+1} r } \big [2^{-k} \beta _u(x,\tau _0^{-k} s) + C \sum _{j=0}^{k-1} 2^{-j} \gamma (x, \tau _0^{-j-1}s)^2\big ] \frac{\mathrm{d}x\,\mathrm{d}s}{s} \\&\le C 2^{-k} (\tau ^a + {\mathfrak {N}}) r^d + C \sum _{j=0}^{k-1} 2^{-j} \int _{\varDelta } \int _{\tau _0^{k+2} r}^{\tau _0^{k+1} r } \gamma (x, \tau _0^{-j-1}s)^2\frac{\mathrm{d}x\,\mathrm{d}s}{s} \\&= C 2^{-k} (\tau ^a + {\mathfrak {N}}) r^d + C \sum _{j=0}^{k-1} 2^{-j} \int _{\varDelta } \int _{\tau _0^{k-j+1} r}^{\tau _0^{k-j} r } \gamma (x, t)^2 \frac{\mathrm{d}x\,\mathrm{d}t}{t}, \end{aligned} \end{aligned}$$
(4.33)

where we set \(t = \tau _0^{-j-1}s\) and use the invariance of \(\frac{\mathrm{d}s}{s}\).

Set \(\ell = k-j\), which runs between 1 and \(+\infty \). For each value of \(\ell \ge 1\), we have \(\sum _{k, j ; k-j = \ell } 2^{-j} \le 2\). Hence when we sum over k, we get that

$$\begin{aligned} \sum _{k \ge 0} I_k&\le C \sum _{k\ge 0} 2^{-k} (\tau ^a + {\mathfrak {N}}) r^d + C \sum _{\ell \ge 1} \int _{\varDelta } \int _{\tau _0^{\ell +1} r}^{\tau _0^{\ell } r } \gamma (x, t)^2 \frac{\mathrm{d}x\,\mathrm{d}t}{t} \\&= C (\tau ^a + {\mathfrak {N}}) r^d + C \int _{\varDelta } \int _{0}^{\tau _0 r} \gamma (x, t)^2 \frac{\mathrm{d}x\,\mathrm{d}t}{t} \le C (\tau ^a + {\mathfrak {N}}) r^d, \end{aligned}$$

by Lemma 3.16 or Remark 4.22. This completes our proof of (4.28), and the theorems follow.
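The reindexing \(\ell = k-j\) used in the last display, together with \(\sum _{k,j\,;\,k-j=\ell } 2^{-j} \le \sum _{j\ge 0} 2^{-j} = 2\), can be verified on truncated sums. In the sketch below the nonnegative weights \(g_\ell \) are arbitrary stand-ins for the slice integrals \(\int _\varDelta \int _{\tau _0^{\ell +1}r}^{\tau _0^\ell r} \gamma (x,t)^2 \frac{\mathrm{d}x\,\mathrm{d}t}{t}\); this is a numerical illustration only.

```python
import random

random.seed(3)
N = 40                                       # truncation level in k
g = [random.random() for _ in range(N + 2)]  # g[l] ~ slice integral at tau0^l r

# Double sum as it appears before reindexing: over k >= 0 and 0 <= j <= k-1,
# with the slice integral taken at scale l = k - j.
double = sum(2 ** (-j) * g[k - j] for k in range(N + 1) for j in range(k))

# After setting l = k - j, each l >= 1 appears with total weight
# sum_j 2^{-j} <= 2, which gives the single-sum bound used in the text.
single = 2 * sum(g[l] for l in range(1, N + 1))
assert double <= single
print("reindexed bound verified")
```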

5 Proof of Corollary 1.15

Let us first prove a Caccioppoli type result for solutions on Whitney balls. Since it is an interior estimate, it holds on any domain \(\varOmega \subset {\mathbb {R}}^{d+1}\). For \(X\in \varOmega \), denote by \(\delta (X)\) the distance of X to \(\partial \varOmega \).

Lemma 5.1

Let A be a \((d+1)\times (d+1)\) matrix of real-valued functions on \({\mathbb {R}}^{d+1}\) satisfying the ellipticity condition (1.1), and for some \(C_0\in (0,\infty )\),

$$\begin{aligned} \left| \nabla A(X)\right| \delta (X)\le C_0 \qquad \text {for any } X\in \varOmega . \end{aligned}$$
(5.2)

Let \(X_0\in \varOmega \subset {\mathbb {R}}^{d+1}\) be given, and \(r=\delta (X_0)\). Let \(u\in W^{1,2}(B_{r}(X_0))\) be a solution of \(Lu=-{{\,\mathrm{div}\,}}(A\nabla u)=0\) in \(B_{r}(X_0)\). Then for any \(\lambda \in {\mathbb {R}}\),

$$\begin{aligned} \int _{B_{r/4}(X_0)}\left| \nabla ^2u(X)\right| ^2\,\mathrm{d}X\le & {} \frac{C}{r^2}\int _{B_{r/2}(X_0)}\left| \nabla u(X)-\lambda \, {\mathbf {e}}_{d+1}\right| ^2\,\mathrm{d}X\nonumber \\&+C\lambda ^2\int _{B_{r/2}(X_0)}\left| \nabla A(X)\right| ^2\,\mathrm{d}X, \end{aligned}$$
(5.3)

where C depends only on d, \(\mu _0\) and \(C_0\).

Proof

By (5.2), \(\left| \nabla A(X)\right| \le 8C_0/r\) for any \(X\in B_{7r/8}(X_0)\), which means A is Lipschitz in \(B_{7r/8}(X_0)\). So from [10, Theorem 8.8], it follows that \(u\in W^{2,2}(B_{3r/4}(X_0))\). Let \(\varphi \in C_0^\infty (B_{r/2}(X_0))\), with \(\varphi =1\) on \(B_{r/4}(X_0)\) and \(\left\| \nabla \varphi \right\| _{L^\infty }\le \frac{C}{r}\). Write “\(\partial \)” to denote a fixed generic derivative. Since \(u\in W^{2,2}(B_{3r/4}(X_0))\), \(\partial (u-\lambda t)\varphi ^2\in W_0^{1,2}(B_{r/2}(X_0))\) for any \(\lambda \in {\mathbb {R}}\). Therefore, there exists \(\left\{ v_k\right\} \subset C_0^\infty (B_{r/2}(X_0))\) such that \(v_k\) converges to \(\partial (u-\lambda t)\varphi ^2\) in \(W^{1,2}(B_{r/2}(X_0))\). Set \(I=\int \left| \nabla \partial u(X)\right| ^2\varphi (X)^2 \,\mathrm{d}X\). Observe that for any \(\lambda \in {\mathbb {R}}\),

$$\begin{aligned} I=\int _{{\mathbb {R}}^{d+1}}\left| \nabla \partial ( u(x,t)-\lambda \, t)\right| ^2\varphi (x,t)^2 \,\mathrm{d}x\,\mathrm{d}t. \end{aligned}$$

By ellipticity, we have

$$\begin{aligned} I\le & {} \mu _0\int _{{\mathbb {R}}^{d+1}}A(x,t)\nabla \partial ( u(x,t)-\lambda \, t)\cdot \nabla \partial ( u(x,t)-\lambda \, t)\varphi (x,t)^2 \,\mathrm{d}x\,\mathrm{d}t\\= & {} \mu _0\int _{{\mathbb {R}}^{d+1}}A\nabla \partial ( u-\lambda \, t)\cdot \nabla \left( \partial ( u-\lambda \, t)\varphi ^2\right) \,\mathrm{d}x\,\mathrm{d}t\\&-2\mu _0\int _{{\mathbb {R}}^{d+1}}A\nabla \partial ( u-\lambda \, t)\cdot \nabla \varphi \, \partial ( u-\lambda \, t)\varphi \,\mathrm{d}x\,\mathrm{d}t\\=: & {} \mu _0 I_1-2\mu _0 I_2. \end{aligned}$$

For \(I_2\), we use Cauchy–Schwarz to get

$$\begin{aligned} \left| I_2\right|\le & {} \mu _0 I^{1/2}\left( \int _{{\mathbb {R}}^{d+1}}\left| \partial ( u-\lambda \, t)\right| ^2\left| \nabla \varphi \right| ^2\,\mathrm{d}x\,\mathrm{d}t\right) ^{1/2}\\\le & {} \frac{1}{8}I+\frac{C_{\mu _0}}{r^2}\int _{B_{r/2}(X_0)}\left| \nabla (u-\lambda \,t)\right| ^2\,\mathrm{d}x\,\mathrm{d}t. \end{aligned}$$

For \(I_1\), we use the sequence \(\left\{ v_k\right\} \) and write

$$\begin{aligned} I_1^k:= & {} \int _{{\mathbb {R}}^{d+1}}A\nabla \partial ( u-\lambda \, t)\cdot \nabla v_k \,\mathrm{d}x\,\mathrm{d}t\\= & {} \int _{{\mathbb {R}}^{d+1}}\partial \left( A\nabla ( u-\lambda \, t)\cdot \nabla v_k\right) \,\mathrm{d}x\,\mathrm{d}t-\int _{{\mathbb {R}}^{d+1}}A\nabla ( u-\lambda \, t)\cdot \nabla \partial v_k \,\mathrm{d}x\,\mathrm{d}t \\&-\int _{{\mathbb {R}}^{d+1}}\partial A(x,t) \nabla ( u-\lambda \, t)\cdot \nabla v_k \,\mathrm{d}x\,\mathrm{d}t. \end{aligned}$$

Note that the first term on the right-hand side vanishes because it is a derivative of a \(W^{1,2}({\mathbb {R}}^{d+1})\) compactly supported function. Moreover, since \(Lu=0\) and \(\partial v_k\in C_0^\infty (B_{r/2}(X_0))\) is a valid test function, we have

$$\begin{aligned} I_1^k=\lambda \int _{{\mathbb {R}}^{d+1}}A\nabla t\cdot \nabla \partial v_k \,\mathrm{d}x\,\mathrm{d}t -\int _{{\mathbb {R}}^{d+1}}\partial A(x,t) \nabla ( u-\lambda \, t)\cdot \nabla v_k \,\mathrm{d}x\,\mathrm{d}t. \end{aligned}$$

Let \({\mathbf {a}}_{d+1}\) be the last column vector of A; then we have

$$\begin{aligned} \int _{{\mathbb {R}}^{d+1}}A\nabla t\cdot \nabla \partial v_k \,\mathrm{d}x\,\mathrm{d}t =\int _{{\mathbb {R}}^{d+1}}{\mathbf {a}}_{d+1}\cdot \nabla \partial v_k \,\mathrm{d}x\,\mathrm{d}t=-\int _{{\mathbb {R}}^{d+1}}{\text {div}}{\mathbf {a}}_{d+1}\, \partial v_k \,\mathrm{d}x\,\mathrm{d}t. \end{aligned}$$

Hence,

$$\begin{aligned} \left| I_1\right|= & {} \left| \lim _{k\rightarrow \infty }I_1^k\right| \le \left| \lambda \int _{{\mathbb {R}}^{d+1}}{\text {div}}{\mathbf {a}}_{d+1}\, \partial (\partial u\varphi ^2) \,\mathrm{d}x\,\mathrm{d}t\right| \\&+ \left| \int _{{\mathbb {R}}^{d+1}}\partial A(x,t) \nabla ( u-\lambda \, t)\cdot \nabla (\partial (u-\lambda t)\varphi ^2) \,\mathrm{d}x\,\mathrm{d}t\right| =: I_{11}+I_{12}. \end{aligned}$$

For \(I_{11}\), we use Cauchy–Schwarz, \(\left| {\text {div}}{\mathbf {a}}_{d+1}\right| \le (d+1)\left| \nabla A\right| \), and Young’s inequality to get

$$\begin{aligned} I_{11}\le & {} \left| \lambda \right| \int _{{\mathbb {R}}^{d+1}}\left| {\text {div}}{\mathbf {a}}_{d+1}\right| \partial ^2u\varphi ^2\,\mathrm{d}x\,\mathrm{d}t +2\left| \lambda \right| \int _{{\mathbb {R}}^{d+1}}\left| {\text {div}}{\mathbf {a}}_{d+1}\right| \partial (u-\lambda t)\varphi \partial \varphi \,\mathrm{d}x\,\mathrm{d}t\\\le & {} \left| \lambda \right| \left( \int \left| \partial ^2u\right| ^2\varphi ^2\,\mathrm{d}x\,\mathrm{d}t\right) ^{1/2}\left( \int \left| {\text {div}}{\mathbf {a}}_{d+1}\right| ^2\varphi ^2\,\mathrm{d}x\,\mathrm{d}t\right) ^{1/2}\\&+2\left| \lambda \right| \left( \int \left| \partial (u-\lambda t)\right| ^2\left| \nabla \varphi \right| ^2\,\mathrm{d}x\,\mathrm{d}t\right) ^{1/2}\left( \int \left| {\text {div}}{\mathbf {a}}_{d+1}\right| ^2\varphi ^2\,\mathrm{d}x\,\mathrm{d}t\right) ^{1/2}\\\le & {} \frac{1}{8}I+\frac{C}{r^2}\int _{B_{r/2}(X_0)}\left| \partial (u-\lambda t)\right| ^2\,\mathrm{d}x\,\mathrm{d}t+C\lambda ^2\int _{B_{r/2}(X_0)}\left| \nabla A\right| ^2\,\mathrm{d}x\,\mathrm{d}t. \end{aligned}$$

For \(I_{12}\), we have

$$\begin{aligned} I_{12}\le & {} \int _{{\mathbb {R}}^{d+1}} |\partial A(x,t)| \, |\nabla (u-\lambda t)| \, |\nabla (\partial u)\varphi ^2| \,\mathrm{d}x\,\mathrm{d}t\\&+2\int _{{\mathbb {R}}^{d+1}}\left| \partial A(x,t)\nabla (u-\lambda t)\cdot \nabla \varphi \partial (u-\lambda t)\varphi \right| \,\mathrm{d}x\,\mathrm{d}t\\\le & {} I^{1/2}\left( \int _{B_{r/2}(X_0)}\left| \partial A\right| ^2\left| \nabla (u-\lambda t)\right| ^2\,\mathrm{d}x\,\mathrm{d}t\right) ^{1/2}\\&+\frac{C}{r}\int _{B_{r/2}(X_0)}\left| \partial A\right| \left| \nabla (u-\lambda t)\right| ^2\,\mathrm{d}x\,\mathrm{d}t. \end{aligned}$$

By (5.2), and because for any \(X\in B_{r/2}(X_0)\), \(\delta (X)\ge r/2\), one sees

$$\begin{aligned} I_{12}\le \frac{1}{8}I+ \frac{C(d, C_0)}{r^2}\int _{B_{r/2}(X_0)}\left| \nabla (u-\lambda t)\right| ^2\,\mathrm{d}x\,\mathrm{d}t. \end{aligned}$$

Collecting all the estimates, we can absorb I into the left-hand side and obtain the desired estimate. \(\square \)

Let us point out that the assumption (5.2) on A in Lemma 5.1 is harmless, as it is a consequence of the classical DKP condition (1.16). We are now ready to prove Corollary 1.15.

Proof of Corollary 1.15

Observe that (1.16) implies \(\left| \nabla A(x,t)\right| t\le CC_0\) for any \((x,t)\in {\mathbb {R}}^{d+1}_+\) for some C depending only on the dimension.

Fix \(\varDelta \subset \varDelta (x_0,R/2)\). Consider any \((x,3r)\in T_\varDelta \), and write \(X=(x,3r/2)\). Let \(\lambda _{x,3r}=\lambda _{x,3r}(u)\) be defined as in (1.9). By Lemma 5.1,

$$\begin{aligned} \fint _{B_{r/4}(X)}\left| \nabla ^2u(y,t)\right| ^2\,\mathrm{d}y\,\mathrm{d}t\le & {} \frac{C}{r^2}\fint _{B_{r/2}(X)}\left| \nabla ( u(y,t)-\lambda _{x,3r}t)\right| ^2\,\mathrm{d}y\,\mathrm{d}t\\&+C\lambda _{x,3r}^2\fint _{B_{r/2}(X)}\left| \nabla A(y,t)\right| ^2\,\mathrm{d}y\,\mathrm{d}t. \end{aligned}$$

Notice that \(B_{r/2}(X)\subset W(x,2r)=\varDelta (x,2r)\times (r,2r]\) and \(B_{r/2}(X)\subset T(x,3r)\). Hence we can enlarge the region of the integrals on the right-hand side and then multiply both sides by \(u(x,3r)^{-2}r^3\) to get

$$\begin{aligned} \frac{\fint _{B_{r/4}(X)}\left| \nabla ^2u(y,t)\right| ^2\,\mathrm{d}y\,\mathrm{d}t}{ u(x,3r)^2}r^3\le & {} \frac{C r \fint _{T(x,3r)}\left| \nabla ( u(y,t)-{\lambda _{x,3r}}t)\right| ^2\,\mathrm{d}y\,\mathrm{d}t}{u(x,3r)^2}\\&+\frac{Cr^3\lambda _{x,3r}^2}{u(x,3r)^2}\fint _{W(x,2r)}\left| \nabla A(y,t)\right| ^2\,\mathrm{d}y\,\mathrm{d}t. \end{aligned}$$

By Lemma 2.8, and then the definitions (1.8)–(1.10) of \(\widetilde{\alpha }(x,r)\), \(\lambda _{x,3r}\) and \(\beta _u(x,3r)\),

$$\begin{aligned}&\frac{\fint _{B_{r/4}(X)}\left| \nabla ^2u(y,t)\right| ^2\,\mathrm{d}y\,\mathrm{d}t}{ u(x,3r)^2} \; r^3\\&\quad \le \frac{C\fint _{T(x,3r)}\left| \nabla ( u(y,t)-\lambda _{ x,3r}t)\right| ^2\,\mathrm{d}y\,\mathrm{d}t}{r\fint _{T(x,3r)}\left| \nabla u(y,t)\right| ^2\,\mathrm{d}y\,\mathrm{d}t} +\frac{C\left( \fint _{T(x,3r)}\partial _t u(y,t)\,\mathrm{d}y\,\mathrm{d}t\right) ^2 \widetilde{\alpha }(x,2r)^2}{r\fint _{T(x,3r)}\left| \nabla u(y,t)\right| ^2\,\mathrm{d}y\,\mathrm{d}t} \\&\quad \le \frac{C\beta _u(x,3r)}{r}+\frac{C\widetilde{\alpha }(x,2r)^2}{r}. \end{aligned}$$

Now we apply Theorem 1 and the DKP assumption (1.16) and get

$$\begin{aligned}&\int _{T_\varDelta }\frac{\fint _{B_{r/4}(X)}\left| \nabla ^2u(y,t)\right| ^2\,\mathrm{d}y\,\mathrm{d}t}{u(x,3r)^2} \; r^3\,\mathrm{d}x\,\mathrm{d}r\nonumber \\&\quad {\le C\int _{T_\varDelta }\beta _u(x,3r)\frac{\mathrm{d}x\,\mathrm{d}r}{r}+C\int _{T_{\varDelta }}\widetilde{\alpha }(x,2r)^2\frac{\mathrm{d}x\,\mathrm{d}r}{r}} \le C_{d,\mu _0}(1+C_0)\left| \varDelta \right| .\nonumber \\ \end{aligned}$$
(5.4)

We now use Fubini and Harnack’s inequality to obtain a lower bound for the left-hand side of (5.4). By Fubini,

$$\begin{aligned}&\int _{T_\varDelta }\frac{\fint _{B_{r/4}(X)}\left| \nabla ^2u(y,t)\right| ^2\,\mathrm{d}y\,\mathrm{d}t}{u(x,3r)^2} \; r^3\,\mathrm{d}x\,\mathrm{d}r\\&\quad = C_d\int _{(y,t)\in {\mathbb {R}}^{d+1}_+}\left| \nabla ^2u(y,t)\right| ^2\int _{(x,r)\in T_\varDelta }\mathbb {1}_{B_{r/4}(X)}(y,t)\frac{r^{2-d}}{u(x,3r)^2} \; \,\mathrm{d}x\,\mathrm{d}r\,\mathrm{d}y\,\mathrm{d}t. \end{aligned}$$

Observe that if \(\left| (y,t)-(x,3r/2)\right| \le \frac{t}{7}\), then \(t\approx r\) and, in particular, \(t\le \frac{7r}{4}\); the latter implies that \(\mathbb {1}_{B_{r/4}(X)}(y,t)=1\). So the right-hand side is bounded from below by

$$\begin{aligned} C_d\int _{(y,t)\in T_\varDelta }\left| \nabla ^2u(y,t)\right| ^2 \int _{(x,r) ; (x,3r/2)\in B_{t/7}(y,t)}\frac{r^{2-d}}{u(x,3r)^2} \; \,\mathrm{d}x\,\mathrm{d}r\,\mathrm{d}y\,\mathrm{d}t. \end{aligned}$$
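In more detail, if \(\left| (y,t)-(x,3r/2)\right| \le \frac{t}{7}\), then in particular \(\left| t-\frac{3r}{2}\right| \le \frac{t}{7}\), so that

$$\begin{aligned} \frac{6t}{7}\le \frac{3r}{2}\le \frac{8t}{7}, \qquad \text {that is,}\qquad \frac{21r}{16}\le t\le \frac{7r}{4}, \end{aligned}$$

and hence \(\left| (y,t)-X\right| \le \frac{t}{7}\le \frac{r}{4}\), i.e., \((y,t)\in B_{r/4}(X)\), as claimed above.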

By Harnack, \(u(x,3r)\le C u(y,t)\) when \((x,3r/2)\in B_{t/7}(y,t)\). Hence

$$\begin{aligned} \int _{T_\varDelta }\frac{\fint _{B_{r/4}(X)}\left| \nabla ^2u(y,t)\right| ^2\,\mathrm{d}y\,\mathrm{d}t}{u(x,3r)^2}r^3\,\mathrm{d}x\,\mathrm{d}r \ge C_d\int _{(y,t)\in T_\varDelta }\left| \nabla ^2u(y,t)\right| ^2\frac{t^3}{u(y,t)^2}\,\mathrm{d}y\,\mathrm{d}t. \end{aligned}$$
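In more detail, for fixed \((y,t)\) the domain of integration of the inner integral is, after the change of variable \(s=3r/2\), a ball of radius \(t/7\), on which \(r\approx t\); thus, by the Harnack bound above,

$$\begin{aligned} \int _{(x,r) ;\, (x,3r/2)\in B_{t/7}(y,t)}\frac{r^{2-d}}{u(x,3r)^2}\,\mathrm{d}x\,\mathrm{d}r\ge \frac{c\,t^{2-d}}{u(y,t)^2}\left| \left\{ (x,r):(x,3r/2)\in B_{t/7}(y,t)\right\} \right| \ge \frac{c_d\,t^{3}}{u(y,t)^2}, \end{aligned}$$

since the set in question has measure comparable to \(t^{d+1}\).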

From this and (5.4), the desired result follows. \(\square \)

Remark 5.5

If we apply the more precise estimate (1.14) in (5.4), we can get the following stronger result. For \(\tau \in (0,1/2)\), we have

$$\begin{aligned} \left\| \frac{\left| \nabla ^2 u(x,t)\right| ^2t^3}{u(x,t)^2}\,\mathrm{d}x\,\mathrm{d}t\right\| _{{\mathcal {C}}(\varDelta (x_0,\tau R))}\le C\tau ^a+C\left\| \widetilde{\alpha }(x,r)^2\frac{\mathrm{d}x\,\mathrm{d}r}{r}\right\| _{{\mathcal {C}}(\varDelta (x_0,R))}, \end{aligned}$$

for some C and \(a>0\) depending only on d and \(\mu _0\). As a consequence, if u is the Green function with pole at infinity (see Lemma 6.1 for the definition), then we have that

$$\begin{aligned} \left\| \frac{\left| \nabla ^2 G^\infty (x,t)\right| ^2t^3}{G^\infty (x,t)^2}\,\mathrm{d}x\,\mathrm{d}t\right\| _{{\mathcal {C}}}\le C\left\| \widetilde{\alpha }(x,r)^2\frac{\mathrm{d}x\,\mathrm{d}r}{r}\right\| _{{\mathcal {C}}}. \end{aligned}$$

6 Optimality

In this section, we construct an operator that does not satisfy the DKP condition and such that \(\beta _{G^\infty }(x,r)\frac{\mathrm{d}x\,\mathrm{d}r}{r}\) fails to be a Carleson measure. Moreover, we find a sequence of operators \(\{L_n\}\) that satisfy the DKP condition with constants increasing to infinity as n goes to infinity, and for any fixed \(1<R_0<\infty \), \(\left\| \beta _n(x,r)\frac{\mathrm{d}x\,\mathrm{d}r}{r}\right\| _{{\mathcal {C}}(\varDelta _{R_0})}\ge C(n-1)\), where \(\beta _n(x,r)=\beta _{G_n^\infty }(x,r)\), and \(G_n^\infty \) is the Green function with pole at infinity for \(L_n\). A similar construction is used in [6, Remark 3.2] and in [4]. As we shall see, it is very simple to get a bad oscillating behaviour for \(G^\infty \) in the vertical direction; it is typically harder to get oscillation in the horizontal variables, as would be needed for bad harmonic measure estimates.

Let us give the precise definition of the Green function with pole at infinity. One can prove the following lemma as in [15], Lemma 3.7.

Lemma 6.1

Let \(L=-{{\,\mathrm{div}\,}}{A\nabla }\) be an elliptic operator on \({\mathbb {R}}^{d+1}_+\). Then there exists a unique function \(U\in C(\overline{{\mathbb {R}}^{d+1}_+})\) such that

$$\begin{aligned} {\left\{ \begin{array}{ll} L^\top U = 0 \quad \text {in } {\mathbb {R}}^{d+1}_+\\ U>0 \quad \text {in }{\mathbb {R}}^{d+1}_+\\ U(x,0)=0 \quad \text {for all }x\in {\mathbb {R}}^d,\\ \end{array}\right. } \end{aligned}$$

and \(U(0,1)=1\). We call the unique function U the Green function with pole at infinity for L.

Let \(A(x,t)=a(t)I\) for \((x,t)\in {\mathbb {R}}^{d+1}_+\), where I is the \((d+1)\times (d+1)\) identity matrix, and a(t) is a positive scalar function on \({\mathbb {R}}_+\). Let \(L=-{{\,\mathrm{div}\,}}{A(x,t)\nabla }\). We claim that the Green function with pole at infinity for L in \({\mathbb {R}}^{d+1}_+\) is (modulo a harmless multiplicative constant)

$$\begin{aligned} G(x,t)=g(t) \qquad \text {with}\quad g(0)=0,\quad g'(t)=\frac{1}{a(t)}. \end{aligned}$$
(6.2)

In fact, it is easy to check that \(L^\top G=0\) in \({\mathbb {R}}^{d+1}_+\), \(G(x,0)\equiv 0\), and the uniqueness of \(G^\infty \) does the rest. The derivatives of G are simple:

$$\begin{aligned} \nabla _xG(x,t)=0, \qquad \partial _tG(x,t)=\frac{1}{a(t)}. \end{aligned}$$
(6.3)
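For the reader's convenience, the verification of the claim (6.2) is immediate: since \(A=a(t)I\) is symmetric, \(L^\top =L\), and

$$\begin{aligned} L^\top G=-{{\,\mathrm{div}\,}}\left( a(t)\nabla g(t)\right) =-\partial _t\left( a(t)g'(t)\right) =-\partial _t(1)=0. \end{aligned}$$

Moreover, \(g(0)=0\) gives \(G(x,0)\equiv 0\), and \(g'=1/a>0\) gives \(G>0\) in \({\mathbb {R}}^{d+1}_+\).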

Now we set

$$\begin{aligned} a(t)= {\left\{ \begin{array}{ll} \frac{3}{2} \qquad \text {when }t\ge 2^{100},\\ 1 \qquad \text {when }2^{2k}+c_02^{2k-1}\le t\le 2^{2k+1}-c_02^{2k},\\ 2 \qquad \text {when }2^{2k+1}+c_02^{2k}\le t\le 2^{2k+2}-c_02^{2k+1}, \end{array}\right. } \end{aligned}$$

for all \(k\in {\mathbb {Z}}\) with \(k\le 49\), and a(t) is smooth in the remaining strips \(S_k = (2^k-c_02^{k-1},2^k+c_02^{k-1})\), with

$$\begin{aligned} \left| a'(t)\right| \le \frac{100}{c_02^k}\quad \text { for } t\in S_k= (2^k-c_02^{k-1},2^k+c_02^{k-1}). \end{aligned}$$

Here, \(c_0>0\) is a constant that will be taken sufficiently small and fixed. Additionally, we can make sure that \(a(t)=\frac{3}{2}\) in a small neighborhood of \(t=2^k\) to simplify our computations.

We construct the approximation of a(t) as follows. Set

$$\begin{aligned} a_n(t)={\left\{ \begin{array}{ll} a(t)\qquad \text {when } t\ge 2^{-2n},\\ \frac{3}{2} \qquad \text {when }0<t<2^{-2n}. \end{array}\right. } \end{aligned}$$

Then \(a_n\) converges to a pointwise on \({\mathbb {R}}_+\).

Let \(L_n=-{{\,\mathrm{div}\,}}{A_n(x,t)\nabla }=-{{\,\mathrm{div}\,}}\left( a_n(t)\nabla \right) \), and let \(G_n\) be the Green function with pole at infinity for \(L_n\), whose formula is given by (6.2), with a replaced by \(a_n\).

We now compute the DKP constant for \(A_n\). Notice that \(\left| \nabla A_n\right| \ne 0\) only in the strips near \(2^k\) with width \(c_02^k\) for \(-2n\le k\le 100\), so it is easy to get the following estimate.

$$\begin{aligned}&\left\| \sup _{(y,t)\in W(x,r)}\left| \nabla A_n(y,t)\right| ^2r \,\mathrm{d}x\,\mathrm{d}r\right\| _{{\mathcal {C}}}\approx \left\| \left| a'_n(t)\right| ^2t\,\mathrm{d}x\,\mathrm{d}t\right\| _{{\mathcal {C}}}\\&\quad \approx \sum _{k=-2n}^{100}\frac{2^k}{(c_02^k)^2}c_02^k\approx \frac{2n+100}{c_0}. \end{aligned}$$
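Indeed, each strip \(S_k\) contributes a bounded amount: on \(S_k\) we have \(\left| a_n'\right| ^2\lesssim (c_02^k)^{-2}\), \(t\approx 2^k\), and \(\left| S_k\right| =c_02^k\), so that

$$\begin{aligned} \int _{S_k}\left| a_n'(t)\right| ^2 t\,\mathrm{d}t\lesssim \frac{1}{(c_02^k)^2}\cdot 2^k\cdot c_02^k=\frac{1}{c_0}, \end{aligned}$$

and summing over the \(2n+101\) strips with \(-2n\le k\le 100\) gives the total \(\approx (2n+100)/c_0\).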

Similarly, we can compute the DKP constant for A.

$$\begin{aligned} \left\| \sup _{(y,t)\in W(x,r)}\left| \nabla A(y,t)\right| ^2r \,\mathrm{d}x\,\mathrm{d}r\right\| _{{\mathcal {C}}}\approx \left\| \left| a'(t)\right| ^2t\,\mathrm{d}x\,\mathrm{d}t\right\| _{{\mathcal {C}}}\approx c_0^{-1}\sum _{k=-\infty }^{100}1=\infty . \end{aligned}$$

Now we turn to \(\beta _n\). Recall the definition (1.12) of \(\beta (x,r)\) and the simple expressions (6.3) for the derivatives of \(G_n\). Set \(b_n(t)=\frac{1}{a_n(t)}\) and compute \(\beta _n(x,r)\) with \(T(x,r)\) replaced by \(\varDelta (x,r)\times (0,r)\) in the definition of \(\beta (x,r)\); then

$$\begin{aligned} \beta _n(x,r)=\frac{\int _0^r\left| b_n(t)-\fint _0^rb_n(s)\,\mathrm{d}s\right| ^2\,\mathrm{d}t}{\int _0^r\left| b_n(t)\right| ^2\,\mathrm{d}t}. \end{aligned}$$
(6.4)

The estimates with our initial definition of \(T(x,r)\) would be very similar, or could be deduced from the estimates with \(\varDelta (x,r)\times (0,r)\) because \(T(x,r/10) \subset \varDelta (x,r)\times (0,r) \subset T(x,10r)\).

Notice that \(\beta _n(x,r)=0\) when \(r<2^{-2n}\). We estimate \(\left\| \beta _n(x,r)\frac{\mathrm{d}x\,\mathrm{d}r}{r}\right\| _{{\mathcal {C}}(\varDelta _{R_0})}\) for some fixed \(R_0\ge 1\). For simplicity, we only do the calculation when \(R_0<2^{100}\).

The main observation is that for any \(2^{-2n+2}\le r \le R_0\),

$$\begin{aligned} \left| b_n(t)-\fint _0^rb_n(s)\,\mathrm{d}s\right| ^2\ge \frac{1}{1000} \qquad \text {for } t\in [2^{-2n},r]{\setminus } (\cup _k S_k). \end{aligned}$$
(6.5)

Once we have (6.5), we can obtain the lower bound for \(\left\| \beta _n(x,r)\frac{\mathrm{d}x\,\mathrm{d}r}{r}\right\| _{{\mathcal {C}}(\varDelta _{R_0})}\) as follows. First, observe that the total measure of those \(S_k\) that intersect \([2^{-2n},r]\) is controlled. Namely,

$$\begin{aligned} \left| \cup _k S_k\cap [2^{-2n},r]\right| \le \sum _{k=-2n}^{-2n+j+1}c_02^k\le c_02^{-2n+j+2}\le 4c_0r, \end{aligned}$$

where j is the integer such that \(2^{-2n+j}\le r<2^{-2n+j+1}\). Therefore,

$$\begin{aligned} \int _0^r\left| b_n(t)-\fint _0^rb_n(s)\,\mathrm{d}s\right| ^2\,\mathrm{d}t\ge \int _{[2^{-2n},r]{\setminus } \left( \cup _k S_k\right) }\frac{1}{1000}\,\mathrm{d}t\ge \frac{\frac{3}{4}-4c_0}{1000}r=:C_0 r. \end{aligned}$$
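Here we used that \(r\ge 2^{-2n+2}\) implies \(r-2^{-2n}\ge \frac{3r}{4}\), so that, by the measure bound above,

$$\begin{aligned} \left| [2^{-2n},r]{\setminus }\left( \cup _k S_k\right) \right| \ge \left( r-2^{-2n}\right) -4c_0r\ge \left( \frac{3}{4}-4c_0\right) r, \end{aligned}$$

and \(c_0\) is chosen small enough that \(C_0>0\).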

On the other hand, we have \( \int _0^r\left| b_n(t)\right| ^2\,\mathrm{d}t\le r \) since \(\left| b_n\right| \le 1\). Then by the formula (6.4) for \(\beta _n\), we obtain

$$\begin{aligned} \beta _n(x,r)\ge C_0 \qquad \text {for }r\in [2^{-2n+2}, R_0]. \end{aligned}$$

Thus,

$$\begin{aligned}&\sup _{0<R\le R_0}\frac{1}{R^d}\int _{\varDelta _R}\int _0^R\beta _n(x,r)\frac{\mathrm{d}x\,\mathrm{d}r}{r}\ge \frac{\left| \varDelta _{R_0}\right| }{R_0^d}\int _{2^{-2n+2}}^{R_0}C_0\frac{\mathrm{d}r}{r}\\&\quad =C_{d,c_0}\left( (2n-2)\ln 2+\ln R_0\right) \ge C_{d,c_0}(2n-2). \end{aligned}$$

Now we justify (6.5). This is true simply because the average \(\fint _0^rb_n(s)\,\mathrm{d}s\) takes values strictly between \(\frac{1}{2}\) and 1, so when t is away from the strips \(S_k\), \(b_n(t)\) differs from \(\fint _0^rb_n(s)\,\mathrm{d}s\) by a definite amount. We just need to make sure that the lower bound does not depend on n in a way that would cancel the blow-up.

We first simplify the computation of \(\fint _0^rb_n(s)\,\mathrm{d}s\) by observing that we can take \(c_0=0\). Indeed, when \(c_0\ne 0\) we can choose \(a_n\) so that the average of \(b_n\) on (0, r) coincides with the average in the case when \(b_n\) is not smoothed out (that is, \(c_0=0\)), as long as r does not lie in any strip \(S_k\); and if \(r\in S_k\), this affects \(\fint _0^rb_n(s)\,\mathrm{d}s\) only by a small error once \(c_0\) is sufficiently small.

Fix \(2^{-2n+2}\le r\le R_0\). If \(2^{2k_0}\le r<2^{2k_0+1}\) for some \(k_0\in {\mathbb {Z}}\), then a direct computation shows

$$\begin{aligned} \fint _0^rb_n(s)\,\mathrm{d}s=1+\frac{2^{-2n}}{2r}-\frac{2^{2k_0}}{3r}. \end{aligned}$$

If \(2^{2k_0+1}\le r<2^{2k_0+2}\) for some \(k_0\in {\mathbb {Z}}\), then

$$\begin{aligned} \fint _0^rb_n(s)\,\mathrm{d}s=\frac{1}{2}+\frac{2^{-2n}}{2r}+\frac{2^{2k_0+1}}{3r}. \end{aligned}$$

Since \(b_n\) is either 1 or 1/2 on \([2^{-2n},r]{\setminus } \left( \cup _k S_k\right) \), a case-by-case computation shows that for any \(2^{-2n+2}\le r\le R_0\), \(\left| b_n(t)-\fint _0^rb_n(s)\,\mathrm{d}s\right| \ge \frac{1}{12}\) for \(t\in [2^{-2n},r]{\setminus } \left( \cup _k S_k\right) \). Then with \(c_0>0\) sufficiently small, we have (6.5).