1 Introduction

Nonlinear approximation from piecewise polynomials (splines) in dimensions \(d>1\) is important from theoretical and practical points of view. We are interested in characterizing the rates of nonlinear spline approximation in \(L^p\). While this theory is simple and well understood in the univariate case, it is underdeveloped and challenging in dimensions \(d>1\).

In this article, we focus on nonlinear approximation in \(L^p\), \(0<p<\infty \), from regular piecewise polynomials in \(\mathbb {R}^2\) or on compact subsets of \(\mathbb {R}^2\) with polygonal boundaries. Our goal is to obtain complete characterization of the rates of approximation (the associated approximation spaces). To describe our results, we begin by introducing in more detail our

Setting and approximation tool. We are interested in approximation in the space \(L^p\), \(0<p<\infty \), from the class of regular piecewise polynomials \(\mathcal {S}(n, k)\) of degree \(k-1\) with \(k\ge 1\) of maximum smoothness over n rings. More specifically, with \(\Omega \) being a compact polygonal domain in \(\mathbb {R}^2\) or \(\Omega =\mathbb {R}^2\), we denote by \(\mathcal {S}(n, k)\) the set of all piecewise polynomials S of the form

$$\begin{aligned} S=\sum _{j=1}^n P_j\cdot {\mathbbm {1}}_{R_j}, \quad S\in C^{k-2}(\Omega ), \quad P_j\in \Pi _{k-1}, \end{aligned}$$
(1.1)

where \(R_1, \dots , R_n\) are rings with disjoint interiors. Here \(\Pi _{k-1}\) denotes the set of all algebraic polynomials of (total) degree \(\le k-1\) in two variables, and as usual \(S\in C^{k-2}(\Omega )\) means that all partial derivatives \(\partial ^\alpha S \in C(\Omega )\), \(|\alpha | \le k-2\). The elements of \(\mathcal {S}(n, 1)\) are simply piecewise constants.

A set \(R\subset \mathbb {R}^2\) is called a ring if R is a compact convex set with polygonal boundary or the difference of two such sets. All convex sets we consider are with uniformly bounded eccentricity, and we do not allow uncontrollably narrow elongated subregions. For the precise definitions, see Sects. 3.1 and 4.1.

It is important to point out that although regular, our tool for approximation is highly nonlinear. In particular, the rings in (1.1) may vary with S, and we do not assume any nested structure of the rings involved in the definition of different splines S in (1.1). Consequently, if \(S_1, S_2\in \mathcal {S}(n, k)\), then in general \(S_1\pm S_2\not \in \mathcal {S}(N, k)\) for any N. The case of approximation from splines over nested (anisotropic) rings induced by hierarchical nested triangulations is developed in [3, 6].

Given a function \(f\in L^p(\Omega )\), we denote by \(S_n^k(f)_p\) the best \(L^p\)-approximation of f from \(\mathcal {S}(n, k)\). Our goal is to completely characterize the approximation spaces \(A_q^\alpha \), \(\alpha >0\), \(0<q\le \infty \), defined by the (quasi)norm

$$\begin{aligned} \Vert f\Vert _{A_q^\alpha } := \Vert f\Vert _{L^p} + \left( \sum _{n=1}^\infty \left( n^\alpha S_n^k(f)_p\right) ^q\frac{1}{n}\right) ^{1/q}, \end{aligned}$$

with the \(\ell ^q\)-norm replaced by the \(\sup \)-norm if \(q=\infty \). To this end, we utilize the standard machinery of Jackson and Bernstein estimates. The Besov spaces \(B_\tau ^{s, k}{:=}B_{\tau \tau }^{s, k}\) with \(1/\tau =s/2+1/p\) naturally appear in our regular setting, see (2.1). The Jackson estimate takes the form: For any \(f\in B_\tau ^{s, k}\),

$$\begin{aligned} S_n^k(f)_p \le cn^{-s/2}|f|_{B_\tau ^{s, k}}. \end{aligned}$$
(1.2)

For \(k=1, 2\), this estimate follows readily from the results in [6]. It is an open problem to establish it for \(k>2\). Estimate (1.2) implies the direct estimate

$$\begin{aligned} S_n^k(f)_p \le cK\left( f, n^{-s/2}\right) , \end{aligned}$$
(1.3)

where \(K(f, t)=K(f, t;L^p, B_\tau ^{s,k})\) is the K-functional induced by \(L^p\) and \(B_\tau ^{s, k}\), see (3.6). Note that estimate (1.2), for any \(k>2\), is well known and easy to prove when approximating from discontinuous piecewise polynomials over rings. For example, it follows from Theorems 2.25 and 3.10 in [6]. For smoother splines (but not splines of maximal smoothness), (1.2) follows by Theorems 2.15 and 3.1 in [3].

It is a major problem to establish a companion matching inverse estimate. The following Bernstein estimate would imply such an estimate:

$$\begin{aligned} |S_1-S_2|_{B^{s,k}_{\tau }} \le cn^{s/2}\Vert S_1-S_2\Vert _{L^p}, \quad S_1, S_2\in \mathcal {S}(n, k). \end{aligned}$$
(1.4)

However, as is easy to show, this estimate is not valid. The problem is that \(S_1-S_2\) may have one or more uncontrollably elongated parts such as \({\mathbbm {1}}_{[0, {\varepsilon }]\times [0, 1]}\) with small \({\varepsilon }\), which create problems for the Besov norm, see Example 3.3 below.

The main idea of this article is to replace (1.4) by the Bernstein type estimate

$$\begin{aligned} |S_1|_{B^{s,k}_{\tau }}^\lambda \le |S_2|_{B^{s,k}_{\tau }}^\lambda + cn^{\lambda s/2}\Vert S_1-S_2\Vert _{L^p}^\lambda , \quad \lambda :=\min \{\tau , 1\}, \end{aligned}$$
(1.5)

where \(0<s/2<k-1+1/p\). This estimate leads to the needed inverse estimate

$$\begin{aligned} K(f, n^{-s/2}) \le cn^{-s/2} \left( \sum _{\nu =1}^n \frac{1}{\nu }\left[ \nu ^{s/2} S_\nu ^k(f)_p\right] ^\lambda + \Vert f\Vert _p^\lambda \right) ^{1/\lambda }. \end{aligned}$$
(1.6)

In turn, this estimate and (1.3) yield a characterization of the associated approximation spaces \(A_q^\alpha \) in terms of real interpolation spaces

$$\begin{aligned} A_q^\alpha = \left( L^p, B_\tau ^{s, k}\right) _{\frac{\alpha }{s}, q}, \quad 0<\alpha<s, \; 0<q\le \infty . \end{aligned}$$
(1.7)

See, e.g., [4, 8].

A natural restriction on the Bernstein estimate (1.5) is the requirement that the splines \(S_1, S_2\in \mathcal {S}(n, k)\) have maximum smoothness. For instance, if we consider approximation from piecewise linear functions S (\(k=2\)), it is assumed that S is continuous. As will be shown in Example 4.4, estimate (1.5) is no longer valid for discontinuous piecewise linear functions.

Motivation. Our setting would simplify considerably if the rings \(R_j\) in (1.1) were replaced by regular convex sets with polygonal boundaries or simply triangles. For example, adaptive approximation from piecewise linear functions is often carried out by local refinements resulting in nested triangulations. However, this would restrict the approximation power of our approximation tool. The isotropic refinement schemes can give rate \(O(n^{-s/2})\) of \(L^p\)-approximation for functions in the Besov space \(B^{s,k}_\tau \) with \(1/\tau <s/2+1/p\), which is off the Sobolev embedding line. For more details, see [1] ([5] is also relevant). In contrast, the piecewise polynomials over rings as defined above allow one to obtain the Jackson estimate (1.2), where \(1/\tau =s/2+1/p\); i.e., in this case the Besov space \(B^{s,k}_\tau \) is just on the Sobolev embedding line. This leads to a complete characterization of the associated approximation spaces. The idea of using rings has already been utilized in [2]. In conclusion, there are two main points that we would like to make:

(i) In order to achieve the full strength of nonlinear spline approximation in dimension \(d=2\), the underlying partition should be in rings or a partition compatible with rings.

(ii) In nonlinear approximation from regular splines in \(L^p\), \(p<\infty \), in dimension \(d=2\) the rates of approximation are not sensitive to the particular underlying partitions as long as these are in rings. For example, in regular piecewise linear approximation asymptotically one cannot do better than if approximating by using a particular hierarchy of Courant hat functions over regular nested triangulations.

The proof of the Bernstein estimate (1.5) is quite involved. To make it more understandable, we first prove it in Sect. 3 in the easier case of piecewise constants and then in Sect. 4 for smoother splines. Our method is not limited to splines in dimension \(d=2\). However, there is a great deal of geometric arguments involved in these proofs, and to avoid more complicated considerations, we focus only on spline approximation in dimension \(d=2\) here.

Useful notation. Throughout this article, we shall use |G| to denote the Lebesgue measure of a set \(G\subset \mathbb {R}^2\); \(G^\circ \), \(\overline{G}\), and \(\partial G\) will denote the interior, closure, and boundary of G; \(d(G):= \sup _{x, y\in G} |x-y|\) will stand for the diameter of G; and \({\mathbbm {1}}_G\) will denote the characteristic function of G. As usual, |x| will stand for the Euclidean norm of \(x\in \mathbb {R}^2\). If G is finite, then \(\# G\) will stand for the number of elements of G. If \(\gamma \) is a polygonal curve in \(\mathbb {R}^2\), then \(\ell (\gamma )\) will denote its length. Positive constants will be denoted by \(c_1\), \(c_2\), \(c'\), \(\dots \), and they may vary at every occurrence. Some important constants will be denoted by \(c_0\), \(N_0\), \(\beta , \dots \), and will remain unchanged throughout. The notation \(a\sim b\) will stand for \(c_1\le a/b\le c_2\).

2 Background

2.1 Besov Spaces

Besov spaces appear naturally in nonlinear spline approximation. For spline approximation in \(L^p(\Omega )\), \(0<p<\infty \), we will utilize the Besov spaces \(B^{s, k}_\tau =B_{\tau \tau }^{s, k}\), where \(s>0\), \(k\ge 1\), \(1/\tau :=s/2+1/p\). The space \(B^{s, k}_\tau \) is defined as the set of all functions \(f\in L^\tau (\Omega )\) such that

$$\begin{aligned} |f|_{B^{s, k}_\tau } := \left( \int _0^\infty \left[ t^{-s}\omega _k(f, t)_\tau \right] ^\tau \frac{\mathrm{d}t}{t}\right) ^{1/\tau } <\infty . \end{aligned}$$
(2.1)

Here \(\omega _k(f, t)_\tau := \sup _{|h|\le t}\Vert \Delta _h^kf(\cdot )\Vert _{L^\tau (\Omega )}\), with

$$\begin{aligned} \Delta _h^kf(x):= \sum _{\nu =0}^k (-1)^{k+\nu } \left( {\begin{array}{c}k\\ \nu \end{array}}\right) f(x+\nu h) \end{aligned}$$

if the segment \([x, x+kh] \subset \Omega \) and \(\Delta _h^kf(x):=0\) otherwise.
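As an illustration of the difference operator just defined, the following is a minimal Python sketch. The function name `kth_difference` and the membership predicate `inside` are hypothetical names introduced only for this example. The sketch further assumes that \(\Omega \) is convex, so that the segment \([x, x+kh]\) lies in \(\Omega \) exactly when both endpoints do; for a nonconvex \(\Omega \), a genuine segment test would be needed.

```python
from math import comb


def kth_difference(f, x, h, k, inside=lambda pt: True):
    """k-th forward difference of f at x with step h in R^2.

    `inside` is a hypothetical membership test for Omega. Assuming Omega is
    convex, the segment [x, x + k*h] is contained in Omega exactly when both
    endpoints are, and the difference is set to 0 otherwise.
    """
    end = (x[0] + k * h[0], x[1] + k * h[1])
    if not (inside(x) and inside(end)):
        return 0.0
    total = 0.0
    for nu in range(k + 1):
        point = (x[0] + nu * h[0], x[1] + nu * h[1])
        # Sign and binomial weight as in the definition of the k-th difference.
        total += (-1) ** (k + nu) * comb(k, nu) * f(point)
    return total
```

For instance, the second difference of a linear function vanishes, while for \(f(x)=x_1^2\) and \(h=(h_1, 0)\) it equals \(2h_1^2\).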

Observe that for the standard Besov spaces \(B^s_{pq}\) with \(s>0\) and \(1\le p, q \le \infty \), the norm is independent of the index \(k>s\). However, for the above Besov spaces in general \(\tau <1\), which changes the nature of the Besov space and k should no longer be directly connected to s. For more details, see the discussion in [6, pp. 202–203].

2.2 Nonlinear Spline Approximation in Dimension \(d=1\)

For comparison, here we provide a brief account of nonlinear spline approximation in the univariate case. Denote by \(\tilde{S}_n^k(f)_p\) the best \(L^p\)-approximation of \(f\in L^p(\mathbb {R})\) from the set \(\tilde{S}(n, k)\) of all piecewise polynomials S of degree \(\le k-1\) with \(n+1\) free knots. Thus, \(S\in \tilde{S}(n, k)\) if \(S=\sum _{j=1}^n P_j{\mathbbm {1}}_{I_j}\), where \(P_j\in \Pi _{k-1}\) and \(I_j\), \(j=1, \dots , n\), are arbitrary compact intervals with disjoint interiors and \(\cup _j I_j\) is an interval. No smoothness of S is required.

Let \(s>0\), \(0<p<\infty \), and \(1/\tau =s+1/p\). The following Jackson and Bernstein estimates hold (see [7]): If \(f\in L^p(\mathbb {R})\) and \(n\ge 1\), then

$$\begin{aligned} \tilde{S}_n^k(f)_p \le cn^{-s}|f|_{B_\tau ^{s, k}} \end{aligned}$$
(2.2)

and

$$\begin{aligned} |S|_{B^{s,k}_{\tau }} \le cn^{s}\Vert S\Vert _{L^p}, \quad S\in \tilde{S}(n, k), \end{aligned}$$
(2.3)

where \(c>0\) is a constant depending only on s and p. These estimates imply direct and inverse estimates that allow the complete characterization of the respective approximation spaces. For more details, see [7] or [4, 8].

Several remarks are in order. (1) Above, no smoothness is imposed on the piecewise polynomials from \(\tilde{S}(n, k)\). The point is that the rates of approximation from smooth splines are the same as for nonsmooth splines. A key observation is that in dimension \(d=1\), the discontinuous piecewise polynomials are infinitely smooth with respect to the Besov spaces \(B_\tau ^{s, k}\). This is not the case in dimensions \(d>1\), where smoothness matters. (2) Unlike in the multivariate case, estimates (2.2)–(2.3) hold for every \(s>0\). (3) If \(S_1, S_2\in \tilde{S}(n, k)\), then \(S_1-S_2\in \tilde{S}(2n, k)\), and hence (2.3) is sufficient for establishing the respective inverse estimate. This is not true in the multivariate case, and one needs estimates like (1.4) (if valid) or (1.5) (in our case). (4) There is a great deal of geometry involved in multivariate spline approximation, while in dimension \(d=1\) there is none.

2.3 Nonlinear Nested Spline Approximation in Dimension \(d=2\)

The rates of approximation in \(L^p\), \(0<p<\infty \), from splines generated by multilevel anisotropic nested triangulations in \(\mathbb {R}^2\) are studied in [3, 6]. The respective approximation spaces are completely characterized in terms of Besov type spaces (B-spaces) defined via local piecewise polynomial approximation. The setting in [3, 6] allows one to deal with piecewise polynomials over triangulations with arbitrarily sharp angles. However, the nested structure of the underlying triangulations is quite restrictive. In this article, we consider nonlinear approximation from nonnested splines, but in a regular setting. This is a setting that frequently appears in applications.

3 Nonlinear Approximation from Piecewise Constants

3.1 Setting

Here we describe all components of our setting, including the region \(\Omega \) where the approximation will take place and the tool for approximation we consider.

The region \(\Omega \). We shall consider two scenarios for \(\Omega \): (a) \(\Omega \) is a compact polygonal domain in \(\mathbb {R}^2\), or (b) \(\Omega ={\mathbb {R}}^2\). More explicitly, in the first case, we assume that \(\Omega \) can be represented as the union of finitely many rings in the sense of Definition 3.1 with disjoint interiors. Therefore, the boundary \(\partial \Omega \) of \(\Omega \) is the union of finitely many polygonal curves consisting of finitely many segments (edges).

The approximation tool. To describe our tool for approximation, we first introduce rings in \(\mathbb {R}^2\).

Definition 3.1

We say that \(R\subset \mathbb {R}^2\) is a ring if R can be represented in the form \(R=Q_1\setminus Q_2\), where \(Q_1, Q_2\) satisfy the following conditions:

(a) \(Q_2\subset Q_1\) or \(Q_2=\emptyset \);

(b) Each of \(Q_1\) and \(Q_2\) is a compact regular convex set in \(\mathbb {R}^2\) whose boundary is a polygonal curve consisting of no more than \(N_0\) (\(N_0 \ge 3\) is fixed) line segments. Here a compact convex set \(Q\subset \mathbb {R}^2\) is deemed regular if Q has a bounded eccentricity; that is, there exist balls \(B_1\), \(B_2\), \(B_j=B(x_j, r_j)\), such that \(B_2\subset Q \subset B_1\) and \(r_1 \le c_0 r_2\), where \(c_0>0\) is a universal constant.

(c) R contains no uncontrollably narrow and elongated subregions, which is specified as follows: Each edge (segment) E of the boundary of R can be subdivided into the union of at most two segments \(E_1\), \(E_2\) (\(E=E_1\cup E_2\)) with disjoint (one dimensional) interiors such that there exist triangles \(\triangle _1\) with a side \(E_1\) and adjacent to \(E_1\) angles of magnitude \(\beta \), and \(\triangle _2\) with a side \(E_2\) and adjacent to \(E_2\) angles of magnitude \(\beta \) such that \(\triangle _j\subset R\), \(j=1, 2\), where \(0<\beta \le \pi /3\) is a fixed constant.

Fig. 1. Left: a ring \(R=Q_1\setminus Q_2\). Right: R with the triangles associated to the segments of \(\partial R\)

Figure 1 illustrates the above definition of rings.

Remark

Observe that from the above definition, it readily follows that for any ring R in \(\mathbb {R}^2\),

$$\begin{aligned} |R|\sim d(R)^2, \end{aligned}$$
(3.1)

with constants of equivalence depending only on the parameters \(N_0\), \(c_0\), and \(\beta \).
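To make the role of bounded eccentricity in (3.1) concrete, the following Python sketch compares the area of a convex polygon with the square of its diameter. The function names are hypothetical and chosen only for this illustration, and the polygon is assumed to be given by its vertices in order. The sketch handles a single convex polygon Q (a ring with \(Q_2=\emptyset \)), not a general ring.

```python
from itertools import combinations
from math import dist


def polygon_area(vertices):
    """Area of a simple polygon given by its vertices in order (shoelace formula)."""
    n = len(vertices)
    twice_area = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0


def polygon_diameter(vertices):
    """Diameter d(Q): the largest distance is attained at a pair of vertices."""
    return max(dist(p, q) for p, q in combinations(vertices, 2))


def area_to_diameter_squared(vertices):
    """Ratio |Q| / d(Q)^2; bounded below for regular sets, tiny for elongated ones."""
    return polygon_area(vertices) / polygon_diameter(vertices) ** 2
```

For the unit square the ratio is 1/2, whereas for the thin rectangle \([0, {\varepsilon }]\times [0, 1]\) it is of order \({\varepsilon }\), which is the kind of set excluded by the regularity requirements of Definition 3.1.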

Condition 3.2

In the case when \(\Omega \) is a compact polygonal domain in \(\mathbb {R}^2\), we assume that there exists a constant \(n_0 \ge 1\) such that \(\Omega \) can be represented as the union of \(n_0\) rings \(R_j\) with disjoint interiors: \(\Omega = \cup _{j=1}^{n_0} R_j\). If \(\Omega =\mathbb {R}^2\), then we set \(n_0:=1\).

We now can introduce the class of regular piecewise constants.

Case 1: \(\Omega \) is a compact polygonal domain in \(\mathbb {R}^2\). We denote by \(\mathcal {S}(n, 1)\) (\(n\ge n_0\)) the set of all piecewise constants S of the form

$$\begin{aligned} S=\sum _{j=1}^n c_j{\mathbbm {1}}_{R_j}, \quad c_j\in \mathbb {R}, \end{aligned}$$
(3.2)

where \(R_1, \dots , R_n\) are rings with disjoint interiors such that \(\Omega =\cup _{j=1}^n R_j\).

Case 2: \(\Omega =\mathbb {R}^2\). In this case, we denote by \(\mathcal {S}(n, 1)\) the set of all piecewise constant functions S of the form (3.2), where \(R_1, \dots , R_n\) are rings with disjoint interiors such that the support \(R :=\cup _{j=1}^n R_j\) of S is a ring in the sense of Definition 3.1.

A simple case of the above setting is when \(\Omega =[0, 1]^2\) and the rings R are of the form \(R=Q_1\setminus Q_2\), where \(Q_1\), \(Q_2\) are dyadic squares in \(\mathbb {R}^2\). Dyadic rings of this kind have been used in [2].

A bit more general is the setting when \(\Omega \) is a regular rectangle in \(\mathbb {R}^2\) with sides parallel to the coordinate axes or \(\Omega =\mathbb {R}^2\) and the rings R are of the form \(R=Q_1\setminus Q_2\), where \(Q_1\), \(Q_2\) are regular rectangles with sides parallel to the coordinate axes, and no narrow and elongated subregions are allowed in the sense of Definition 3.1 (c).

Clearly, the set \(\mathcal {S}(n, 1)\) is nonlinear since the rings \(\{R_j\}\) and the constants \(\{c_j\}\) in (3.2) may vary with S.

We denote by \(S_n^1(f)_p\) the best approximation of \(f\in L^p(\Omega )\) from \(\mathcal {S}(n, 1)\) in \(L^p(\Omega )\), \(0< p<\infty \); i.e.,

$$\begin{aligned} S_n^1(f)_p:= \inf _{S\in \mathcal {S}(n, 1)}\Vert f-S\Vert _{L^p}. \end{aligned}$$
(3.3)

Besov spaces. When approximating in \(L^p\), \(0< p<\infty \), from piecewise constants the Besov spaces \(B^{s, 1}_{\tau }\) with \(1/\tau =s/2+1/p\) naturally appear. In this section, we shall use the abbreviated notation \(B^s_{\tau }:=B^{s, 1}_{\tau }\).

3.2 Direct and Inverse Estimates

The following Jackson estimate is quite easy to establish (see [6]): If \(f\in B^s_\tau \), \(s>0\), \(1/\tau :=s/2+1/p\), \(0<p<\infty \), then \(f\in L^p(\Omega )\) and

$$\begin{aligned} S_n^1(f)_p \le cn^{-s/2}|f|_{B^s_{\tau }} \quad \hbox {for}\quad n\ge n_0, \end{aligned}$$
(3.4)

where \(c>0\) is a constant depending only on s, p, and the structural constants \(N_0\), \(c_0\), and \(\beta \) of the setting.

This estimate leads immediately to the following direct estimate: If \(f\in L^p(\Omega )\), then

$$\begin{aligned} S_n^1(f)_p \le cK(f, n^{-s/2}), \quad n\ge n_0, \end{aligned}$$
(3.5)

where \(K(f, t)\) is the K-functional induced by \(L^p\) and \(B^s_\tau \); namely,

$$\begin{aligned} K(f, t)=K(f, t; L^p, B^s_\tau ) := \inf _{g\in B^s_\tau } \{\Vert f-g\Vert _p+t|g|_{B^s_\tau }\}, \quad t>0. \end{aligned}$$
(3.6)
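The step from (3.4) to (3.5) can be sketched as follows. Here \(c_p:=\max \{1, 2^{1/p-1}\}\) denotes the constant in the quasi-triangle inequality of \(L^p\); this symbol is introduced only for this sketch. For any \(g\in B^s_\tau \) and any \(S\in \mathcal {S}(n, 1)\), we have \(\Vert f-S\Vert _p \le c_p(\Vert f-g\Vert _p+\Vert g-S\Vert _p)\). Taking the infimum over S and applying (3.4) to g gives

$$\begin{aligned} S_n^1(f)_p \le c_p\left( \Vert f-g\Vert _p + S_n^1(g)_p\right) \le c_p\left( \Vert f-g\Vert _p + cn^{-s/2}|g|_{B^s_\tau }\right) \le c'\left( \Vert f-g\Vert _p + n^{-s/2}|g|_{B^s_\tau }\right) , \end{aligned}$$

and taking the infimum over \(g\in B^s_\tau \) yields (3.5).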

The main problem here is to prove a matching inverse estimate. Observe that the following Bernstein estimate holds: If \(S\in \mathcal {S}(n, 1)\), \(n\ge n_0\), and \(0< p<\infty \), \(0<s<2/p\), \(1/\tau =s/2+1/p\), then

$$\begin{aligned} |S|_{B^s_{\tau }} \le cn^{s/2}\Vert S\Vert _{L^p}, \end{aligned}$$
(3.7)

where the constant \(c>0\) depends only on s, p, and the structural constants of the setting (see the proof of Theorem 4.5). The point is that this estimate does not imply an inverse estimate that is a companion to (3.5). The following estimate would imply such an estimate:

$$\begin{aligned} |S_1-S_2|_{B^s_{\tau }} \le cn^{s/2}\Vert S_1-S_2\Vert _{L^p}, \quad S_1, S_2\in \mathcal {S}(n, 1). \end{aligned}$$
(3.8)

However, as the following example shows, this estimate is in general not valid.

Example 3.3

Consider the function \(f :={\mathbbm {1}}_{[0, {\varepsilon }]\times [0, 1]}\), where \({\varepsilon }>0\) is sufficiently small. It is easy to see that

$$\begin{aligned} \omega _1(f, t)_\tau ^\tau \sim \left\{ \begin{array}{lll} t \quad \hbox {if} \quad t\le {\varepsilon },\\ {\varepsilon }\quad \hbox {if} \quad t > {\varepsilon }, \end{array} \right. \end{aligned}$$

and hence for \(0<s<2/p\) and \(1/\tau =s/2+1/p\), we have

$$\begin{aligned} |f|_{B^s_{\tau }} \sim {\varepsilon }^{1/\tau -s} \sim {\varepsilon }^{1/p-s/2} \sim {\varepsilon }^{-s/2}\Vert f\Vert _{L^p}, \quad \hbox {implying} \quad |f|_{B^s_{\tau }} \not \le c\Vert f\Vert _{L^p}, \end{aligned}$$

since \({\varepsilon }\) can be arbitrarily small. It is easy to see that one comes to the same conclusion if f is the characteristic function of any convex elongated set in \(\mathbb {R}^2\). The point is that if \(S_1, S_2\in \mathcal {S}(n, 1)\), then \(S_1-S_2\) can be a constant multiple of the characteristic function of one or more elongated convex sets in \(\mathbb {R}^2\), and, therefore, estimate (3.8) is in general not possible.
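The computation behind the equivalences in Example 3.3 can be sketched as follows, using only definition (2.1) with \(k=1\) and the two-sided bound on \(\omega _1(f, t)_\tau ^\tau \) stated in the example. Note that \(s\tau <1\) is equivalent to \(s<2/p\), which is what makes the first integral below converge; the constants of equivalence depend on s and p.

$$\begin{aligned} |f|_{B^s_{\tau }}^\tau&= \int _0^\infty t^{-s\tau }\omega _1(f, t)_\tau ^\tau \frac{\mathrm{d}t}{t} \sim \int _0^{\varepsilon } t^{-s\tau }\, \mathrm{d}t + {\varepsilon }\int _{\varepsilon }^\infty t^{-s\tau -1}\, \mathrm{d}t \\&= \frac{{\varepsilon }^{1-s\tau }}{1-s\tau } + \frac{{\varepsilon }^{1-s\tau }}{s\tau } \sim {\varepsilon }^{1-s\tau }. \end{aligned}$$

Taking the power \(1/\tau \) gives \(|f|_{B^s_{\tau }} \sim {\varepsilon }^{1/\tau -s}\), and since \(1/\tau -s = 1/p-s/2\) and \(\Vert f\Vert _{L^p}={\varepsilon }^{1/p}\), this is the same as \({\varepsilon }^{-s/2}\Vert f\Vert _{L^p}\).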

We overcome the problem with estimate (3.8) by establishing the following main result:

Theorem 3.4

Let \(0< p<\infty \), \(0<s<2/p\), and \(1/\tau =s/2+1/p\). Then for any \(S_1, S_2\in \mathcal {S}(n, 1)\), \(n\ge n_0\), we have

$$\begin{aligned} |S_1|_{B^s_{\tau }}&\le |S_2|_{B^s_{\tau }} + cn^{s/2}\Vert S_1-S_2\Vert _{L^p} \quad \hbox {if} \;\; \tau \ge 1, \quad \hbox {and}\end{aligned}$$
(3.9)
$$\begin{aligned} |S_1|_{B^s_{\tau }}^\tau&\le |S_2|_{B^s_{\tau }}^\tau + cn^{\tau s/2}\Vert S_1-S_2\Vert _{L^p}^\tau \quad \hbox {if} \;\; \tau < 1, \end{aligned}$$
(3.10)

where the constant \(c>0\) depends only on s, p, and the structural constants \(N_0\), \(c_0\), and \(\beta \); \(n_0\) is from Condition 3.2.

In the limiting case, we have this result:

Theorem 3.5

If \(S_1, S_2\in \mathcal {S}(n, 1)\), \(n\ge n_0\), then

$$\begin{aligned} |S_1|_{BV} \le |S_2|_{BV} + cn^{1/2}\Vert S_1-S_2\Vert _{L^2}, \end{aligned}$$
(3.11)

where the constant \(c>0\) depends only on the structural constants \(N_0\), \(c_0\), and \(\beta \).

The proof of this theorem is easier than the one of Theorem 3.4 and will be omitted.

We next show that estimates (3.9)–(3.10) and (3.11) imply the desired inverse estimate.

Theorem 3.6

Let p, s, and \(\tau \) be as in Theorem 3.4, and set \(\lambda :=\min \{\tau , 1\}\). Then for any \(f\in L^p(\Omega )\), we have

$$\begin{aligned} K(f, n^{-s/2}) \le cn^{-s/2} \left( \sum _{\ell =n_0}^n \frac{1}{\ell }\left[ \ell ^{s/2} S_\ell ^1(f)_p\right] ^\lambda + \Vert f\Vert _p^\lambda \right) ^{1/\lambda }, \quad n\ge n_0. \end{aligned}$$
(3.12)

Here \(K(f, t)=K(f, t; L^p, B^s_\tau )\) is the K-functional defined in (3.6), and \(c>0\) is a constant depending only on s, p, and the structural constants of the setting.

Furthermore, in the case when \(p=2\) and \(s=1\), estimate (3.12) holds with \(B^s_\tau \) replaced by BV and \(\lambda =1\).

Proof

Let \(\tau < 1\) and \(f\in L^p(\Omega )\). We may assume that for any \(n\ge n_0\), there exists \(S_n\in \mathcal {S}(n, 1)\) such that \(\Vert f-S_n\Vert _p = S_n^1(f)_p\). Clearly, for any \(m\ge m_0\) with \(m_0:=\lceil \log _2 n_0 \rceil \), we have

$$\begin{aligned} K(f, 2^{-ms/2}) \le \Vert f-S_{2^m}\Vert _p +2^{-ms/2}|S_{2^m}|_{B^s_\tau }. \end{aligned}$$
(3.13)

We now estimate \(|S_{2^m}|_{B^s_\tau }^\tau \) using iteratively estimate (3.10). For \(\nu \ge m_0+1\), we get

$$\begin{aligned} |S_{2^\nu }|_{B^s_\tau }^\tau&\le |S_{2^{\nu -1}}|_{B^s_\tau }^\tau +c2^{\tau \nu s/2}\Vert S_{2^\nu }-S_{2^{\nu -1}}\Vert _p^\tau \\&\le |S_{2^{\nu -1}}|_{B^s_\tau }^\tau +c2^{\tau \nu s/2}\left( \Vert f-S_{2^\nu }\Vert _p^\tau + \Vert f-S_{2^{\nu -1}}\Vert _p^\tau \right) \\&\le |S_{2^{\nu -1}}|_{B^s_\tau }^\tau +c'2^{\tau \nu s/2}S_{2^{\nu -1}}^1(f)_p^\tau . \end{aligned}$$

From (3.7) we also have

$$\begin{aligned} |S_{2^{m_0}}|_{B^s_\tau } \le c\Vert S_{2^{m_0}}\Vert _p \le c\Vert f-S_{2^{m_0}}\Vert _p + c\Vert f\Vert _p = cS_{2^{m_0}}^1(f)_p + c\Vert f\Vert _p. \end{aligned}$$

Summing up these estimates, we arrive at

$$\begin{aligned} |S_{2^m}|_{B^s_\tau }^\tau \le c\sum _{\nu =m_0}^{m-1}2^{\tau \nu s/2}S_{2^{\nu }}^1(f)_p^\tau + c\Vert f\Vert _p^\tau . \end{aligned}$$

Clearly, this estimate and (3.13) imply (3.12). The proof in the cases \(\lambda \ge 1\) or \(p=2\), \(s=1\), and \(B^s_\tau \) replaced by BV is similar; we omit it. \(\square \)
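The summation step in the proof above can be spelled out as follows. This is a sketch for \(\tau <1\) and \(m>m_0\), with constants c, \(c'\) that may change from line to line as elsewhere in the article. Iterating the estimate for \(|S_{2^\nu }|_{B^s_\tau }^\tau \) over \(\nu =m_0+1, \dots , m\) and shifting the summation index gives

$$\begin{aligned} |S_{2^m}|_{B^s_\tau }^\tau&\le |S_{2^{m_0}}|_{B^s_\tau }^\tau + c'\sum _{\nu =m_0+1}^{m}2^{\tau \nu s/2}S_{2^{\nu -1}}^1(f)_p^\tau \\&= |S_{2^{m_0}}|_{B^s_\tau }^\tau + c'2^{\tau s/2}\sum _{\nu =m_0}^{m-1}2^{\tau \nu s/2}S_{2^{\nu }}^1(f)_p^\tau . \end{aligned}$$

The bound on \(|S_{2^{m_0}}|_{B^s_\tau }\) obtained from (3.7), raised to the power \(\tau \), contributes at most \(cS_{2^{m_0}}^1(f)_p^\tau + c\Vert f\Vert _p^\tau \). The first of these terms is dominated by the \(\nu =m_0\) term of the sum, which leads to the estimate stated in the proof.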

Observe that the direct and inverse estimates (3.5) and (3.9)–(3.11) imply immediately a characterization of the approximation spaces \(A_q^\alpha \) associated with the piecewise constant approximation described above, just as in (1.7).
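For orientation, the (quasi)norm defining \(A_q^\alpha \) in Sect. 1 can be evaluated for a finite list of approximation errors. The following Python sketch does this for a truncated sum; the function name and the idea of truncating at the length of the list are assumptions made only for this illustration, and computing the errors \(S_n^1(f)_p\) themselves is not addressed.

```python
import math


def approximation_quasinorm(f_norm, errors, alpha, q):
    """Truncated A_q^alpha (quasi)norm.

    f_norm   : the value of ||f||_{L^p}
    errors   : errors[n-1] plays the role of the n-th best approximation error
    alpha, q : smoothness and summability indices (q may be math.inf)
    """
    weighted = [(n ** alpha) * e for n, e in enumerate(errors, start=1)]
    if math.isinf(q):
        # For q = infinity the weighted l^q sum is replaced by the supremum.
        return f_norm + max(weighted)
    total = sum(w ** q / n for n, w in enumerate(weighted, start=1))
    return f_norm + total ** (1.0 / q)
```

For example, if the errors decay exactly like \(n^{-\alpha }\), every weighted term equals 1, so the truncated sum grows like a harmonic sum for finite q, while the \(q=\infty \) expression stays bounded.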

3.3 Proof of Theorem 3.4

We shall only consider the case when \(\Omega \subset \mathbb {R}^2\) is a compact polygonal domain, obeying Condition 3.2. The proof in the case \(\Omega =\mathbb {R}^2\) is similar.

Assume \(S_1, S_2\in \mathcal {S}(n, 1)\), \(n\ge n_0\). Then \(S_1, S_2\) can be represented in the form \( S_j=\sum _{R\in \mathcal {R}_j} c_R {\mathbbm {1}}_{R}, \) where \(\mathcal {R}_j\) is a set of at most n rings in the sense of Definition 3.1 with disjoint interiors and such that \(\Omega = \cup _{R\in \mathcal {R}_j} R\), \(j=1, 2\).

We denote by \(\mathcal {U}\) the set of all maximal compact connected subsets U of \(\Omega \) obtained by intersecting all rings from \(\mathcal {R}_1\) and \(\mathcal {R}_2\) with the property \(\overline{U^\circ } =U\) (the closure of the interior of U is U). Here U being maximal means that it is not contained in another such set.

Observe first that each \(U\in \mathcal {U}\) is obtained from the intersection of exactly two rings \(R'\in \mathcal {R}_1\) and \(R''\in \mathcal {R}_2\), and is a subset of \(\Omega \) with polygonal boundary \(\partial U\) consisting of \(\le 2N_0\) line segments (edges). Secondly, the sets in \(\mathcal {U}\) have disjoint interiors and \(\Omega =\cup _{U\in \mathcal {U}}U\).

It is easy to see that there exists a constant \(c>0\) such that

$$\begin{aligned} \# \mathcal {U}\le cn. \end{aligned}$$
(3.14)

Indeed, each \(U\in \mathcal {U}\) is obtained by intersecting two rings, say, \(R'\in \mathcal {R}_1\) and \(R''\in \mathcal {R}_2\). If \(|R'|<|R''|\), we associate \(R'\) with U, if \(|R'|> |R''|\) we associate \(R''\) with U, and if \(|R'|=|R''|\), we associate either \(R'\) or \(R''\) with U. However, because of condition (b) in Definition 3.1, every ring R from \(\mathcal {R}_1\) or \(\mathcal {R}_2\) can be intersected by only finitely many, say, \(N^\star \) rings from \(\mathcal {R}_2\) or \(\mathcal {R}_1\), respectively, of area \(\ge |R|\). Here \(N^\star \) depends only on the structural constants \(N_0\) and \(c_0\). Also, the intersection of any two rings may have only finitely many, say \(N^{\star \star }\), connected components. Therefore, every ring \(R\in \mathcal {R}_1\cup \mathcal {R}_2\) can be associated with no more than \(N^\star N^{\star \star }\) sets \(U\in \mathcal {U}\), which implies (3.14) with \(c=2N^\star N^{\star \star }\).

Example 3.3 clearly indicates that our main problem will be in dealing with sets \(U\in \mathcal {U}\) or parts of them with \((\mathrm{diameter})^2\) much larger than their area. To overcome the problem with these sets, we shall subdivide each of them using the following

Construction of good triangles. According to Definition 3.1, each segment E from the boundary of every ring \(R\in \mathcal {R}_j\) can be subdivided into the union of at most two segments \(E_1\), \(E_2\) (\(E=E_1\cup E_2\)) with disjoint interiors such that there exist triangles \(\triangle _1\) with a side \(E_1\) and adjacent to \(E_1\) angles of size \(\beta >0\) and \(\triangle _2\) with a side \(E_2\) and adjacent to \(E_2\) angles \(\beta \) such that \(\triangle _\ell \subset R\), \(\ell =1, 2\). We now associate with \(\triangle _1\) the triangle \(\tilde{\triangle }_1 \subset \triangle _1\) with one side \(E_1\) and adjacent to \(E_1\) angles of size \(\beta /2\); just in the same way we construct the triangle \(\tilde{\triangle }_2 \subset \triangle _2\) with a side \(E_2\). We proceed in the same way for each edge E from \(\partial R\), \(R\in \mathcal {R}_j\), \(j=1, 2\). We denote by \(\mathcal {T}_R\) the set of all triangles \(\tilde{\triangle }_1\), \(\tilde{\triangle }_2\) associated in the above manner with all edges E from \(\partial R\). We shall call the triangles from \(\mathcal {T}_R\) the good triangles associated with R. Observe that due to \(\triangle _1, \triangle _2 \subset R\) for the triangles from above it readily follows that the good triangles associated with R (\(R\in \mathcal {R}_j\), \(j=1, 2\)) have disjoint interiors; this was the purpose of the above construction. To see this, one has simply to consider two arbitrary segments on \(\partial R\) and the associated triangles.

From now on, for every segment E from \(\partial R\) that has been subdivided into \(E_1\) and \(E_2\) as above, we shall consider \(E_1\) and \(E_2\) as segments from \(\partial R\) in place of E. We denote by \(\mathcal {E}_R\) the set of all (new) segments from \(\partial R\). We now associate with each \(E\in \mathcal {E}_R\) the good triangle that has E as a side and denote it by \(\triangle _E\).

To summarize, we have subdivided the boundary \(\partial R\) of each ring \(R \in \mathcal {R}_j\), \(j=1,2\), into a set \(\mathcal {E}_R\) of segments with disjoint interiors (\(\partial R = \cup _{E\in \mathcal {E}_R} E\)) and associated with each \(E\in \mathcal {E}_R\) a good triangle \(\triangle _E \subset R\) such that E is a side of \(\triangle _E\) and the triangles \(\{\triangle _E\}_{E\in \mathcal {E}_R}\) have disjoint interiors. In addition, if \(E'\subset E\) is a subsegment of E, then we associate to \(E'\) the triangle \(\triangle _{E'} \subset \triangle _E\) with one side \(E'\) and the other two sides parallel to the respective sides of \(\triangle _E\); hence \(\triangle _{E'}\) is similar to \(\triangle _E\). We shall call \(\triangle _{E'}\) a good triangle as well. Fig. 2 illustrates the construction of good triangles (compare with Fig. 1).

Fig. 2. The ring from Fig. 1 with good triangles (angles \(=\beta /2\))

Subdivision of the sets from \(\mathcal {U}\). We next subdivide each set \(U\in \mathcal {U}\) by using the good triangles constructed above. Suppose \(U\in \mathcal {U}\) is obtained from the intersection of rings \(R'\in \mathcal {R}_1\) and \(R''\in \mathcal {R}_2\). Then the boundary \(\partial U\) of U consists of two sets of segments \(\mathcal {E}_U'\) and \(\mathcal {E}_U''\), where each \(E\in \mathcal {E}_U'\) is a segment or subsegment of a segment from \(\mathcal {E}_{R'}\) and each \(E\in \mathcal {E}_U''\) is a segment or subsegment of a segment from \(\mathcal {E}_{R''}\). Clearly, \(\partial U = \cup _{E\in \mathcal {E}_U'\cup \mathcal {E}_U''} E\), and the segments from \(\mathcal {E}_U'\cup \mathcal {E}_U''\) have disjoint interiors. For each \(E\in \mathcal {E}_U'\cup \mathcal {E}_U''\), we denote by \(\triangle _E\) the good triangle with a side E, defined above.

Definition of the set \(\mathcal {T}_U\) of trapezoids associated with \(U\in \mathcal {U}\). We consider the collection of all nonempty sets of the form \(\triangle _{E_1} \cap \triangle _{E_2}\) with the properties: (a) \(E_1\in \mathcal {E}_U'\), \(E_2\in \mathcal {E}_U''\), and (b) there exists an isosceles trapezoid or an isosceles triangle \(T\subset \triangle _{E_1} \cap \triangle _{E_2}\) such that its two legs (of equal length) are contained in \(E_1\) and \(E_2\), respectively, and its height is not smaller than its larger base. We assume that T is a maximal isosceles trapezoid (or triangle) with these properties. We denote by \(\mathcal {T}_U\) the set of all trapezoids as above.

Definition of the collection \(\mathcal {A}_U\). We denote by \(\mathcal {A}_U\) the set of all maximal compact connected subsets A of \(U \setminus \cup _{T\in \mathcal {T}_U} T^\circ \).

Clearly, \(U = \cup _{T\in \mathcal {T}_U} T \cup _{A\in \mathcal {A}_U} A\), and the sets in \(\mathcal {T}_U\cup \mathcal {A}_U\) have disjoint interiors.

Figs. 3 and 8 illustrate the above construction.

Fig. 3 A set U with its good triangles. Note also the trapezoids

In the next lemma, we prove the “obvious” fact that, as a result of the above subdivision of every set \(U\in \mathcal {U}\), uncontrollably narrow and elongated subregions of U can only be realized as trapezoids from \(\mathcal {T}_U\).

Lemma 3.7

There exist constants \(c^{\star }>1\) and \(\beta ^\star >0\) depending only on \(N_0\), \(c_0\), and \(\beta \), such that if \(A\in \mathcal {A}_U\) for some \(U\in \mathcal {U}\), then \(d(A)^2 \le c^{\star } |A|\), and there exists a triangle \(\triangle \subset A\) whose minimum angle is \(\ge \beta ^\star \) such that \(|A| \le c^\star |\triangle |\). Here d(A) stands for the diameter of A.
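As an informal illustration of what the first inequality excludes (an added example, not part of the original argument), consider a rectangle A with side lengths \(a\ge b>0\). Then

$$\begin{aligned} d(A)^2 = a^2+b^2, \quad |A| = ab, \quad \hbox {and hence}\quad \frac{d(A)^2}{|A|} = \frac{a}{b}+\frac{b}{a} \ge \frac{a}{b}. \end{aligned}$$

Thus \(d(A)^2 \le c^{\star } |A|\) can hold only if the aspect ratio satisfies \(a/b \le c^\star \); arbitrarily thin rectangles violate it. The lemma asserts that the sets in \(\mathcal {A}_U\) are never of this thin type.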

Proof

Let \(U\in \mathcal {U}\). There are only two possibilities for U: either U is of the form \(U=(Q_2\setminus \tilde{Q}_2)\setminus \tilde{Q}_1\) or of the form \(U=(Q_1\cap Q_2)\setminus (\tilde{Q}_1\cup \tilde{Q}_2)\), where \(R_1=Q_1\setminus \tilde{Q}_1\) and \(R_2=Q_2\setminus \tilde{Q}_2\) are two rings (see Definition 3.1), one of which belongs to \(\mathcal {R}_1\) and the other to \(\mathcal {R}_2\), see Fig. 4.

Fig. 4 Two possible configurations for U: \(U=(Q_2\setminus \tilde{Q}_2)\setminus \tilde{Q}_1\) (left) or \(U=(Q_1\cap Q_2)\setminus (\tilde{Q}_1\cup \tilde{Q}_2)\) (right)

We shall only consider the case \(U=(Q_2\setminus \tilde{Q}_2)\setminus \tilde{Q}_1\); the other case is similar. Let \(A\in \mathcal {A}_U\) (observe that if \(\mathcal {T}_U=\emptyset \), then \(\mathcal {A}_U=\{U\}\) and \(A=U\)). Define \(\gamma _2:=\partial Q_2\cap A\), \(\tilde{\gamma }_1:=\partial \tilde{Q}_1\cap A\), and \(\tilde{\gamma }_2:=\partial \tilde{Q}_2\cap A\). Clearly, \(\partial A\) consists of \(\gamma _2\), \(\tilde{\gamma }_1\), \(\tilde{\gamma }_2\) (if \(\tilde{\gamma }_2\ne \emptyset \)) and at most two base segments of trapezoids from \(\mathcal {T}_U\), see Fig. 5. Observe that from Definition 3.1, it follows that the number of edges of \(\partial A\) is \(\le 3N_0+2\).

Fig. 5 One instance of \(A\in \mathcal {A}_U\) (dark shade) and trapezoids (light shade). Observe that in this case, one of the trapezoids arises from a segment of \(\tilde{Q}_2\)

Let \(E^\star \) be the longest edge (line segment) of \(\partial A\). There are four possibilities for \(E^\star \) that we consider separately below.

Case 1: \(E^\star \) is the base of a trapezoid in \(\mathcal {T}_U\). Then from the construction of the trapezoids in \(\mathcal {T}_U\), it readily follows that there exists a triangle \(\triangle \subset A\) with a side \(E^\star \) and minimal angle \(\ge \beta /2\). Hence,

$$\begin{aligned} d(A)^2 \le (3N_0+2)^2\ell (E^\star )^2 \le c(3N_0+2)^2|\triangle | \le c'|A|, \end{aligned}$$
(3.15)

and \(|A| \le d(A)^2 \le c|\triangle |\) as claimed.

Case 2: \(E^\star \) is an edge (line segment) of \(\gamma _2\). Let \(\triangle _{E^\star }\subset R_2\) be the good triangle with a side \(E^\star \). As such it follows that \(\triangle _{E^\star }\cap \tilde{Q}_2^\circ =\emptyset \). Denote by \(u_1\), \(u_2\), \(u_3\) the vertices of \(\triangle _{E^\star }\), where \(u_1\), \(u_2\) are the end points of \(E^\star \). Further, let \(u_4\) be the point on the side \([u_1, u_3]\) of \(\triangle _{E^\star }\) such that \(|u_1-u_4| = |u_1-u_3|/4\). Similarly, let \(u_5\) be the point on the side \([u_2, u_3]\) such that \(|u_2-u_5| = |u_2-u_3|/4\). Also, denote by \(u_6\) and \(u_7\) the points on \(E^\star \) such that \(|u_1-u_6| = |u_1-u_2|/4\) and \(|u_2-u_7| = |u_1-u_2|/4\). Let \(\triangle ':=[u_1, u_4, u_6]\) be the triangle with vertices \(u_1, u_4, u_6\), and let \(\triangle '':=[u_2, u_5, u_7]\). See Fig. 5.

If \(\tilde{\gamma }_1 \cap \triangle ' = \emptyset \), then \(\triangle '\subset A\), and as in (3.15), we conclude that

$$\begin{aligned} d(A)^2 \le 4^2(3N_0+2)^2 (\ell (E^\star )/4)^2 \le c|\triangle '| \le c|A| \end{aligned}$$

as claimed. The same argument applies whenever \(\tilde{\gamma }_1 \cap \triangle '' = \emptyset \).

Fig. 6 Illustration of Case 2

Consider the case when \(\tilde{\gamma }_1 \cap \triangle ' \ne \emptyset \) and \(\tilde{\gamma }_1 \cap \triangle '' \ne \emptyset \), see Fig. 6. Define \(\tilde{\gamma }^\diamond _1:=\tilde{\gamma }_1\setminus (\triangle '\cup \triangle '')\). Let \(\tilde{\mathcal {E}}^\diamond _1\) be the set of all edges of \(\tilde{\gamma }^\diamond _1\). Clearly, \(\ell (\tilde{\gamma }^\diamond _1) \ge \ell (E^\star )/2\) and \(\# \tilde{\mathcal {E}}^\diamond _1 \le N_0\). Since each good triangle \(\triangle _E \subset R_1\) associated with an edge \(E\in \tilde{\mathcal {E}}^\diamond _1\) does not form a trapezoid in \(\mathcal {T}_U\), there exists a constant \(\hat{\beta }\), depending only on \(\beta \), such that \(0<\hat{\beta } < \beta /2\) and the triangle \(\hat{\triangle }_E\) (\(\hat{\triangle }_E \subset \triangle _E\)) with angles adjacent to E of size \(\hat{\beta }\) does not intersect \(E^\star \). Also, from the fact that E is contained in the trapezoid \([u_4, u_5, u_7, u_6]\), it follows that \(\hat{\triangle }_E\) cannot intersect the other two sides of \(\triangle _{E^\star }\). Hence, \(\hat{\triangle }_E\subset \triangle _{E^\star }\). Using this, we obtain

$$\begin{aligned} d(A)^2&\le (3N_0+2)^2 \ell (E^\star )^2 \le 4(3N_0+2)^2 \ell (\tilde{\gamma }_1^\diamond )^2 = 4(3N_0+2)^2 \left( \sum _{E\in \tilde{\mathcal {E}}_1^\diamond }\ell (E)\right) ^2 \nonumber \\&\le 4(3N_0+2)^2 N_0\sum _{E\in \tilde{\mathcal {E}}_1^\diamond }\ell (E)^2 \le c\sum _{E\in \tilde{\mathcal {E}}_1^\diamond }|\hat{\triangle }_{E}| \le c|A|, \end{aligned}$$
(3.16)

where we used that the triangles \(\hat{\triangle }_{E}\), \(E\in \tilde{\mathcal {E}}_1^\diamond \), are with disjoint interiors and \(\hat{\triangle }_{E}\subset A\). Observe also that if \(\triangle \in \{\hat{\triangle }_{E}: E\in \tilde{\mathcal {E}}_1^\diamond \}\) is a triangle of largest area from this set of triangles, then it follows from above that \(|A| \le d(A)^2 \le c|\triangle |\). This completes the proof of the lemma in Case 2.
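The two middle steps in (3.16) can be spelled out as follows. This is a sketch: the explicit area formula is our addition and assumes, as described above, that \(\hat{\triangle }_E\) is the isosceles triangle with base E and base angles \(\hat{\beta }\). By the Cauchy-Schwarz inequality and \(\# \tilde{\mathcal {E}}^\diamond _1 \le N_0\),

$$\begin{aligned} \left( \sum _{E\in \tilde{\mathcal {E}}_1^\diamond }\ell (E)\right) ^2&\le \# \tilde{\mathcal {E}}^\diamond _1 \sum _{E\in \tilde{\mathcal {E}}_1^\diamond }\ell (E)^2 \le N_0\sum _{E\in \tilde{\mathcal {E}}_1^\diamond }\ell (E)^2,\\ |\hat{\triangle }_E|&= \frac{1}{2}\,\ell (E)\cdot \frac{\ell (E)}{2}\tan \hat{\beta } = \frac{\tan \hat{\beta }}{4}\,\ell (E)^2. \end{aligned}$$

Hence \(\ell (E)^2 = 4\cot \hat{\beta }\, |\hat{\triangle }_E|\), and the constant c in (3.16) depends only on \(N_0\) and \(\hat{\beta }\).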

Case 3: \(E^\star \) is an edge of \(\tilde{\gamma }_2\). In this case, the argument is just as the one in Case 2. We omit the details.

Case 4: \(E^\star \) is an edge of \(\tilde{\gamma }_1\) (recall that \(E^\star \) is the longest edge of \(\partial A\)). Let \(\triangle _{E^\star }\subset R_1\) be the good triangle with a side \(E^\star \). Two subcases present themselves here depending on whether \(\triangle _{E^\star }\cap \tilde{Q}_2^\circ =\emptyset \) or \(\triangle _{E^\star }\cap \tilde{Q}_2^\circ \ne \emptyset \).

Case 4 (a): \(\triangle _{E^\star }\cap \tilde{Q}_2^\circ =\emptyset \). Let \(u_1\), \(u_2\), \(u_3\) be the vertices of \(\triangle _{E^\star }\), where \(u_1\), \(u_2\) are the end points of \(E^\star \). We define the points \(u_4, u_5, u_6, u_7\) on the sides of \(\triangle _{E^\star }\) just as in Case 2 above, see Fig. 7.

Fig. 7 Illustration of Case 4 (a)

Assume \(\gamma _2 \cap [u_4, u_5, u_3] \ne \emptyset \), where \([u_4, u_5, u_3]\) stands for the triangle with vertices \(u_4, u_5, u_3\). Pick a point \(u\in \gamma _2 \cap [u_4, u_5, u_3]\). Because of the convexity of \(Q_2\), the triangle \(\triangle :=[u_1, u_2, u]\) is contained in A, and hence

$$\begin{aligned} d(A)^2 \le (3N_0+2)^2\ell (E^\star )^2 \le c(3N_0+2)^2|\triangle | \le c'|A| \end{aligned}$$

as claimed.

Assume \(\gamma _2 \cap [u_4, u_5, u_3] = \emptyset \). Then \(\gamma _2\) intersects the segments \([u_4, u_6]\) and \([u_5, u_7]\). Set \(\gamma _2^\diamond := \gamma _2\cap [u_6, u_7, u_5, u_4]\), where \([u_6, u_7, u_5, u_4]\) is the trapezoid with vertices \(u_6, u_7, u_5, u_4\). Let \(\mathcal {E}_2^\diamond \) be the set of all edges of \(\gamma _2^\diamond \). Clearly, \(\ell (\gamma _2^\diamond ) > \ell (E^\star )/2\) and \(\# \mathcal {E}_2^\diamond \le N_0\). Just as in Case 2, we note that each good triangle \(\triangle _E \subset R_2\) associated with an edge \(E\in \mathcal {E}^\diamond _2\) does not form a trapezoid in \(\mathcal {T}_U\), and hence the triangle \(\hat{\triangle }_E\) (\(\hat{\triangle }_E \subset \triangle _E\)) with angles adjacent to E of size \(\hat{\beta }\) with \(0<\hat{\beta }< \beta /2\) as in Case 2 is contained in \(Q_2\cap \triangle _{E^\star }\). Therefore, as in (3.16),

$$\begin{aligned} d(A)^2&\le (3N_0+2)^2 \ell (E^\star )^2 \le 4(3N_0+2)^2 \ell (\gamma _2^\diamond )^2 = 4(3N_0+2)^2 \left( \sum _{E\in \mathcal {E}_2^\diamond }\ell (E)\right) ^2 \\&\le 4(3N_0+2)^2 N_0\sum _{E\in \mathcal {E}_2^\diamond }\ell (E)^2 \le c\sum _{E\in \mathcal {E}_2^\diamond }|\hat{\triangle }_{E}| \le c|A|. \end{aligned}$$

Furthermore, if \(\triangle \in \{\hat{\triangle }_{E}: E\in \mathcal {E}_2^\diamond \}\) is a triangle of largest area from this set of triangles, then \(|A| \le d(A)^2 \le c|\triangle |\), which completes the proof in this subcase.

Case 4 (b): \(\triangle _{E^\star }\cap \tilde{Q}_2^\circ \ne \emptyset \). Observe that \(A\cap \triangle _{E^\star } = (Q_2\setminus \tilde{Q}_2)\cap \triangle _{E^\star }\). Just as in Case 4 (a), one shows that

$$\begin{aligned} \ell (E^\star )^2 \le c|Q_2\cap \triangle _{E^\star }|. \end{aligned}$$
(3.17)

We next prove that there exists a constant \(c'>0\) such that

$$\begin{aligned} |Q_2\cap \triangle _{E^\star }| \le c'|(Q_2\cap \triangle _{E^\star })\setminus \tilde{Q}_2| = c'|A\cap \triangle _{E^\star }|. \end{aligned}$$
(3.18)

Define \(\tilde{Q}_2^\diamond :=\triangle _{E^\star }\cap \tilde{Q}_2\), \(\tilde{\gamma }_2^\diamond := \tilde{\gamma }_2\cap \triangle _{E^\star }\), and let \(\tilde{\mathcal {E}}_2^\diamond \) be the set of all edges of \(\tilde{\gamma }_2^\diamond \). Note that just as in Case 2 and Case 4 (a), each good triangle \(\triangle _E \subset R_2\) associated with an edge \(E\in \tilde{\mathcal {E}}^\diamond _2\) does not form with \(E^\star \) a trapezoid in \(\mathcal {T}_U\), and hence the triangle \(\hat{\triangle }_E\) (\(\hat{\triangle }_E \subset \triangle _E\)) with angles adjacent to E of size \(\hat{\beta }\) with \(0<\hat{\beta }< \beta /2\) as in Case 2 does not intersect \(E^\star \). We claim that there exists a constant \(c''>0\) such that

$$\begin{aligned} |\tilde{Q}_2\cap \triangle _{E^\star }| \le c''|(\cup _{E\in \tilde{\mathcal {E}}_2^\diamond } \hat{\triangle }_E) \cap \triangle _{E^\star }|. \end{aligned}$$
(3.19)

As in Case 4 (a), let \(\triangle _{E^\star }=:[u_1, u_2, u_3]\), where \(E^\star =[u_1, u_2]\). Denote by \(n_1\) and \(n_2\) the unit vectors that are orthogonal to the sides \([u_1, u_3]\) and \([u_2, u_3]\), respectively, and exterior to \(\triangle _{E^\star }\). Observe that since \(\triangle _{E^\star }\) is a good triangle, the angle made by the sides \([u_1, u_3]\) and \([u_2, u_3]\) is of size \(\ge \pi -\beta \ge 2\pi /3\). Further, denote by \(\tilde{\mathcal {E}}_2^\flat \) the set of all edges \(E\in \tilde{\mathcal {E}}_2^\diamond \) whose exterior (to \(\tilde{Q}_2\)) normal vectors make angles \(\ge \pi /2\) with \(n_1\) and \(n_2\). Clearly, \(\hat{\triangle }_E \cap ([u_1, u_3]\cup [u_2, u_3])=\emptyset \), \(\forall E\in \tilde{\mathcal {E}}_2^\flat \), and hence

$$\begin{aligned} \hat{\triangle }_E \subset (Q_2\cap \triangle _{E^\star })\setminus \tilde{Q}_2, \quad \forall E\in \tilde{\mathcal {E}}_2^\flat . \end{aligned}$$
(3.20)

On the other hand, since the convex set \(\tilde{Q}_2\) is with bounded eccentricity (Definition 3.1), it readily follows that there exist constants \(c_1, c_2 >0\) such that

$$\begin{aligned} \sum _{E\in \tilde{\mathcal {E}}_2^\flat }\ell (E) \ge c_1\sum _{E\in \tilde{\mathcal {E}}_2^\diamond }\ell (E) \ge c_2 d(\tilde{Q}_2\cap \triangle _{E^\star }). \end{aligned}$$
(3.21)

From (3.20)–(3.21) it follows that

$$\begin{aligned} |\tilde{Q}_2\cap \triangle _{E^\star }|&\le d(\tilde{Q}_2\cap \triangle _{E^\star })^2 \le c\left( \sum _{E\in \tilde{\mathcal {E}}_2^\flat }\ell (E)\right) ^2 \le cN_0 \sum _{E\in \tilde{\mathcal {E}}_2^\flat }\ell (E)^2 \\&\le cN_0 \sum _{E\in \tilde{\mathcal {E}}_2^\flat }|\hat{\triangle }_E| \le c\sum _{E\in \tilde{\mathcal {E}}_2^\diamond } |\hat{\triangle }_E \cap \triangle _{E^\star }|, \end{aligned}$$

which confirms (3.19).

To prove (3.18), we consider two cases. If \(|\tilde{Q}_2 \cap \triangle _{E^\star }| \le |Q_2 \cap \triangle _{E^\star }|/2\), then (3.18) follows trivially. Assume \(|\tilde{Q}_2 \cap \triangle _{E^\star }| > |Q_2 \cap \triangle _{E^\star }|/2\). Then using (3.19),

$$\begin{aligned} |Q_2 \cap \triangle _{E^\star }| \le 2|\tilde{Q}_2 \cap \triangle _{E^\star }| \le c|(\cup _{E\in \tilde{\mathcal {E}}_2^\diamond } \hat{\triangle }_E) \cap \triangle _{E^\star }| \le c'|(Q_2\cap \triangle _{E^\star })\setminus \tilde{Q}_2|, \end{aligned}$$

which completes the proof of (3.18).

Finally, (3.17) and (3.18) imply

$$\begin{aligned} d(A)^2 \le (3N_0+2)^2\ell (E^\star )^2 \le c|A\cap \triangle _{E^\star }| \le c|A|. \end{aligned}$$

Therefore, \(d(A)^2 \le c|A|\) as claimed.

In the case when \(|\tilde{Q}_2 \cap \triangle _{E^\star }| \le |Q_2 \cap \triangle _{E^\star }|/2\), just as in Case 4 (a), the triangle \(\triangle \in \{\hat{\triangle }_{E}: E\in \mathcal {E}_2^\diamond \}\) of largest area has the property \(|A| \le d(A)^2 \le c|\triangle |\). In the other case, the triangle \(\triangle \in \{\hat{\triangle }_{E}: E\in \tilde{\mathcal {E}}_2^\diamond , \hat{\triangle }_{E}\subset \triangle _{E^\star }\}\) of largest area has this property. The proof of Lemma 3.7 is complete.\(\square \)

In what follows, we shall need the following obvious property of the trapezoids from \(\mathcal {T}\).

Property 3.8

There exists a constant \(0< {\hat{c}}<1\) such that if \(L=[v_1,v_2]\) is one of the legs of a trapezoid \(T\in \mathcal {T}\) and \(T\subset \triangle _{E_1}\cap \triangle _{E_2}\) (see the construction of trapezoids), then for any \(x\in L\) with \(|x-v_j| \ge \rho \), \(j=1, 2\), for some \(\rho >0\) we have \(B(x, {\hat{c}}\rho ) \subset \triangle _{E_1}\cup \triangle _{E_2}\). Moreover, if \(D=[v_1,v_2]\) is one of the bases of the trapezoid T, then for any \(x\in D\) with \(|x-v_j| \ge \rho \), \(j=1, 2\), for some \(\rho >0\) we have \(B(x, {\hat{c}}\rho ) \subset \triangle _{E_1}\cap \triangle _{E_2}\).

Let \(\mathcal {A}:= \cup _{U\in \mathcal {U}} \mathcal {A}_U\) and \(\mathcal {T}:= \cup _{U\in \mathcal {U}} \mathcal {T}_U\). We have \(\Omega =\cup _{A\in \mathcal {A}} A \cup _{T\in \mathcal {T}}T\), and, clearly, the sets in \(\mathcal {A}\cup \mathcal {T}\) have disjoint interiors. From these we obtain the following representation of \(S_1(x)-S_2(x)\) for \(x\in \Omega \) which is not on any of the edges:

$$\begin{aligned} S_1(x)-S_2(x) = \sum _{A\in \mathcal {A}} c_A{\mathbbm {1}}_A(x) + \sum _{T\in \mathcal {T}} c_T{\mathbbm {1}}_T(x), \end{aligned}$$
(3.22)

where \(c_A\) and \(c_T\) are constants.

For future reference, we note that

$$\begin{aligned} \# \mathcal {A}\le cn \quad \hbox {and}\quad \#\mathcal {T}\le cn. \end{aligned}$$
(3.23)

These estimates follow readily from (3.14) and the fact that the number of edges of each \(U\in \mathcal {U}\) is \(\le 2N_0\).

Let \(0<s/2<1/p\), and assume \(\tau \le 1\). Fix \(t>0\), and let \(h\in \mathbb {R}^2\) with norm \(|h|\le t\). Write \(\nu :=|h|^{-1}h\), and assume \(\nu =:(\cos \theta , \sin \theta )\), \(-\pi <\theta \le \pi \).

We shall frequently use the following obvious identities: If S is a constant on a measurable set \(G \subset \mathbb {R}^2\) and \(H\subset G\) (H measurable), then

$$\begin{aligned} \Vert S\Vert _{L^\tau (G)} = |G|^{1/\tau -1/p}\Vert S\Vert _{L^p(G)} = |G|^{s/2}\Vert S\Vert _{L^p(G)} \end{aligned}$$
(3.24)

and

$$\begin{aligned} \Vert S\Vert _{L^\tau (H)} = (|H|/|G|)^{1/\tau }\Vert S\Vert _{L^\tau (G)}. \end{aligned}$$
(3.25)
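For completeness, here is a short verification of (3.24) and (3.25); it assumes \(0<|G|<\infty \), \(|H|>0\), and the relation \(1/\tau = s/2+1/p\), which is implicit in (3.24). If \(S\equiv \sigma \) on G, then \(\Vert S\Vert _{L^q(G)} = |\sigma ||G|^{1/q}\) for every \(0<q<\infty \), and therefore

$$\begin{aligned} \Vert S\Vert _{L^\tau (G)}&= |\sigma ||G|^{1/\tau } = |G|^{1/\tau -1/p}\,|\sigma ||G|^{1/p} = |G|^{s/2}\Vert S\Vert _{L^p(G)},\\ \Vert S\Vert _{L^\tau (H)}&= |\sigma ||H|^{1/\tau } = (|H|/|G|)^{1/\tau }\,|\sigma ||G|^{1/\tau } = (|H|/|G|)^{1/\tau }\Vert S\Vert _{L^\tau (G)}. \end{aligned}$$

Raising to the power \(\tau \) gives the forms used repeatedly below: \(\Vert S\Vert _{L^\tau (G)}^\tau = |G|^{\tau s/2}\Vert S\Vert _{L^p(G)}^\tau \) and \(\Vert S\Vert _{L^\tau (H)}^\tau = (|H|/|G|)\Vert S\Vert _{L^\tau (G)}^\tau \).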

We next estimate \(\Vert \Delta _h S_1\Vert _{L^\tau (G)}^\tau - \Vert \Delta _h S_2\Vert _{L^\tau (G)}^\tau \) for different subsets G of \(\Omega \).

3.4 Case 1

Let \(T \in \mathcal {T}\) be such that \(d(T)> 2t/{\hat{c}}\) with \({\hat{c}}\) the constant from Property 3.8. Define

$$\begin{aligned} T_h:=\{x\in \Omega : [x, x+h] \subset \Omega \;\;\hbox {and}\;\;[x, x+h]\cap T\ne \emptyset \}. \end{aligned}$$

We now estimate \(\Vert \Delta _h S_1\Vert _{L^\tau (T_h)}^\tau - \Vert \Delta _h S_2\Vert _{L^\tau (T_h)}^\tau \).

We may assume that T is an isosceles trapezoid contained in \(\triangle _{E_1}\cap \triangle _{E_2}\), where \(\triangle _{E_j}\) (\(j=1, 2\)) is a good triangle for a ring \(R_j\in \mathcal {R}_j\) and T is positioned so that its vertices are the points

$$\begin{aligned} v_1:=(-\delta _1/2, 0),\;\; v_2:=(\delta _1/2, 0),\;\; v_3:=(\delta _2/2, H),\;\; v_4:=(-\delta _2/2, H), \end{aligned}$$

where \(0\le \delta _2\le \delta _1\) and \(H >\delta _1\). Let \(L_1:=[v_1, v_4]\) and \(L_2:=[v_2, v_3]\) be the two equal (long) legs of T. We assume that \(L_1\subset E_1\) and \(L_2\subset E_2\). We denote by \(D_1:=[v_1, v_2]\) and \(D_2:=[v_3, v_4]\) the two bases of T. Set \(\mathcal {V}_T:=\{v_1, v_2, v_3, v_4\}\). See Fig. 8.

Fig. 8 A trapezoid T

Furthermore, let \(\gamma \le \pi /2\) be the angle between \(D_1\) and \(L_1\), and assume that \(\nu =:(\cos \theta , \sin \theta )\) with \(\theta \in [\gamma , \pi ]\). The case \(\theta \in [-\gamma , 0]\) is just the same. The case when \(\theta \in [0, \gamma ]\cup [-\pi , -\gamma ]\) is considered similarly.

Define \(B_v:=B(v, 2t/{\hat{c}})\), \(v\in \mathcal {V}_T\),

$$\begin{aligned} \mathcal {A}_T^t&:=\big \{A\in \mathcal {A}: d(A) > t \quad \hbox {and}\quad A\cap (T+B(0,t))\ne \emptyset \big \},\\ {\mathfrak {A}}_T^t&:=\big \{A\in \mathcal {A}: d(A) \le t \quad \hbox {and}\quad A\cap (T+B(0,t))\ne \emptyset \big \}, \end{aligned}$$

and

$$\begin{aligned} \mathcal {T}_T^t:=\big \{T'\in \mathcal {T}: d(T') > 2t/{\hat{c}}\quad \hbox {and}\quad T'\cap (T+B(0,t))\ne \emptyset \big \},\\ {\mathfrak {T}}_T^t:=\big \{T'\in \mathcal {T}: d(T') \le 2t/{\hat{c}}\quad \hbox {and}\quad T'\cap (T+B(0,t))\ne \emptyset \big \}. \end{aligned}$$

Case 1 (a). If \([x, x+h]\subset \triangle _{E_1}^\circ \), then \(\Delta _h S_1(x)=0\) because \(S_1\) is a constant on \(\triangle _{E_1}\). Hence no estimate is needed.

Case 1 (b). If \([x, x+h] \subset \cup _{v\in \mathcal {V}_{T}} B_v\), we estimate \(|\Delta _h S_1(x)|\) using the obvious inequality

$$\begin{aligned} |\Delta _h S_1(x)| \le |\Delta _h S_2(x)| + |S_1(x)-S_2(x)| + |S_1(x+h)-S_2(x+h)|. \end{aligned}$$
(3.26)
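Inequality (3.26) follows by adding and subtracting \(S_2\); the short derivation below assumes the usual convention \(\Delta _h S(x) := S(x+h)-S(x)\):

$$\begin{aligned} \Delta _h S_1(x)&= S_1(x+h)-S_1(x)\\&= \Delta _h S_2(x) + [S_1(x+h)-S_2(x+h)] - [S_1(x)-S_2(x)]. \end{aligned}$$

Since \(\tau \le 1\), we have \((a+b+c)^\tau \le a^\tau +b^\tau +c^\tau \) for \(a, b, c\ge 0\), so (3.26) also yields

$$\begin{aligned} |\Delta _h S_1(x)|^\tau \le |\Delta _h S_2(x)|^\tau + |S_1(x)-S_2(x)|^\tau + |S_1(x+h)-S_2(x+h)|^\tau , \end{aligned}$$

which is the form that is integrated over the respective sets.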

Clearly, the contribution of this case to estimating \(\Vert \Delta _h S_1\Vert _{L^\tau (T_h)}^\tau - \Vert \Delta _h S_2\Vert _{L^\tau (T_h)}^\tau \) is

$$\begin{aligned}&\le c \sum _{v\in \mathcal {V}_T}\sum _{A\in \mathcal {A}_T^t}\Vert S_1-S_2\Vert _{L^\tau (B_v\cap A)}^\tau + c\sum _{v\in \mathcal {V}_T}\sum _{T'\in \mathcal {T}_T^t}\Vert S_1-S_2\Vert _{L^\tau (B_v\cap T')}^\tau \\&\quad + c \sum _{v\in \mathcal {V}_T}\sum _{A\in {\mathfrak {A}}_T^t}\Vert S_1-S_2\Vert _{L^\tau (B_v\cap A)}^\tau + c\sum _{v\in \mathcal {V}_T}\sum _{T'\in {\mathfrak {T}}_T^t}\Vert S_1-S_2\Vert _{L^\tau (B_v\cap T')}^\tau \\&\le \sum _{A\in \mathcal {A}_T^t}ct^2d(A)^{\tau s-2}\Vert S_1-S_2\Vert _{L^p(A)}^\tau + \sum _{T'\in \mathcal {T}_T^t} ct^{1+\tau s/2}d(T')^{\tau s/2-1}\Vert S_1-S_2\Vert _{L^p(T')}^\tau \\&\quad + \sum _{A\in {\mathfrak {A}}_T^t} cd(A)^{\tau s}\Vert S_1-S_2\Vert _{L^p(A)}^\tau + \sum _{T'\in {\mathfrak {T}}_T^t} cd(T')^{\tau s}\Vert S_1-S_2\Vert _{L^p(T')}^\tau . \end{aligned}$$

Here we used the following estimates, obtained using Lemma 3.7 and (3.24) and/or (3.25): (1) If \(A\in \mathcal {A}_T^t\) and \(v\in \mathcal {V}_T\), then

$$\begin{aligned} \Vert S_1-S_2\Vert _{L^\tau (B_v\cap A)}^\tau&=(|B_v\cap A|/|A|)\Vert S_1-S_2\Vert _{L^\tau (A)}^\tau \\&\le ct^2d(A)^{-2}\Vert S_1-S_2\Vert _{L^\tau (A)}^\tau \le ct^2d(A)^{\tau s-2}\Vert S_1-S_2\Vert _{L^p(A)}^\tau . \end{aligned}$$

(2) If \(T'\in \mathcal {T}_T^t\) and \(\delta _1(T') > 2t/{\hat{c}}\) with \(\delta _1(T')\) being the maximal base of \(T'\), then for any \(v\in \mathcal {V}_{T}\), we have

$$\begin{aligned} \Vert S_1-S_2\Vert _{L^\tau (B_v\cap T')}^\tau&=(|B_v\cap T'|/|T'|)\Vert S_1-S_2\Vert _{L^\tau (T')}^\tau \le ct^2|T'|^{\tau s/2-1}\Vert S_1-S_2\Vert _{L^p(T')}^\tau \\&\le ct^2\delta _1(T')^{\tau s/2-1}d(T')^{\tau s/2-1}\Vert S_1-S_2\Vert _{L^p(T')}^\tau \\&\le ct^{1+\tau s/2}d(T')^{\tau s/2-1}\Vert S_1-S_2\Vert _{L^p(T')}^\tau , \end{aligned}$$

where we used that \(\tau s/2 <1\), which is equivalent to \(s<s+2/p\).

(3) If \(T'\in \mathcal {T}_T^t\) and \(\delta _1(T') \le 2t/{\hat{c}}\), then for any \(v\in \mathcal {V}_{T}\),

$$\begin{aligned} \Vert S_1-S_2\Vert _{L^\tau (B_v\cap T')}^\tau&=(|B_v\cap T'|/|T'|)\Vert S_1-S_2\Vert _{L^\tau (T')}^\tau \\&= |B_v\cap T'||T'|^{\tau s/2-1}\Vert S_1-S_2\Vert _{L^p(T')}^\tau \\&\le ct\delta _1(T')[\delta _1(T')d(T')]^{\tau s/2-1}\Vert S_1-S_2\Vert _{L^p(T')}^\tau \\&= ct\delta _1(T')^{\tau s/2}d(T')^{\tau s/2-1}\Vert S_1-S_2\Vert _{L^p(T')}^\tau \\&\le ct^{1+\tau s/2}d(T')^{\tau s/2-1}\Vert S_1-S_2\Vert _{L^p(T')}^\tau . \end{aligned}$$

(4) If \(A\in {\mathfrak {A}}_T^t\), then

$$\begin{aligned} \Vert S_1-S_2\Vert _{L^\tau (B_v\cap A)}^\tau&\le \Vert S_1-S_2\Vert _{L^\tau (A)}^\tau \le c|A|^{\tau s/2}\Vert S_1-S_2\Vert _{L^p(A)}^\tau \\&\le cd(A)^{\tau s}\Vert S_1-S_2\Vert _{L^p(A)}^\tau . \end{aligned}$$

(5) If \(T'\in {\mathfrak {T}}_T^t\), then

$$\begin{aligned} \Vert S_1-S_2\Vert _{L^\tau (B_v\cap T')}^\tau&\le \Vert S_1-S_2\Vert _{L^\tau (T')}^\tau \le c|T'|^{\tau s/2}\Vert S_1-S_2\Vert _{L^p(T')}^\tau \\&\le cd(T')^{\tau s}\Vert S_1-S_2\Vert _{L^p(T')}^\tau .\nonumber \end{aligned}$$
(3.27)

Case 1 (c). If \([x, x+h]\not \subset \cup _{v\in \mathcal {V}_{T}} B_v\) and \([x, x+h]\) intersects \(D_1\) or \(D_2\), then \(\delta _1> 2t/{\hat{c}}>2t\) or \(\delta _2 >2t\) and hence \([x, x+h]\subset \triangle _{E_1}\cap \triangle _{E_2}\), which implies \(\Delta _h S_1(x)=0\). No estimate is needed.

Case 1 (d). Let \({I_T^t}\) be the set defined by

$$\begin{aligned} {I_T^t}:= \{x\in T: x \;\; \hbox {is between }L_1\hbox { and }L_1+{\varepsilon }e_1\} \setminus \left( B(v_1, t/{\hat{c}})\cup B(v_4, t/{\hat{c}})\right) , \end{aligned}$$
(3.28)

where \({\varepsilon }:= (\delta _1-\delta _2)M^{-1}t\), \(e_1:= \langle 1, 0 \rangle \), and \(M:= |L_1|=|L_2|\). Set \({J_T^h}:={I_T^t}+ [0, h]\). See Fig. 8.
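The width \({\varepsilon }\) has a simple trigonometric meaning, which we record as an added remark computed from the coordinates of \(v_1, \dots , v_4\) and the angle \(\gamma \) between \(D_1\) and \(L_1\):

$$\begin{aligned} M = \sqrt{H^2 + \Big (\frac{\delta _1-\delta _2}{2}\Big )^2}, \quad \cos \gamma = \frac{\delta _1-\delta _2}{2M}, \quad \hbox {and hence}\quad {\varepsilon }= \frac{(\delta _1-\delta _2)t}{M} = 2t\cos \gamma . \end{aligned}$$

Moreover, since \(M\ge H > \delta _1 \ge \delta _1-\delta _2 \ge 0\), we have \(0\le {\varepsilon }< t\), and \({\varepsilon }=0\) exactly when T is a rectangle (\(\delta _1=\delta _2\)).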

In this case, we again use (3.26) to estimate \(|\Delta _h S_1(x)|\). We obtain

$$\begin{aligned} \Vert \Delta _h S_1\Vert _{L^\tau ({I_T^t})}^\tau&\le \Vert \Delta _h S_2\Vert _{L^\tau ({I_T^t})}^\tau + \Vert S_1-S_2\Vert _{L^\tau ({I_T^t})}^\tau \\&\quad + \sum _{A\in \mathcal {A}_T^t}\Vert S_1-S_2\Vert _{L^\tau ({J_T^h}\cap A)}^\tau + \sum _{A\in {\mathfrak {A}}_T^t}\Vert S_1-S_2\Vert _{L^\tau ({J_T^h}\cap A)}^\tau . \end{aligned}$$

Clearly, \(|{I_T^t}| \le ct\delta _1(T)\) and \(|T| \sim \delta _1(T)d(T)\). Then using (3.24)–(3.25), we infer

$$\begin{aligned} \Vert S_1-S_2\Vert _{L^\tau ({I_T^t})}^\tau&=(|{I_T^t}|/|T|)\Vert S_1-S_2\Vert _{L^\tau (T)}^\tau \le ct d(T)^{-1}\Vert S_1-S_2\Vert _{L^\tau (T)}^\tau \\&= ct d(T)^{-1}|T|^{\tau s/2}\Vert S_1-S_2\Vert _{L^p(T)}^\tau \le ct d(T)^{\tau s-1}\Vert S_1-S_2\Vert _{L^p(T)}^\tau . \end{aligned}$$

Similarly, for \(A\in \mathcal {A}_T^t\), we use that \(|{J_T^h}\cap A| \le ctd(A)\) and \(|A|\sim d(A)^2\) to obtain

$$\begin{aligned} \Vert S_1-S_2\Vert _{L^\tau ({J_T^h}\cap A)}^\tau&\le ct d(A)\Vert S_1-S_2\Vert _{L^\infty (A)}^\tau = ct d(A)|A|^{-\tau /p}\Vert S_1-S_2\Vert _{L^p(A)}^\tau \\&\le ct d(A)^{1-2\tau /p}\Vert S_1-S_2\Vert _{L^p(A)}^\tau \le ct d(A)^{\tau s-1}\Vert S_1-S_2\Vert _{L^p(A)}^\tau . \end{aligned}$$
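The exponent bookkeeping in the last display can be checked directly; the computation below assumes the relation \(1/\tau = s/2+1/p\), which is implicit in (3.24). Since \(S_1-S_2\) is a constant on A, we have \(\Vert S_1-S_2\Vert _{L^\infty (A)} = |A|^{-1/p}\Vert S_1-S_2\Vert _{L^p(A)}\), and

$$\begin{aligned} \frac{\tau }{p} = \tau \Big (\frac{1}{\tau }-\frac{s}{2}\Big ) = 1-\frac{\tau s}{2}, \quad \hbox {so that}\quad 1-\frac{2\tau }{p} = 1 - (2-\tau s) = \tau s-1. \end{aligned}$$

Thus the last inequality in the display is in fact an identity between the exponents, and the only loss comes from \(|A|^{-\tau /p} \le c\,d(A)^{-2\tau /p}\), which uses \(d(A)^2\le c^\star |A|\) from Lemma 3.7.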

For \(A\in {\mathfrak {A}}_T^t\), we have

$$\begin{aligned} \Vert S_1-S_2\Vert _{L^\tau ({J_T^h}\cap A)}^\tau&\le \Vert S_1-S_2\Vert _{L^\tau (A)}^\tau = |A|^{\tau s/2}\Vert S_1-S_2\Vert _{L^p(A)}^\tau \\&\le cd(A)^{\tau s}\Vert S_1-S_2\Vert _{L^p(A)}^\tau . \end{aligned}$$

Putting the above estimates together, we get

$$\begin{aligned} \Vert \Delta _h S_1\Vert _{L^\tau ({I_T^t})}^\tau&\le \Vert \Delta _h S_2\Vert _{L^\tau ({I_T^t})}^\tau + ct d(T)^{\tau s-1}\Vert S_1-S_2\Vert _{L^p(T)}^\tau \\&\quad + \sum _{A\in \mathcal {A}_T^t} ct d(A)^{\tau s-1}\Vert S_1-S_2\Vert _{L^p(A)}^\tau + \sum _{A\in {\mathfrak {A}}_T^t}cd(A)^{\tau s}\Vert S_1-S_2\Vert _{L^p(A)}^\tau . \end{aligned}$$

Case 1 (e) (Main). Let \(T_h^\star \subset T_h\) be defined by

$$\begin{aligned} T_h^\star := \{x\in T_h: [x, x+h]\cap L_1 \ne \emptyset ,\; x\not \in {I_T^t},\; [x,x+ h] \not \subset \bigcup _{v\in \mathcal {V}_{T}} B_v\}. \end{aligned}$$
(3.29)

We next estimate \(\Vert \Delta _h S_1\Vert _{L^\tau (T_h^\star )}^\tau \).

Recall that by assumption \(h=|h|\nu \) with \(\nu =:(\cos \theta , \sin \theta )\) and \(\theta \in [\gamma , \pi ]\), where \(\gamma \le \pi /2\) is the angle between \(D_1\) and \(L_1\).

Let \(x\in T_h^\star \). With the notation \(x=(x_1, x_2)\), we let \((-a, x_2)\in L_1\) and \((a, x_2)\in L_2\), \(a>0\), be the points of intersection of the horizontal line through x with \(L_1\) and \(L_2\). Set \(b:=2a-{\varepsilon }\) with \({\varepsilon }:= (\delta _1-\delta _2)M^{-1}t\), see (3.28).

We associate the points \(x+be_1\) and \(x+be_1+h\) with x and \(x+h\). A simple geometric argument shows that \(x+be_1\in \triangle _{E_1}\setminus T\), while \(x+be_1+h \in T^\circ \).

Now, using that \(S_1 = {\text {constant}}\) on \(\triangle _{E_1}^\circ \), we have \(S_1(x)=S_1(x+be_1)\), and since \(S_2 = {\text {constant}}\) on \(\triangle _{E_2}^\circ \), we have \(S_2(x+h)=S_2(x+be_1+h)\). We use these two identities to obtain

$$\begin{aligned} S_1(x+h)-S_1(x)&= S_2(x+be_1+h)-S_2(x+be_1)\\&\quad +[S_1(x+h)-S_2(x+h)] - [S_1(x+be_1)-S_2(x+be_1)], \end{aligned}$$

and, therefore,

$$\begin{aligned} |\Delta _h S_1(x)|&\le |\Delta _h S_2(x+be_1)|\\&\quad +|S_1(x+h)-S_2(x+h)| + |S_1(x+be_1)-S_2(x+be_1)|.\nonumber \end{aligned}$$
(3.30)

Some words of explanation are in order here. The purpose of the set \({I_T^t}\) is to ensure that there is a one-to-one correspondence between pairs of points \(x\in T^\circ \setminus {I_T^t}\), \(x+h\in \triangle _{E_2}\setminus T\) and pairs of points \(x+be_1 \in \triangle _{E_1}\setminus T\), \(x+be_1+h\in T^\circ \). Due to \(\delta _2<\delta _1\), this would not be true if \({I_T^t}\) were not removed from \(T^\circ \). Thus there is a one-to-one correspondence between the differences \(|\Delta _h S_1(x)|\) and \(|\Delta _h S_2(x+be_1)|\) in the case under consideration. Also, it is important that \(\Delta _h S_1(x+be_1)=0\), and hence \(|\Delta _h S_2(x+be_1)|\) need not be used to estimate \(|\Delta _h S_1(x+be_1)|\).

Another important point here is that \(x+h\not \in T^\circ \) and \(x+be_1 \not \in T^\circ \). Therefore, no quantities \(|S_1(x)-S_2(x)|\) with \(x\in T^\circ \setminus {I_T^t}\) are involved in (3.30), which is critical.

Observe that for \(x\in T_h^\star \), we have

$$\begin{aligned}{}[x,x+h] \not \subset \bigcup _{v\in \mathcal {V}_{T}} B_v, \quad \hbox {and hence}\quad [x+be_1,x+be_1+h] \not \subset \bigcup _{v\in \mathcal {V}_{T}} B_v. \end{aligned}$$

Therefore, by Property 3.8, it follows that \([x, x+h]\) and \([x+be_1,x+be_1+h]\) do not intersect any trapezoid \(T'\in \mathcal {T}\), \(T'\ne T\).

Let \(T_h^{\star \star } :=\{x+be_1: x\in T_h^\star \}\). For any \(A\in \mathcal {A}\) and \(t>0\), define

$$\begin{aligned} A_t:=\{x\in A: {\text {dist}}(x, \partial A) \le t\}. \end{aligned}$$
(3.31)

From all of the above, we get

$$\begin{aligned} \Vert \Delta _h S_1\Vert _{L^\tau (T_h^\star )}^\tau \le \Vert \Delta _h S_2\Vert _{L^\tau (T_h^{\star \star })}^\tau + \sum _{A\in \mathcal {A}_T^t}\Vert S_1-S_2\Vert _{L^\tau (A_t)}^\tau + \sum _{A\in {\mathfrak {A}}_T^t}\Vert S_1-S_2\Vert _{L^\tau (A)}^\tau . \end{aligned}$$

Now, using that \(|A_t| \le ctd(A)\) and \(|A|\sim d(A)^2\) for \(A\in \mathcal {A}_T^t\), we obtain

$$\begin{aligned} \Vert S_1-S_2\Vert _{L^\tau (A_t)}^\tau&=(|A_t|/|A|)|A|^{\tau s/2}\Vert S_1-S_2\Vert _{L^p(A)}^\tau \nonumber \\&\le ct d(A)^{\tau s-1}\Vert S_1-S_2\Vert _{L^p(A)}^\tau . \end{aligned}$$
(3.32)

For \(A\in {\mathfrak {A}}_T^t\), we use that \(|A|\sim d(A)^2\) and obtain

$$\begin{aligned} \Vert S_1-S_2\Vert _{L^\tau (A)}^\tau =|A|^{\tau s/2}\Vert S_1-S_2\Vert _{L^p(A)}^\tau \le cd(A)^{\tau s}\Vert S_1-S_2\Vert _{L^p(A)}^\tau . \end{aligned}$$
(3.33)

Inserting these estimates above, we get

$$\begin{aligned} \Vert \Delta _h S_1\Vert _{L^\tau (T_h^\star )}^\tau \le \Vert \Delta _h S_2\Vert _{L^\tau (T_h^{\star \star })}^\tau&+ \sum _{A\in \mathcal {A}_T^t} ct d(A)^{\tau s-1}\Vert S_1-S_2\Vert _{L^p(A)}^\tau \nonumber \\&+ \sum _{A\in {\mathfrak {A}}_T^t}cd(A)^{\tau s}\Vert S_1-S_2\Vert _{L^p(A)}^\tau . \end{aligned}$$
(3.34)

3.5 Case 2

Let \(\Omega _h^\star \) be the set of all \(x\in \Omega \) such that \([x, x+h] \subset \Omega \) and \([x, x+h] \cap T = \emptyset \) for all \(T\in \mathcal {T}\) with \(d(T) > 2t/{\hat{c}}\). To estimate \(|\Delta _h S_1(x)|\), we again use (3.26). With the notation from (3.31), we get

$$\begin{aligned} \Vert \Delta _hS_1\Vert _{L^\tau (\Omega _h^\star )}^\tau \le \Vert \Delta _h S_2\Vert _{L^\tau (\Omega _h^\star )}^\tau&+ \sum _{T\in \mathcal {T}: d(T) \le 2t/{\hat{c}}} \Vert S_1-S_2\Vert _{L^\tau (T)}^\tau \\ + \sum _{A\in \mathcal {A}: d(A) >t} \Vert S_1-S_2\Vert _{L^\tau (A_t)}^\tau&+ \sum _{A\in \mathcal {A}: d(A) \le t} \Vert S_1-S_2\Vert _{L^\tau (A)}^\tau . \end{aligned}$$

For the first sum above, we have just as in (3.27),

$$\begin{aligned} \sum _{T\in \mathcal {T}: d(T) \le 2t/{\hat{c}}} \Vert S_1-S_2\Vert _{L^\tau (T)}^\tau \le \sum _{T\in \mathcal {T}: d(T) \le 2t/{\hat{c}}} c d(T)^{\tau s}\Vert S_1-S_2\Vert _{L^p(T)}^\tau . \end{aligned}$$

We estimate the other two sums as in (3.32) and (3.33). We obtain

$$\begin{aligned} \Vert \Delta _hS_1\Vert _{L^\tau (\Omega _h^\star )}^\tau \le \Vert \Delta _h S_2\Vert _{L^\tau (\Omega _h^\star )}^\tau&+ \sum _{T\in \mathcal {T}: d(T) \le 2t/{\hat{c}}}c d(T)^{\tau s}\Vert S_1-S_2\Vert _{L^p(T)}^\tau \\ + \sum _{A\in \mathcal {A}: d(A) >t}ct d(A)^{\tau s-1}\Vert S_1-S_2\Vert _{L^p(A)}^\tau&+ \sum _{A\in \mathcal {A}: d(A) \le t}cd(A)^{\tau s}\Vert S_1-S_2\Vert _{L^p(A)}^\tau . \end{aligned}$$

It is an important observation that each trapezoid \(T\in \mathcal {T}\) with \(d(T) > 2t/{\hat{c}}\) may share trapezoids \(T'\in {\mathfrak {T}}_T^t\) and sets \(A\in {\mathfrak {A}}_T^t\) with only finitely many trapezoids with the same properties. Also, for every such trapezoid T, we have \(\# \mathcal {T}_T^t \le c\) and \(\# \mathcal {A}_T^t \le c\) with \(c>0\) a constant depending only on the structural constants of the setting. Therefore, in the above estimates, only finitely many norms may overlap at a time. Putting all of them together, we obtain

$$\begin{aligned} \omega _1(S_1, t)_\tau ^\tau \le \omega _1(S_2, t)_\tau ^\tau + Y_1 + Y_2, \end{aligned}$$

where

$$\begin{aligned} Y_1&= \sum _{A\in \mathcal {A}: d(A)>t}ct d(A)^{\tau s-1}\Vert S_1-S_2\Vert _{L^p(A)}^\tau \\&\quad + \sum _{A\in \mathcal {A}: d(A) >t}ct^2d(A)^{\tau s-2}\Vert S_1-S_2\Vert _{L^p(A)}^\tau \\&\quad + \sum _{A\in \mathcal {A}: d(A) \le t}cd(A)^{\tau s}\Vert S_1-S_2\Vert _{L^p(A)}^\tau \end{aligned}$$

and

$$\begin{aligned} Y_2&= \sum _{T\in \mathcal {T}: d(T)> 2t/{\hat{c}}}ct d(T)^{\tau s-1}\Vert S_1-S_2\Vert _{L^p(T)}^\tau \\&\quad + \sum _{T\in \mathcal {T}: d(T) > 2t/{\hat{c}}}ct^{1+\tau s/2}d(T)^{\tau s/2-1}\Vert S_1-S_2\Vert _{L^p(T)}^\tau \\&\quad + \sum _{T\in \mathcal {T}: d(T) \le 2t/{\hat{c}}}c d(T)^{\tau s}\Vert S_1-S_2\Vert _{L^p(T)}^\tau . \end{aligned}$$

We now turn to the estimation of \(|S_1|_{B^s_\tau }\). Using the above and interchanging the order of integration and summation, we get

$$\begin{aligned} |S_1|_{B^s_\tau }^\tau =\int _0^\infty t^{-s\tau -1}\omega _1(S_1, t)_\tau ^\tau \mathrm{d}t \le |S_2|_{B^s_\tau }^\tau + Z_1 + Z_2, \end{aligned}$$

where

$$\begin{aligned} Z_1&= \sum _{A\in \mathcal {A}}cd(A)^{\tau s-1}\Vert S_1-S_2\Vert _{L^p(A)}^\tau \int _0^{d(A)}t^{-\tau s} \mathrm{d}t\\&\quad + \sum _{A\in \mathcal {A}}cd(A)^{\tau s-2}\Vert S_1-S_2\Vert _{L^p(A)}^\tau \int _0^{d(A)}t^{-\tau s+1} \mathrm{d}t\\&\quad + \sum _{A\in \mathcal {A}}cd(A)^{\tau s}\Vert S_1-S_2\Vert _{L^p(A)}^\tau \int _{d(A)}^\infty t^{-\tau s-1} \mathrm{d}t \end{aligned}$$

and

$$\begin{aligned} Z_2&= \sum _{T\in \mathcal {T}}cd(T)^{\tau s-1}\Vert S_1-S_2\Vert _{L^p(T)}^\tau \int _0^{{\hat{c}}d(T)/2}t^{-\tau s} \mathrm{d}t\\&\quad + \sum _{T\in \mathcal {T}}cd(T)^{\tau s/2-1}\Vert S_1-S_2\Vert _{L^p(T)}^\tau \int _0^{{\hat{c}}d(T)/2}t^{-\tau s/2} \mathrm{d}t\\&\quad + \sum _{T\in \mathcal {T}}cd(T)^{\tau s}\Vert S_1-S_2\Vert _{L^p(T)}^\tau \int _{{\hat{c}}d(T)/2}^\infty t^{-\tau s-1} \mathrm{d}t. \end{aligned}$$

Observe that \(-\tau s>-1\) is equivalent to \(s/2<1/p\), which is one of the assumptions, and \(-\tau s/2>-1\) is equivalent to \(s<s+2/p\), which is obvious. Therefore, all of the above integrals are convergent, and we obtain

$$\begin{aligned} |S_1|_{B^s_\tau }^\tau \le |S_2|_{B^s_\tau }^\tau + \sum _{A\in \mathcal {A}}c\Vert S_1-S_2\Vert _{L^p(A)}^\tau + \sum _{T\in \mathcal {T}}c\Vert S_1-S_2\Vert _{L^p(T)}^\tau . \end{aligned}$$
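The passage from \(Z_1\) to the last estimate rests on the fact that each weighted integral is a constant independent of the diameter. The following minimal sketch, which is only an illustration and not part of the argument, evaluates the three integrals in \(Z_1\) in closed form; the function names `tau_from` and `z1_factors` and the sample values of s, p, and d are our own assumptions.

```python
import math

def tau_from(s, p):
    """Exponent tau defined by 1/tau = s/2 + 1/p."""
    return 1.0 / (s / 2.0 + 1.0 / p)

def z1_factors(d, s, p):
    """Closed-form values of the three weighted integrals in Z_1
    for a single set of diameter d (hypothetical helper)."""
    a = tau_from(s, p) * s  # a = tau*s; convergence at 0 needs a < 1
    if not 0.0 < a < 1.0:
        raise ValueError("need 0 < tau*s < 1, i.e. s/2 < 1/p")
    first = d ** (a - 1) * (d ** (1 - a) / (1 - a))   # d^{a-1} * int_0^d t^{-a} dt
    second = d ** (a - 2) * (d ** (2 - a) / (2 - a))  # d^{a-2} * int_0^d t^{1-a} dt
    third = d ** a * (d ** (-a) / a)                  # d^{a} * int_d^inf t^{-a-1} dt
    return first, second, third

# Two very different diameters give the same three constants.
small = z1_factors(1e-3, s=1.0, p=1.0)
large = z1_factors(50.0, s=1.0, p=1.0)
```

The powers of d cancel exactly, which is why only the norms \(\Vert S_1-S_2\Vert _{L^p(A)}^\tau \) survive in the resulting sum.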

Finally, applying Hölder’s inequality and using (3.23), we arrive at

$$\begin{aligned} |S_1|_{B^s_\tau }^\tau \le |S_2|_{B^s_\tau }^\tau&+ c\left( \#\mathcal {A}\right) ^{\tau (1/\tau -1/p)}\left( \sum _{A\in \mathcal {A}} \Vert S_1-S_2\Vert _{L^p(A)}^p\right) ^{\tau /p}\\&+ c\left( \#\mathcal {T}\right) ^{\tau (1/\tau -1/p)}\left( \sum _{T\in \mathcal {T}} \Vert S_1-S_2\Vert _{L^p(T)}^p\right) ^{\tau /p}\\&\le |S_2|_{B^s_\tau }^\tau + cn^{\tau (1/\tau -1/p)}\Vert S_1-S_2\Vert _{L^p(\Omega )}^{\tau } = |S_2|_{B^s_\tau }^\tau + cn^{\tau s/2}\Vert S_1-S_2\Vert _{L^p(\Omega )}^{\tau }. \end{aligned}$$

This confirms estimate (3.10). The proof in the case when \(\tau >1\) is the same. \(\square \)
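The final step of the proof above uses Hölder's inequality for finite sums, \(\sum _j a_j^\tau \le N^{1-\tau /p}\big (\sum _j a_j^p\big )^{\tau /p}\) for \(\tau <p\), together with the identity \(1-\tau /p=\tau s/2\). A minimal numerical sketch follows; the helper name `holder_sides` and the sample data are hypothetical and serve only as an illustration.

```python
import math

def holder_sides(norms, s, p):
    """Return (lhs, rhs, tau) for the discrete Hoelder step:
    sum a_j^tau <= N^(1 - tau/p) * (sum a_j^p)^(tau/p)."""
    tau = 1.0 / (s / 2.0 + 1.0 / p)
    n = len(norms)
    lhs = sum(a ** tau for a in norms)
    rhs = n ** (1.0 - tau / p) * sum(a ** p for a in norms) ** (tau / p)
    return lhs, rhs, tau

# Sample local norms (made-up values) with s = 1, p = 1, hence tau = 2/3.
lhs, rhs, tau = holder_sides([0.5, 2.0, 0.0, 1.25, 3.0], s=1.0, p=1.0)

# Equal entries give equality, so the factor N^(1 - tau/p) cannot be improved.
eq_lhs, eq_rhs, _ = holder_sides([2.0] * 4, s=1.0, p=1.0)
```

Since the number of sets involved is at most cn, the factor \(N^{1-\tau /p}\) is what produces \(n^{\tau s/2}\) in the estimate.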

4 Nonlinear Approximation from Smooth Splines

In this section, we focus on Bernstein estimates in nonlinear approximation in \(L^p\), \(0<p< \infty \), from regular nonnested smooth piecewise polynomial functions in \(\mathbb {R}^2\).

4.1 Setting and Approximation Tool

We first elaborate on our setting and consider examples. As in Sect. 3, we consider two versions of the class of regular piecewise polynomials \(\mathcal {S}(n, k)\) of degree \(k-1\) with \(k\ge 2\) over n rings of maximum smoothness, depending on whether \(\Omega \) is compact or \(\Omega =\mathbb {R}^2\).

Case 1: Assume \(\Omega \) is a compact polygonal domain in \(\mathbb {R}^2\) that can be represented as the union of \(n_0\) rings with disjoint interiors, see Condition 3.2. We denote by \(\mathcal {S}(n, k)\) (\(n\ge n_0\)) the set of all piecewise polynomials S of the form

$$\begin{aligned} S=\sum _{j=1}^n P_j{\mathbbm {1}}_{R_j}, \quad S\in C^{k-2}(\Omega ), \quad P_j\in \Pi _{k-1}, \end{aligned}$$
(4.1)

where \(R_1, \dots , R_n\) are rings in the sense of Definition 3.1 with disjoint interiors such that \(\Omega =\cup _{j=1}^n R_j\). Recall that \(\Pi _{k-1}\) stands for the set of all polynomials of degree \(\le k-1\) in two variables and \(S\in C^{k-2}(\Omega )\) means that all partial derivatives \(\partial ^\alpha S \in C(\Omega )\), \(|\alpha | \le k-2\).

Case 2: \(\Omega =\mathbb {R}^2\). In this case, we denote by \(\mathcal {S}(n, k)\) the set of all piecewise polynomials S of degree \(k-1\) on \(\mathbb {R}^2\) of the form (4.1), where \(R_1, \dots , R_n\) are rings with disjoint interiors such that the support \(\Lambda =\cup _{j=1}^n R_j\) of S is a ring in the sense of Definition 3.1.

We denote by \(S_n^k(f)_p\) the best approximation of \(f\in L^p(\Omega )\) from \(\mathcal {S}(n,k)\) in \(L^p(\Omega )\), \(0< p<\infty \); i.e.,

$$\begin{aligned} S_n^{k}(f)_p:= \inf _{S\in \mathcal {S}(n, k)}\Vert f-S\Vert _{L^p}. \end{aligned}$$
(4.2)

Remark

Observe that in our setting, the splines are of maximum smoothness, and this is critical for our development. As will be shown in Example 4.4 below, in the nonnested case our Bernstein-type inequality fails when the smoothness of the splines is not maximal.

We next consider several scenarios for constructing regular piecewise polynomials of maximum smoothness:

1. Piecewise linear functions induced by nested triangulations. Suppose that \(\mathcal {T}_0\) is an initial subdivision of \(\Omega \) into triangles that obey the minimum angle condition and is with no hanging vertices in the interior of \(\Omega \). In the case of \(\Omega =\mathbb {R}^2\), we assume for simplicity that the triangles \(\triangle \in \mathcal {T}_0\) are of similar areas; i.e., \(c_1 \le |\triangle _1|/|\triangle _2| \le c_2\) for all \(\triangle _1, \triangle _2\in \mathcal {T}_0\). Next we subdivide each triangle \(\triangle \in \mathcal {T}_0\) into 4 triangles by introducing the midpoints on the sides of \(\triangle \). The result is a triangulation \(\mathcal {T}_1\) of \(\Omega \). In the same way, we define the triangulations \(\mathcal {T}_2\), \(\mathcal {T}_3\), etc. Each triangulation \(\mathcal {T}_j\) supports Courant hat functions (linear finite elements) \(\varphi _\theta \), each of them supported on the union \(\theta \) of all triangles from \(\mathcal {T}_j\) that have a common vertex, say, v. Thus \(\varphi _\theta (v)=1\), \(\varphi _\theta \) takes values zero at all other vertices of triangles from \(\mathcal {T}_j\), and \(\varphi _\theta \) is continuous and piecewise linear over the triangles from \(\mathcal {T}_j\). Clearly, each piecewise linear function over the triangles from \(\mathcal {T}_j\) can be represented as a linear combination of Courant hat functions like these.

Denote by \(\Theta _j\) the set of all supports \(\theta \) of Courant elements supported by \(\mathcal {T}_j\) and set \(\Theta := \cup _{j\ge 0} \Theta _j\). Consider the nonlinear set \({\mathbb {S}}_n\) of all piecewise linear functions S of the form

$$\begin{aligned} S=\sum _{\theta \in \mathcal {M}_n} c_\theta \varphi _\theta , \end{aligned}$$

where \(\mathcal {M}_n\subset \Theta \) and \(\# \mathcal {M}_n \le n\); the elements \(\theta \in \mathcal {M}_n\) may come from different levels and locations. It is not hard to see that \({\mathbb {S}}_n \subset \mathcal {S}(cn, 2)\), see [6].

2. General piecewise linear functions. More generally, one can consider piecewise linear functions S of the form

$$\begin{aligned} S=\sum _{\theta \in \mathcal {M}_n} c_\theta \varphi _\theta , \end{aligned}$$

where \(\{\varphi _\theta \}\) are Courant hat functions as above, \(\# \mathcal {M}_n \le n\), and \(\mathcal {M}_n\) consists of cells \(\theta \) as above that are not necessarily induced by a hierarchical collection of triangulations of \(\Omega \); however, there exists an underlying subdivision of \(\Omega \) into rings obeying the conditions from Sect. 3.1.

3. Piecewise quadratic or cubic splines. The \(C^1\) quadratic box-splines on the four-directional mesh (the so-called “Powell–Zwart finite elements”) and the piecewise cubics in \(\mathbb {R}^2\) or on a rectangular domain, endowed with the Powell–Sabin triangulation generated by a uniform 6-direction mesh, provide examples of quadratic and cubic splines of maximum smoothness.

Other examples are to be identified or developed.
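The midpoint refinement of Scenario 1 can be sketched in a few lines. The code below is only an illustration under our own assumptions: triangles are stored as triples of vertex coordinates, and the names `refine` and `levels` are hypothetical. It subdivides each triangle into 4 by joining the midpoints of its sides and records the resulting hierarchy \(\mathcal {T}_0, \mathcal {T}_1, \dots \).

```python
def triangle_area(tri):
    """Area of a triangle given as three (x, y) vertices."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    return abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)) / 2.0

def midpoint(a, b):
    """Midpoint of the segment [a, b]."""
    return ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)

def refine(triangles):
    """Split every triangle into 4 children using the side midpoints."""
    refined = []
    for a, b, c in triangles:
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        refined += [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]
    return refined

def levels(initial, depth):
    """Return the list [T_0, T_1, ..., T_depth] of nested triangulations."""
    result = [list(initial)]
    for _ in range(depth):
        result.append(refine(result[-1]))
    return result

# Start from a single triangle (a made-up T_0) and refine three times.
T = levels([((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))], depth=3)
```

Each child is similar to its parent with ratio 1/2, so the minimum angle condition is inherited by all levels and areas scale by 1/4 per level.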

Splines with defect. To make the difference between approximation from nonnested and nested splines more transparent, and for future reference, we now introduce splines with arbitrary smoothness. Given a set \(\Omega \subset \mathbb {R}^2\) with polygonal boundary or \(\Omega :=\mathbb {R}^2\), \(k\ge 2\), and \(0\le r\le k-1\), we denote by \(\mathcal {S}(n, k, r)\) (\(n\ge n_0\)) the set of all piecewise polynomials S of the form

$$\begin{aligned} S=\sum _{j=1}^n P_j{\mathbbm {1}}_{R_j}, \quad S\in C^{r-1}(\Omega ), \quad P_j\in \Pi _{k-1}, \end{aligned}$$
(4.3)

where \(R_1, \dots , R_n\) are rings with disjoint interiors such that \(\Omega =\cup _{j=1}^n R_j\). We set

$$\begin{aligned} S_n^{k, r}(f)_p:= \inf _{S\in \mathcal {S}(n, k, r)}\Vert f-S\Vert _{L^p}. \end{aligned}$$
(4.4)

4.2 Jackson Estimate

Jackson estimates in spline approximation are relatively easy to prove. Such estimates (also in anisotropic settings) are established in [3, 6]. For example, the Jackson estimate we need in the case of approximation from piecewise linear functions (\(k=2\)) follows from [6, Theorem 3.6] and takes the form:

Theorem 4.1

Let \(0<p<\infty \), \(s>0\), and \(1/\tau =s/2+1/p\). Assume \(\Omega =\mathbb {R}^2\) or \(\,\Omega \subset \mathbb {R}^2\) is a compact set with polygonal boundary and an initial triangulation consisting of \(\le n_0\) triangles with no hanging interior vertices and obeying the minimum angle condition. Then for any \(f\in B^{s,2}_{\tau }\), we have \(f\in L^p(\Omega )\) and

$$\begin{aligned} S_n^2(f)_p \le cn^{-s/2}|f|_{B^{s,2}_{\tau }} , \quad n\ge n_0. \end{aligned}$$
(4.5)

Consequently, for any \(f\in L^p(\Omega )\),

$$\begin{aligned} S_n^2(f)_p \le cK(f, n^{-s/2}) , \quad n\ge n_0. \end{aligned}$$
(4.6)

Here \(K(f, t)=K(f, t; L^p, B^s_\tau )\) is the K-functional defined in (3.6) and \(c>0\) is a constant depending only on s, p, and the structural constants of the setting.

Similar Jackson and direct estimates for nonlinear approximation from splines of degrees \(\ge 2\) and of maximum smoothness do not follow automatically from the results in [3], the reason being the fact that the basis functions for splines of degree 2 and 3 that we are familiar with are not stable. The stability is required in [3]. The problem for establishing Jackson estimates for approximation from splines of degree \(\ge 2\) of maximum smoothness remains open.

4.3 Bernstein Estimate in the Nonnested Case

We come now to one of the main results of this article. Here we operate in the setting described above in Sect. 4.1.

Theorem 4.2

Let \(0< p<\infty \), \(k\ge 1\), \(0<s/2<k-1+1/p\), and \(1/\tau =s/2+1/p\). Then for any \(S_1, S_2\in \mathcal {S}(n, k)\), \(n\ge n_0\), we have

$$\begin{aligned} |S_1|_{B^{s,k}_{\tau }}&\le |S_2|_{B^{s, k}_{\tau }} + cn^{s/2}\Vert S_1-S_2\Vert _{L^p} \quad \hbox {if} \;\; \tau \ge 1, \quad \hbox {and} \end{aligned}$$
(4.7)
$$\begin{aligned} |S_1|_{B^{s, k}_{\tau }}^\tau&\le |S_2|_{B^{s, k}_{\tau }}^\tau + cn^{\tau s/2}\Vert S_1-S_2\Vert _{L^p}^\tau \quad \hbox {if} \;\; \tau < 1, \end{aligned}$$
(4.8)

where the constant \(c>0\) depends only on s, p, k, and the structural constants of the setting; \(n_0\) is from Condition 3.2.
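The hypotheses of Theorem 4.2 tie \(\tau \) to s and p and decide which of the two inequalities applies. As a small illustration, the following sketch computes \(\tau \) exactly and reports the relevant estimate; the function name `bernstein_case` is hypothetical, and exact rational arithmetic is our own choice so that the borderline \(\tau =1\) is detected reliably.

```python
from fractions import Fraction

def bernstein_case(s, p, k):
    """Return (tau, label) where 1/tau = s/2 + 1/p and label names the
    inequality of Theorem 4.2 that applies; raise if (s, p, k) is not admissible."""
    s, p = Fraction(s), Fraction(p)
    if not (p > 0 and k >= 1 and 0 < s / 2 < k - 1 + 1 / p):
        raise ValueError("need 0 < p, k >= 1 and 0 < s/2 < k - 1 + 1/p")
    tau = 1 / (s / 2 + 1 / p)
    label = "(4.7)" if tau >= 1 else "(4.8)"
    return tau, label

borderline = bernstein_case(1, 2, 2)   # s/2 + 1/p = 1, so tau = 1
small_tau = bernstein_case(2, 1, 2)    # s/2 + 1/p = 2, so tau = 1/2
```

In particular \(\tau \ge 1\) holds exactly when \(s/2+1/p\le 1\), which forces \(p>1\).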

An immediate consequence of this theorem is the inverse estimate given in

Corollary 4.3

Let \(0< p<\infty \), \(k\ge 1\), \(0<s/2<k-1+1/p\), and \(1/\tau =s/2+1/p\). Set \(\lambda :=\min \{\tau , 1\}\). Then for any \(f\in L^p(\Omega )\), we have

$$\begin{aligned} K(f, n^{-s/2}) \le cn^{-s/2} \left( \sum _{\ell =n_0}^n \frac{1}{\ell }\left[ \ell ^{s/2} S_\ell ^k(f)_p\right] ^\lambda + \Vert f\Vert _p^\lambda \right) ^{1/\lambda }, \quad n\ge n_0. \end{aligned}$$
(4.9)

Here \(K(f, t)=K(f, t; L^p, B^s_\tau )\) is the K-functional defined just as in (3.6), and \(c>0\) is a constant depending only on s, p, k, and the structural constants of the setting.

The proof of this corollary is just a repetition of the proof of Theorem 3.6. We omit it.

In turn, estimates (4.6) and (4.9) imply a characterization of the approximation spaces associated with nonlinear nonnested piecewise linear approximation, see (1.7).

The proof of Theorem 4.2 relies on the idea we used in the proof of Theorem 3.4. However, there is an important complication to overcome. The fact that many rings with relatively small supports can be located next to a large ring is a major obstacle in implementing this idea in the case of smooth splines. An additional construction is needed. To make the proof more accessible, we shall proceed in two steps. We first develop the needed additional construction and implement it in Sect. 4.4 to prove the respective Bernstein estimate in the nested case, and then we present the proof of Theorem 4.2 in Sect. 4.5.

Before we proceed with the proofs of the Bernstein estimates, we show in the next example that the assumption that in our setting the splines are of maximum smoothness is essential.

Example 4.4

We now show that estimates (4.7)–(4.8) fail without the assumption that \(S_1,S_2\in C^{k-2}(\Omega )\) (i.e., both splines have maximum smoothness). We shall only consider the case when \(k=2\) and \(\tau \le 1\). Let \(\Omega =[-1,1]\times [0,1]\) and \(0<\varepsilon <1/4\). Set

$$\begin{aligned} S_1(x):=x_1{\mathbbm {1}}_{[0,1]^2}(x),\quad S_2(x):=x_1{\mathbbm {1}}_{[\varepsilon ,1]\times [0,1]}(x), \quad x=(x_1, x_2). \end{aligned}$$

Clearly, \(S_1\) is continuous on \(\Omega \), while \(S_2\) is discontinuous along \(x_1=\varepsilon \). A straightforward calculation shows that

$$\begin{aligned} \omega _2(S_1,t)_\tau ^\tau =\frac{2t^{\tau +1}}{\tau +1} \quad \hbox {and}\quad \omega _2(S_2,t)_\tau ^\tau =\int _{-t}^t|w+\varepsilon |^\tau \mathrm{d}w \quad \hbox {for}\quad 0\le t\le 1/4. \end{aligned}$$
(4.10)

Further,

$$\begin{aligned} \int _{-t}^t|w+\varepsilon |^\tau \mathrm{d}w=\frac{1}{\tau +1}\left[ (t+\varepsilon )^{\tau +1} +\mathrm{sign}(t-\varepsilon )|t-\varepsilon |^{\tau +1}\right] . \end{aligned}$$
(4.11)

On the other hand, obviously \(\omega _2(S_1-S_2, t)_\tau ^\tau \le 4\Vert S_1-S_2\Vert _{L^\tau }^\tau \le 4\varepsilon ^{\tau +1}\), yielding

$$\begin{aligned} \omega _2(S_2,t)_\tau ^\tau \ge \omega _2(S_1,t)_\tau ^\tau -4\varepsilon ^{\tau +1}. \end{aligned}$$
(4.12)

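We shall use this estimate for \(t>1/4\). From (2.1) and (4.10)–(4.12), we obtain

$$\begin{aligned} |S_2|_{B_\tau ^{s,2}}^\tau -|S_1|_{B_\tau ^{s,2}}^\tau =\int _0^\infty t^{-s\tau -1}\left[ \omega _2(S_2,t)_\tau ^\tau -\omega _2(S_1,t)_\tau ^\tau \right] \mathrm{d}t \ge I_1+I_2-4\varepsilon ^{\tau +1}\int _{1/4}^\infty t^{-s\tau -1}\mathrm{d}t, \end{aligned}$$

where

$$\begin{aligned} I_1&:=\frac{1}{\tau +1}\int _0^\varepsilon t^{-s\tau -1}\left[ (\varepsilon +t)^{\tau +1}-(\varepsilon -t)^{\tau +1}-2t^{\tau +1}\right] \mathrm{d}t,\\ I_2&:=\frac{1}{\tau +1}\int _\varepsilon ^{1/4}t^{-s\tau -1}\left[ (t+\varepsilon )^{\tau +1}+(t-\varepsilon )^{\tau +1}-2t^{\tau +1}\right] \mathrm{d}t, \end{aligned}$$

and the last term equals \(c_0\varepsilon ^{\tau +1}\) with the constant \(c_0\) given below.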

Substituting \(t=\varepsilon u\) in \(I_1\) and \(I_2\), we get

$$\begin{aligned} I_1+I_2=\frac{\varepsilon ^{\tau -s\tau +1}}{\tau +1} \left[ \int _0^1u^{-s\tau -1}\phi _1(u)\mathrm{d}u +\int _1^{1/4\varepsilon }u^{-s\tau -1}\phi _2(u)\mathrm{d}u\right] , \end{aligned}$$

where

$$\begin{aligned} \phi _1(u)=(1+u)^{\tau +1}-(1-u)^{\tau +1}-2u^{\tau +1} \end{aligned}$$

and

$$\begin{aligned} \phi _2(u)=(1+u)^{\tau +1}+(u-1)^{\tau +1}-2u^{\tau +1}. \end{aligned}$$

We clearly have \(\phi _1\ge 0\) on [0, 1] and \(\phi _2\ge 0\) on \([1,\infty )\). Therefore,

$$\begin{aligned} |S_2|_{B_\tau ^{s,2}}^\tau -|S_1|_{B_\tau ^{s,2}}^\tau \ge c_1\varepsilon ^{\tau -s\tau +1}-c_0\varepsilon ^{\tau +1} =\varepsilon ^{\tau -s\tau +1}(c_1-c_0\varepsilon ^{s\tau }), \end{aligned}$$

where

$$\begin{aligned} c_1:=\frac{1}{\tau +1}\int _0^1u^{-s\tau -1}\phi _1(u)\mathrm{d}u>0 \quad \hbox {and}\quad c_0:= 4^{s\tau +1}/(s\tau ). \end{aligned}$$

By taking \(\varepsilon \) sufficiently small, we get

$$\begin{aligned} |S_2|_{B_\tau ^{s,2}}^\tau -|S_1|_{B_\tau ^{s,2}}^\tau \ge (c_1/2)\varepsilon ^{\tau -s\tau +1}. \end{aligned}$$
(4.13)

Evidently, \(\Vert S_2-S_1\Vert _{L^p}\le \varepsilon ^{1+1/p}\). This estimate coupled with (4.13) implies

$$\begin{aligned} \frac{|S_2|_{B_\tau ^{s,2}}^\tau -|S_1|_{B_\tau ^{s,2}}^\tau }{\Vert S_2-S_1\Vert _{L^p}^\tau } \ge (c_1/2)\varepsilon ^{1-s\tau -\tau /p}=(c_1/2)\varepsilon ^{-s\tau /2}. \end{aligned}$$

Therefore, since \(\varepsilon ^{-s\tau /2}\rightarrow \infty \) as \(\varepsilon \rightarrow 0\), estimate (4.8) cannot hold.
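The ingredients of Example 4.4 are easy to check numerically. The sketch below is an illustration only: the function names, the midpoint quadrature, and the sample values of \(t\), \(\varepsilon \), \(\tau \), s, and p are our own assumptions. It compares the closed form (4.11) with a direct quadrature, evaluates \(\phi _1\) and \(\phi _2\), and confirms the exponent identity \(1-s\tau -\tau /p=-s\tau /2\) used in the last display.

```python
import math

def closed_form(t, eps, tau):
    """Right-hand side of (4.11)."""
    sign = (t > eps) - (t < eps)
    return ((t + eps) ** (tau + 1) + sign * abs(t - eps) ** (tau + 1)) / (tau + 1)

def midpoint_integral(t, eps, tau, steps=100_000):
    """Midpoint-rule approximation of the integral of |w + eps|^tau over [-t, t]."""
    h = 2.0 * t / steps
    return h * sum(abs(-t + (i + 0.5) * h + eps) ** tau for i in range(steps))

def phi1(u, tau):
    """phi_1 from the example, for 0 <= u <= 1."""
    return (1 + u) ** (tau + 1) - (1 - u) ** (tau + 1) - 2 * u ** (tau + 1)

def phi2(u, tau):
    """phi_2 from the example, for u >= 1."""
    return (1 + u) ** (tau + 1) + (u - 1) ** (tau + 1) - 2 * u ** (tau + 1)

def ratio_exponents(s, p):
    """Both sides of 1 - s*tau - tau/p = -s*tau/2 with 1/tau = s/2 + 1/p."""
    tau = 1.0 / (s / 2.0 + 1.0 / p)
    return 1.0 - s * tau - tau / p, -s * tau / 2.0

# Sample check with t > eps (the integrand then has a cusp inside the interval).
gap = abs(midpoint_integral(0.2, 0.05, 0.5) - closed_form(0.2, 0.05, 0.5))
```

Because the exponent \(-s\tau /2\) is negative for every admissible s, the quotient in the last display is unbounded as \(\varepsilon \rightarrow 0\), regardless of the particular parameters chosen here.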

4.4 Additional Subdivision and Bernstein Estimate in the Nested Case

As already indicated above, the idea of the proof of the Bernstein estimate from Theorem 3.4 is insufficient for the proof of the Bernstein estimate for approximation from smooth splines (Theorem 4.2). In the case of smooth splines, we hit a snag when “small” rings are located next to “large” rings. To overcome this obstacle, we next introduce an additional subdivision of the underlying rings. As an application of this construction and for comparison, we prove the following Bernstein estimate, which yields an inverse estimate, in the case of nested spline approximation.

Theorem 4.5

Let \(0< p<\infty \), \(k\ge 2\), \(0\le r\le k-1\), \(0<s/2<r+1/p\), and \(1/\tau =s/2+1/p\). Then for any \(S\in \mathcal {S}(n, k, r)\), \(n\ge n_0\), we have

$$\begin{aligned} |S|_{B^{s, k}_{\tau }} \le cn^{s/2}\Vert S\Vert _{L^p}, \end{aligned}$$
(4.14)

where the constant \(c>0\) depends only on s, p, k, r, and the structural constants of our setting.

Additional subdivision of \(\Omega \). We subdivide \(\Omega \) in two steps.

Subdivision of all rings \(R\in \mathcal {R}_n\) into nested hierarchies of rings.

Lemma 4.6

There exists a subdivision of every ring \(R\in \mathcal {R}_n\) into a nested multilevel collection of rings

$$\begin{aligned} \mathcal {K}^R = \cup _{m=m_R}^\infty \mathcal {K}_m^R \end{aligned}$$

with the following properties, where we use the abbreviated notation \(\mathcal {K}_m:=\mathcal {K}_m^R\):

(a) Every level \(\mathcal {K}_m\) defines a partition of R into rings with disjoint interiors such that \(R = \cup _{K\in \mathcal {K}_m} K\).

(b) The levels \(\{\mathcal {K}_m\}_{m\ge m_R}\) are nested; i.e., \(\mathcal {K}_{m+1}\) is a refinement of \(\mathcal {K}_m\), and each \(K\in \mathcal {K}_m\) has at least 4 and at most \(M\) children in \(\mathcal {K}_{m+1}\), where \(M\ge 4\) is a constant.

(c) \(|R| \le c_1|K|\) for all \(K\in \mathcal {K}_{m_R}\).

(d) We have

$$\begin{aligned} c_2^{-1}4^{-m} \le |K| \le c_2 4^{-m}, \quad \forall K\in \mathcal {K}_m, \quad \forall m\ge m_R. \end{aligned}$$

As a consequence, we have \(c_3^{-1}4^{-m_R} \le |R| \le c_3 4^{-m_R}\) and

$$\begin{aligned} c_4^{-1} 2^{-m} \le d(K) \le c_4 2^{-m}, \quad \forall K\in \mathcal {K}_m, \quad \forall m\ge m_R. \end{aligned}$$

(e) All rings \(K\in \mathcal {K}^R\) are rings without a hole, except for finitely many of them in the case when \(R=Q_1\setminus Q_2\) and \(Q_2\) is small relative to \(Q_1\). Then the rings with a hole form a chain \(R\supset K_1 \supset K_2 \supset \cdots \supset K_\ell \supset Q_2\). All sets \(K\in \mathcal {K}^R\) are rings in the sense of Definition 3.1 with structural constants (parameters) \(N_0^\star \), \(c_0^\star \), and \(\beta ^\star \). These and the constants \(M\) and \(c_1, c_2, c_3, c_4>0\) from above depend only on the initial structural constants \(N_0\), \(c_0\), and \(\beta \).

Proof

Observe first that if we are in a setting as the one described in Scenario 1 from Sect. 4.1, then the needed subdivision is given by the hierarchy of triangulations described there.

In the general case, let \(R=Q_1\setminus Q_2\) be a ring in the sense of Definition 3.1, and assume that \(Q_2 \ne \emptyset \). We subdivide the polygonal convex set \(Q_1\) into subrings by connecting the center of eccentricity of \(Q_1\) with, say, 6 points from the boundary \(\partial R\) of R, preferably end points of segments on the boundary, so that the minimum angle condition is obeyed. After that we subdivide the resulting rings using midpoints and connecting them with segments. Necessary adjustments are made around \(Q_2\), depending on the size and location of \(Q_2\).\(\square \)

Subdivision of all rings from \(\mathcal {R}_n\) into subrings with disjoint interiors. We first pick up all rings from each \(\mathcal {K}^R\), \(R\in \mathcal {R}_n\), see Lemma 4.6, that are needed to handle situations where many small rings are located next to a large ring.

We shall only need the rings in \(\mathcal {K}^R\) that intersect the boundary \(\partial R\) of R. Denote the set of all such rings by \(\Gamma ^R\), and set \(\Gamma ^R_m:= \Gamma ^R \cap \mathcal {K}^R_m\). We shall make use of the tree structure in \(\Gamma ^R\). More precisely, we shall use the parent-child relation in \(\Gamma ^R\) induced by the inclusion relation: Each ring \(K \in \Gamma ^R_m\) has (contains) at least 1 and at most M children in \(\Gamma ^R_{m+1}\) and has a single parent in \(\Gamma ^R_{m-1}\) or no parent.

We now construct a set \(\Lambda ^R\) of rings from \(\Gamma ^R\) which will help prevent situations where a ring may have many small neighbors.

Given \(R\in \mathcal {R}_n\), we denote by \(\mathcal {R}_n^R\) the set of all rings \({\tilde{R}}\in \mathcal {R}_n\), \({\tilde{R}}\ne R\), such that \({\tilde{R}}\cap R \ne \emptyset \) and \(d({\tilde{R}}) \le d(R)\). These are all rings from \(\mathcal {R}_n\) that are small relative to R and intersect R (are neighbors of R).

It will be convenient to introduce the following somewhat geometric terminology: We say that a ring \(K\in \Gamma ^R\) can see \({\tilde{R}}\in \mathcal {R}_n^R\) or that \({\tilde{R}}\) is in the range of K if \(d(K) \ge d({\tilde{R}})\) and \(K\cap {\tilde{R}}\ne \emptyset \).

We now construct \(\Lambda ^R\) by applying the following

Rule: We place \(K\in \Gamma ^R\) in \(\Lambda ^R\) if K can see at least one ring from \(\mathcal {R}_n^R\) but none of the children of K in \(\Gamma ^R\) can see all of the rings that K sees.

We now extend \(\Lambda ^R\) to \({\tilde{\Lambda }}^R\) by adding to \(\Lambda ^R\) all same level neighbors of all \(K\in \Lambda ^R\); i.e., if \(K\in \Lambda ^R\) and \(K\in \Gamma ^R_m\), then we add to \(\Lambda ^R\) each \(K'\in \Gamma ^R_m\) such that \(K'\cap K \ne \emptyset \).

The next step is to construct a subdivision of each \(R\in \mathcal {R}_n\) into rings by using \({\tilde{\Lambda }}^R\). We fix \(R\in \mathcal {R}_n\) and shall suppress the superscript R for the new sets that will be introduced next and depend on R.

Let \(\tilde{\Gamma } \subset \Gamma ^R\) be the minimal subtree of \(\Gamma ^R\) that contains \({\tilde{\Lambda }}^R\); i.e., \({\tilde{\Gamma }}\) is the set of all \(K \in \Gamma ^R\) such that \(K \supset K'\) for some \(K' \in {\tilde{\Lambda }}^R\). We denote by \({\tilde{\Gamma }}_b\) the set of all branching rings in \({\tilde{\Gamma }}\) (rings with more than one child in \({\tilde{\Gamma }}\)) and by \({\tilde{\Gamma }}_b'\) the set of all children in \({\tilde{\Gamma }}\) of branching rings (each of them may or may not belong to \({\tilde{\Gamma }}\)). Furthermore, we let \({\tilde{\Gamma }}_\ell \) denote the set of all leaves in \({\tilde{\Gamma }}\) (rings in \({\tilde{\Gamma }}\) containing no other rings from \({\tilde{\Gamma }}\)).

Evidently, \({\tilde{\Gamma }}_\ell \subset {\tilde{\Lambda }}^R\). However, rings from \({\tilde{\Gamma }}_b\) and \({\tilde{\Gamma }}_b'\) may or may not belong to \({\tilde{\Lambda }}^R\). We extend \({\tilde{\Lambda }}^R\) to \(\tilde{{\tilde{\Lambda }}}^R := {\tilde{\Lambda }}^R \cup {\tilde{\Gamma }}_b \cup {\tilde{\Gamma }}_b'\). In addition, we add to \(\tilde{{\tilde{\Lambda }}}^R\) all rings from \(\mathcal {K}^R_{m_R}\), if they are not there yet.

It is readily seen that each ring \({\tilde{R}}\in \mathcal {R}_n^R\) can be in the range of only finitely many \(K\in {\tilde{\Gamma }}_\ell \) and each ring \({\tilde{R}}\in \mathcal {R}_n\) may have only finitely many neighbors \(R\in \mathcal {R}_n\) such that \(d(R) \ge d({\tilde{R}})\). Therefore,

$$\begin{aligned} \sum _{R\in \mathcal {R}_n} \# {\tilde{\Gamma }}_\ell ^R \le cn. \end{aligned}$$

Obviously, \(\# {\tilde{\Gamma }}_b \le \# {\tilde{\Gamma }}_\ell \), \(\# {\tilde{\Gamma }}_b' \le M\# {\tilde{\Gamma }}_b \le M\# {\tilde{\Gamma }}_\ell \), implying \(\# {\tilde{\Lambda }}^R \le \#{\tilde{\Gamma }}_\ell + \#{\tilde{\Gamma }}_b \le c\# {\tilde{\Gamma }}_\ell \), and hence \(\# \tilde{{\tilde{\Lambda }}}^R \le c'\# {\tilde{\Gamma }}_\ell \). Putting these estimates together implies

$$\begin{aligned} \sum _{R\in \mathcal {R}_n} \# \tilde{{\tilde{\Lambda }}}^R \le cn. \end{aligned}$$
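The counting argument above is purely combinatorial: in the minimal subtree spanned by a marked set, the number of branching nodes never exceeds the number of leaves. The following sketch illustrates this on an abstract tree; the representation by a `parent` dictionary, the function names, and the toy data are hypothetical and stand in for \(\Gamma ^R\) and \({\tilde{\Lambda }}^R\).

```python
def minimal_subtree(parent, marked):
    """All nodes that are marked or contain (are ancestors of) a marked node."""
    subtree = set()
    for node in marked:
        # Walk upwards until the root or an already collected ancestor is reached.
        while node is not None and node not in subtree:
            subtree.add(node)
            node = parent.get(node)
    return subtree

def leaves_and_branching(parent, subtree):
    """Split the subtree into leaves (no child in it) and branching nodes (> 1 child in it)."""
    child_count = {node: 0 for node in subtree}
    for node in subtree:
        par = parent.get(node)
        if par in subtree:
            child_count[par] += 1
    leaves = {node for node, count in child_count.items() if count == 0}
    branching = {node for node, count in child_count.items() if count > 1}
    return leaves, branching

# Toy tree: R has children a, b; a has children a1, a2; a1 has child a11; b has child b1.
parent = {"R": None, "a": "R", "b": "R", "a1": "a", "a2": "a", "b1": "b", "a11": "a1"}
subtree = minimal_subtree(parent, marked={"a11", "a2", "b"})
leaves, branching = leaves_and_branching(parent, subtree)
```

In this toy example the leaves are `a11`, `a2`, `b` and the branching nodes are `R`, `a`; the node `b1` is not marked and has no marked descendant, so it stays outside the subtree.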

Observe that, with the exception of all branching rings in \({\tilde{\Lambda }}^R\), by construction every other ring \(K\in \tilde{\Lambda }^R\) is either a leaf, and hence contains no other rings from \(\tilde{{\tilde{\Lambda }}}^R\), or contains only one ring \(K'\in \tilde{{\tilde{\Lambda }}}^R\) of minimum level; i.e., K has one descendant \(K'\) in \(\tilde{{\tilde{\Lambda }}}^R\).

We now make the final step in our construction: We denote by \(\mathcal {F}^R\) the set of all rings from \({\tilde{\Gamma }}_\ell ^R\) along with all new rings of the form \(\overline{K\setminus K'}\), where \(K\in {\tilde{\Gamma }}_b'\), \(K'\in \tilde{{\tilde{\Lambda }}}^R\), \(K' \subset K\) and \(K'\) is of minimum level with these properties. Set \(\mathcal {F}:= \cup _{R\in \mathcal {R}_n} \mathcal {F}^R\).

The purpose of the above construction becomes clear from the following

Lemma 4.7

The set \(\mathcal {F}\) consists of rings in the sense of Definition 3.1 with parameters depending only on the structural constants \(N_0\), \(c_0\), and \(\beta \). Also, for any \(R\in \mathcal {R}_n\), the rings in \(\mathcal {F}^R\) have disjoint interiors, \(R= \cup _{K\in \mathcal {F}^R} K\), and \(\# \mathcal {F}^R \le c\#\tilde{{\tilde{\Lambda }}}^R\). Hence,

$$\begin{aligned} \Omega = \bigcup _{R\in \mathcal {R}_n} \bigcup _{K\in \mathcal {F}^R} K \quad \hbox {and}\quad \sum _{R\in \mathcal {R}_n} \#\mathcal {F}^R \le cn. \end{aligned}$$

Most importantly, each ring \(K\in \mathcal {F}\) has only finitely many neighbors in \(\mathcal {F}\); that is, there exists a constant \(N_1\) such that for any \(K\in \mathcal {F}\) there are at most \(N_1\) rings in \(\mathcal {F}\) intersecting K.

To prove the most important property of the set of rings \(\mathcal {F}\), namely, that each ring \(K\in \mathcal {F}\) has only finitely many neighbors in \(\mathcal {F}\), we shall need the following technical

Lemma 4.8

Suppose \(K \supset K_1 \supset K_2\), \(K\in \Gamma ^R\), \(K_1, K_2\in {\tilde{\Lambda }}^R\), and both \(K_1\) and \(K_2\) share parts of an edge E of K located in the interior of R. Then there exists \(K^\star \in {\tilde{\Lambda }}^R\) such that \(K^\star \cap K^\circ =\emptyset \), \(K^\star \cap E\ne \emptyset \), and \(K^\star \) is either a neighbor of \(K_1\) or \(K_2\), or \(K^\star \) is a neighbor of the parent of \(K_1\) in \(\Gamma ^R\).

Proof

If \(K_1\in \Lambda ^R\), then by construction all same level neighbors of \(K_1\) belong to \({\tilde{\Lambda }}^R\) and hence the one that shares the edge of \(K_1\) contained in E will be in \({\tilde{\Lambda }}^R\). We denote this ring by \(K^\star \), and apparently it has the claimed properties. By the same token, if \(K_2\in \Lambda ^R\), then one of its neighbors will do the job.

Suppose \(K_1, K_2\in {\tilde{\Lambda }}^R\setminus \Lambda ^R\). Then \(K_1\) has a neighbor, say, \(\hat{K}_1\) that belongs to \(\Lambda ^R\) and \(\hat{K}_1\) is at the level of \(K_1\). If \(\hat{K}_1\) has an edge contained in E, then \(K^\star :=\hat{K}_1\) has the claimed property. Similarly, \(K_2\) has a neighbor \(\hat{K}_2\in \Lambda ^R\) at the level of \(K_2\). If \(\hat{K}_2\) has an edge contained in E, then \(K^\star :=\hat{K}_2\) will do the job.

Assume that neither of the above is true. Then since \(K_1, \hat{K}_1 \in \Gamma ^R\), they must have the same parent in \(\Gamma ^R\) that has an edge contained in E. Denote this common parent by \(K^\sharp \). For the same reason, \(K_2, \hat{K}_2 \in \Gamma ^R\) have a common parent, say, \(K^{\sharp \sharp }\) in \(\Gamma ^R\). Clearly, \(K^\sharp \) and \(K^{\sharp \sharp }\) have some edges contained in E. Also, \(\hat{K}_1 \subset K^\sharp \), \(\hat{K}_2 \subset K^\sharp \), and \(\hat{K}_1^\circ \cap \hat{K}_2^\circ =\emptyset \).

We claim that \(K^\sharp \) belongs to \(\Lambda ^R\). Indeed, the rings from \(\mathcal {R}_n\) that are in the range of \(\hat{K}_1\) are also in the range of \(K^\sharp \). Also, the rings from \(\mathcal {R}_n\) that are in the range of \(\hat{K}_2\) are also in the range of \(K^\sharp \). However, obviously none of the children of \(K^\sharp \) can have the range of \(K^\sharp \). Therefore, \(K^\sharp \) belongs to \(\Lambda ^R\). Now, just as above, we conclude that one of the neighbors of \(K^\sharp \) has the claimed property.\(\square \)

Proof of Lemma 4.7

All properties of the newly constructed set of rings \(\mathcal {F}\), given in Lemma 4.7, but the last one follow readily from their construction.

We now show that each ring \(K\in \mathcal {F}\) has only finitely many neighbors in \(\mathcal {F}\). Indeed, by the construction any \(K\in \mathcal {F}^R\), \(R\in \mathcal {R}_n\), has only finitely many neighbors that do not belong to \(\mathcal {F}^R\). Thus, it remains to show that it cannot happen that there exist rings \(K_1\subset K_2 \subset \cdots \subset K_J\), \(K_j\in {\tilde{\Lambda }}^R\), with J uncontrollably large that have edges contained in an edge of a single ring \(K\in {\tilde{\Lambda }}^R\) whose interior does not intersect \(K_j\), \(j=1, \dots , J\). But this assertion readily follows by Lemma 4.8. \(\square \)

The following lemma will be instrumental in the proof of Theorem 4.5.

Lemma 4.9

Assume \(0< p, q\le \infty \), \(k\ge 1\), \(r\ge 0\), and \(\nu \in \mathbb {R}^2\) with \(|\nu |=1\). Let the sets \(G, H\subset \mathbb {R}^2\) be measurable, \(G\subset H\), and such that there exist balls \(B_1, B_2, B_3, B_4\), \(B_j=B(x_j, r_j)\), with the properties: \(B_2\subset G \subset B_1\), \(r_1\le {c^\flat }r_2\), and \(B_4\subset H \subset B_3\), \(r_3\le {c^\flat }r_4\), where \({c^\flat }\ge 1\) is a constant. Then for any \(P\in \Pi _{k-1}\),

$$\begin{aligned}&\Vert P\Vert _{L^p(G)} \le c|G|^{1/p-1/q}\Vert P\Vert _{L^q(G)}, \end{aligned}$$
(4.15)
$$\begin{aligned}&\Vert D_\nu ^r P\Vert _{L^p(G)} \le c d(G)^{-r}\Vert P\Vert _{L^p(G)}, \end{aligned}$$
(4.16)

and

$$\begin{aligned} \Vert P\Vert _{L^p(G)} \le c(|G|/|H|)^{1/p}\Vert P\Vert _{L^p(H)}, \end{aligned}$$
(4.17)

where \(c>0\) is a constant depending on \(p, q, k, r, {c^\flat }\), and the parameters \(N_0\), \(c_0\), and \(\beta \) from Definition 3.1. Here \(D_\nu ^r P\) is the rth directional derivative of P in the direction of \(\nu \).

Furthermore, inequality (4.17) holds with G and H replaced by their images L(G) and L(H), where L is a nonsingular linear transformation of \(\mathbb {R}^2\).

Proof

Inequality (4.15) holds whenever \(B_2=B(0, 1)\) and \(B_1=B(0, {c_\diamond })\) with \({c_\diamond }=\mathrm{constant}\) by the fact that any two (quasi)norms on \(\Pi _{k-1}\) are equivalent. This implies that (4.15) is valid in the case when \(B_2=B(0, 1)\) and \(B_2\subset B_1\), where \(B_1=B(x_2, {c_\diamond }/2)\). Then (4.15), in general, follows by rescaling. Inequality (4.17) is obvious when \(p=\infty \). In general, it follows from the case \(p=\infty \) and application of (4.15) to G with p and \(q=\infty \) and to H with \(p=\infty \), \(q=p\). Inequality (4.16) is an easy consequence of the Markov inequality for univariate polynomials whenever G is a square. Then in general it follows by inscribing \(B_1\) in a smallest possible cube and then applying it for the cube and using (4.17). The last claim in the lemma is obvious.\(\square \)
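The proof above invokes the Markov inequality for univariate polynomials. In its classical sup-norm form on \([-1,1]\) it reads \(\Vert P'\Vert _\infty \le n^2\Vert P\Vert _\infty \) for polynomials of degree \(\le n\), with equality for the Chebyshev polynomial \(T_n\). The sketch below illustrates only this classical case (not the \(L^p\) version on rings used in the lemma); the helper names and the sampling grid are our own assumptions.

```python
import math

def chebyshev_coefficients(n):
    """Coefficients (lowest degree first) of T_n via T_{m+1} = 2x T_m - T_{m-1}."""
    prev, cur = [1.0], [0.0, 1.0]
    if n == 0:
        return prev
    for _ in range(n - 1):
        nxt = [0.0] + [2.0 * c for c in cur]   # coefficients of 2x * T_m
        for i, c in enumerate(prev):
            nxt[i] -= c                         # subtract T_{m-1}
        prev, cur = cur, nxt
    return cur

def derivative(coeffs):
    """Coefficients of the derivative of a polynomial."""
    return [i * c for i, c in enumerate(coeffs)][1:]

def evaluate(coeffs, x):
    """Horner evaluation of the polynomial at x."""
    value = 0.0
    for c in reversed(coeffs):
        value = value * x + c
    return value

def sup_norm(coeffs, samples=2001):
    """Maximum of |P| over a uniform grid on [-1, 1] that includes both endpoints."""
    return max(abs(evaluate(coeffs, -1.0 + 2.0 * i / (samples - 1)))
               for i in range(samples))

t4 = chebyshev_coefficients(4)                 # T_4(x) = 8x^4 - 8x^2 + 1
markov_ratio = sup_norm(derivative(t4)) / sup_norm(t4)
```

For \(T_4\) the ratio equals \(4^2=16\), attained at the endpoints; rescaling the interval to length d introduces the factor \(d^{-1}\) per derivative, which matches the factor \(d(G)^{-r}\) in (4.16).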

Proof of Theorem 4.5

With the preparations from above, we are ready to carry out this proof. We shall only consider the case when \(\Omega \subset \mathbb {R}^2\) is a compact polygonal domain. Let \(S\in \mathcal {S}(n, k, r)\), and suppose S is represented as in (4.3); that is,

$$\begin{aligned} S=\sum _{R\in \mathcal {R}_n} P_R{\mathbbm {1}}_{R}, \quad S\in C^{r-1}(\Omega ), \quad P_R\in \Pi _{k-1}, \end{aligned}$$
(4.18)

where \(\mathcal {R}_n\) is a collection of \(\le n\) rings with disjoint interiors such that \(\Omega =\cup _{R\in \mathcal {R}_n} R\). Let \(\mathcal {F}\) be the set of rings constructed above starting with the rings from \(\mathcal {R}_n\). Then from (4.18) and because \(\mathcal {F}\) is a refinement of \(\mathcal {R}_n\), it follows that S can be represented in the form

$$\begin{aligned} S=\sum _{K\in \mathcal {F}} P_K{\mathbbm {1}}_{K}, \quad \quad S\in C^{r-1}(\Omega ), \quad P_K\in \Pi _{k-1}. \end{aligned}$$
(4.19)

Recall that \(\mathcal {F}\) is a collection of at most cn rings with disjoint interiors such that \(\Omega =\cup _{K\in \mathcal {F}} K\) (see Lemma 4.7).

We next introduce some convenient notation. For any ring \(K\in \mathcal {F}\), we denote by \(\mathcal {N}_K\) the set of all rings \(K'\in \mathcal {F}\) such that \(K\cap K' \ne \emptyset \); \(\mathcal {E}_K\) will denote the set of all segments (edges) from the boundary \(\partial K\) of K; and \(\mathcal {V}_K\) will be the set of all vertices of the polygonal curve \(\partial K\) (end points of edges from \(\mathcal {E}_K\)).

The fact that \(\mathcal {F}\) consists of rings in the sense of Definition 3.1 implies the following

Property 4.10

There exists a constant \(0< {\check{c}}<1\) such that if \(E=[v_1,v_2]\) is an edge shared by two rings \(K, K'\in \mathcal {F}\), then for any \(x\in E\) with \(|x-v_j| \ge \rho \), \(j=1, 2\) for some \(\rho >0\), we have \(B(x, {\check{c}}\rho ) \subset K\cup K'\).

Fix \(t>0\). For each ring \(K\in \mathcal {F}\), we define

$$\begin{aligned} K_t:= \{x\in K: {\text {dist}}(x, \partial K) \le kt\}. \end{aligned}$$

Write \(\Omega _t:=\cup _{K\in \mathcal {F}} K_t\).

Let \(h\in \mathbb {R}^2\) with norm \(|h|\le t\), and set \(\nu :=|h|^{-1}h\). Since S is a polynomial of degree \(\le k-1\) on each \(K\in \mathcal {F}\), we have \(\Delta ^k_hS(x) =0\) for \(x\in \cup _{K\in \mathcal {F}} K\setminus K_t\). Therefore,

$$\begin{aligned} \Vert \Delta ^k_hS\Vert _{L^\tau (\Omega )} = \Vert \Delta ^k_hS\Vert _{L^\tau (\Omega _t)}. \end{aligned}$$
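To illustrate why the kth difference vanishes away from the boundary strips, here is a minimal worked instance. It assumes the standard definition \(\Delta _h^k f(x) = \sum _{\ell =0}^k (-1)^{k-\ell }\binom{k}{\ell } f(x+\ell h)\), which is not restated in this section and is taken here as an assumption. For \(k=2\) and an affine polynomial \(P(x)=a+b\cdot x\) with \(a\in \mathbb {R}\), \(b\in \mathbb {R}^2\) (hypothetical coefficients chosen for illustration),

$$\begin{aligned} \Delta _h^2 P(x)&= P(x+2h) - 2P(x+h) + P(x) \\&= (a + b\cdot x + 2\, b\cdot h) - 2(a + b\cdot x + b\cdot h) + (a+b\cdot x) = 0. \end{aligned}$$

The same cancellation applies to S at every x with \([x, x+kh]\) contained in a single ring K, which is the case for \(x\in K\setminus K_t\) because then \({\text {dist}}(x, \partial K) > kt \ge k|h|\).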

Let \(K\in \mathcal {F}\), and assume \(d(K) > 2kt/{\check{c}}\) with \(0<{\check{c}}<1\) the constant from Property 4.10. Define \(\mathcal {N}_K^t:= \{K'\in \mathcal {N}_K: d(K') > 2kt/{\check{c}}\}\), \(B_v:=B(v, 2kt/{\check{c}})\), \(v\in \mathcal {V}_K\), and

$$\begin{aligned} \mathfrak {N}_K^t:= \{K'\in \mathcal {F}: d(K') > 2kt/{\check{c}}\;\;\hbox {and} \;\; K'\cap (K+B(0, 2kt/{\check{c}}))\ne \emptyset \}. \end{aligned}$$

Observe that because \(d(K) > 2kt/{\check{c}}\), the number of rings in \(\mathfrak {N}_K^t\) is uniformly bounded.

Let \(x\in \Omega _t\) be such that \([x, x+kh]\cap K \ne \emptyset \). Two cases are to be considered here.

(a) Let \([x, x+kh] \not \subset \cup _{v\in \mathcal {V}_K} B_v\). Then \([x, x+kh]\) intersects some edge \(E\in \mathcal {E}_K\) such that \(\ell (E) \ge 2kt/{\check{c}}\), and \([x, x+kh]\) cannot intersect another edge \(E'\in \mathcal {E}_K\) with this property or an edge \(E'\in \mathcal {E}_K\) with \(\ell (E') < 2kt/{\check{c}}\).

Suppose that the edge \(E=:[v_1, v_2]\) is shared with \(K'\in \mathcal {F}\) and \(y:= E\cap [x, x+kh]\). Evidently, \(|y-v_j| > kt/{\check{c}}\), \(j=1,2\), and in light of Property 4.10, we have \([x, x+kh] \subset B(y, kt) \subset K\cup K'\). Clearly,

$$\begin{aligned} |\Delta _h^k S(x)| \le ct^r\Vert D_\nu ^rS\Vert _{L^\infty ([x, x+kh])} \le ct^r\Vert D_\nu ^rS\Vert _{L^\infty (K)} + ct^r\Vert D_\nu ^rS\Vert _{L^\infty (K')}. \end{aligned}$$
(4.20)

(b) Let \([x, x+kh] \subset \cup _{v\in \mathcal {V}_K} B_v\). Then we estimate \(|\Delta _h^k S(x)|\) trivially:

$$\begin{aligned} |\Delta _h^k S(x)| \le 2^k\sum _{\ell =0}^k|S(x+\ell h)|. \end{aligned}$$
(4.21)
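The trivial estimate (4.21) is the triangle inequality applied to the expansion of the difference. As a sketch, again assuming the standard binomial form of \(\Delta _h^k\) (an assumption, since the definition is not restated here):

$$\begin{aligned} |\Delta _h^k S(x)| = \Big |\sum _{\ell =0}^k (-1)^{k-\ell }\binom{k}{\ell } S(x+\ell h)\Big | \le \sum _{\ell =0}^k \binom{k}{\ell }|S(x+\ell h)| \le 2^k\sum _{\ell =0}^k|S(x+\ell h)|, \end{aligned}$$

where the last step uses \(\binom{k}{\ell }\le \sum _{j=0}^k\binom{k}{j}=2^k\).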

Using (4.20) and (4.21), we obtain

$$\begin{aligned} \Vert \Delta _h^k S\Vert _{L^\tau (K_t)}^\tau&\le c\sum _{K'\in \mathcal {N}_K^t} t d(K') t^{r\tau }\Vert D_\nu ^rS\Vert _{L^\infty (K')}^\tau \nonumber \\&\quad + c\sum _{K'\in \mathfrak {N}_K^t}\sum _{v\in \mathcal {V}_{K}} \Vert S\Vert _{L^\tau (B_v\cap K')}^\tau \nonumber \\&\quad + c\sum _{K''\in \mathcal {F}: d(K'') \le 2kt/{\check{c}}}\Vert S\Vert _{L^\tau (K''\cap (K+[0, kh]))}^\tau . \end{aligned}$$
(4.22)

Note that the number of rings \(K'\in \mathfrak {N}_K^t\) such that \(K'\cap B_v\ne \emptyset \) for some \(v\in \mathcal {V}_K\) is uniformly bounded.

By Lemma 4.9, it follows that \(\Vert D_\nu ^rS\Vert _{L^\infty (K')} \le c d(K')^{-r-2/p}\Vert S\Vert _{L^p(K')}\), and if the ring \(K'\in \mathfrak {N}_K^t\) and \(v\in \mathcal {V}_{K}\), then

$$\begin{aligned} \Vert S\Vert _{L^\tau (B_v\cap K')}^\tau&\le c(|B_v|/|K'|)\Vert S\Vert _{L^\tau (K')}^\tau \le ct^2|K'|^{-1}\Vert S\Vert _{L^\tau (K')}^\tau \\&\le ct^2|K'|^{-1+\tau (1/\tau -1/p)}\Vert S\Vert _{L^p(K')}^\tau \le ct^2d(K')^{\tau s-2}\Vert S\Vert _{L^p(K')}^\tau . \end{aligned}$$
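The last step of this chain rests on the relation \(1/\tau = s/2+1/p\) used throughout the proof, together with \(|K'|\sim d(K')^2\), which the chain implicitly assumes for rings of bounded eccentricity. Spelled out as a sketch:

$$\begin{aligned} -1+\tau \Big (\frac{1}{\tau }-\frac{1}{p}\Big ) = -1+\frac{\tau s}{2}, \qquad |K'|^{-1+\tau s/2} \sim \big (d(K')^2\big )^{-1+\tau s/2} = d(K')^{\tau s-2}. \end{aligned}$$

As a hypothetical numerical instance (not taken from the source), \(p=2\) and \(s=1\) give \(\tau =1\), and the factor becomes \(d(K')^{-1}\).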

We use the above estimates in (4.22) to obtain

$$\begin{aligned} \Vert \Delta _h^k S\Vert _{L^\tau (K_t)}^\tau&\le c\sum _{K'\in \mathcal {N}_K^t} t^{1+r\tau } d(K')^{1-r\tau -2\tau /p}\Vert S\Vert _{L^p(K')}^\tau \nonumber \\&\quad + c\sum _{K'\in \mathfrak {N}_K^t} t^{2} d(K')^{\tau s-2}\Vert S\Vert _{L^p(K')}^\tau \nonumber \\&\quad + c\sum _{K''\in \mathcal {F}: d(K'') \le 2kt/{\check{c}}}\Vert S\Vert _{L^\tau (K''\cap (K+[0, kh]))}^\tau . \end{aligned}$$
(4.23)

Denote by \(\Omega _t^\star \) the set of all \(x\in \Omega _t\) such that \([x, x+kh] \subset \Omega \) and

$$\begin{aligned}{}[x, x+kh] \subset \cup \{K\in \mathcal {F}: d(K) \le 2kt/{\check{c}}\}. \end{aligned}$$

In this case, we shall use the obvious estimate

$$\begin{aligned} \Vert \Delta _h^kS\Vert _{L^\tau (\Omega _t^\star )}^\tau \le c \sum _{K\in \mathcal {F}: d(K) \le 2kt/{\check{c}}}\Vert S\Vert _{L^\tau (K)}^\tau . \end{aligned}$$

This estimate along with (4.23) yields

$$\begin{aligned} \omega _k(S, t)_\tau ^\tau&\le c \sum _{K\in \mathcal {F}: d(K) \ge 2kt/{\check{c}}} t^{1+r\tau }d(K)^{1-r\tau -2\tau /p}\Vert S\Vert _{L^p(K)}^\tau \\&\quad + c\sum _{K\in \mathcal {F}: d(K) \ge 2kt/{\check{c}}} t^2 d(K)^{s\tau -2}\Vert S\Vert _{L^p(K)}^\tau + c\sum _{K\in \mathcal {F}: d(K) \le 2kt/{\check{c}}} \Vert S\Vert _{L^\tau (K)}^\tau . \end{aligned}$$

Here we used the fact that, by Lemma 4.7, only a uniformly bounded number of the rings involved in the above estimates may overlap at a time. For the norms involved in the last sum, we use the estimate \( \Vert S\Vert _{L^\tau (K)}^\tau \le cd(K)^{s\tau }\Vert S\Vert _{L^p(K)}^\tau , \) which follows by Lemma 4.9, to obtain

$$\begin{aligned}&\omega _k(S, t)_\tau ^\tau \le c \sum _{K\in \mathcal {F}: d(K) \ge 2kt/{\check{c}}} t^{1+r\tau }d(K)^{1-r\tau -2\tau /p}\Vert S\Vert _{L^p(K)}^\tau \\&\qquad + c\sum _{K\in \mathcal {F}: d(K) \ge 2kt/{\check{c}}} t^2 d(K)^{s\tau -2}\Vert S\Vert _{L^p(K)}^\tau + c\sum _{K\in \mathcal {F}: d(K) \le 2kt/{\check{c}}} d(K)^{s\tau }\Vert S\Vert _{L^p(K)}^\tau . \end{aligned}$$

We insert this estimate in (2.1) and interchange the order of integration and summation to obtain

$$\begin{aligned} |S|_{B^{s,k}_\tau }^\tau&=\int _0^\infty t^{-s\tau -1}\omega _k(S, t)_\tau ^\tau \mathrm{d}t \\&\le c \sum _{K\in \mathcal {F}} d(K)^{1-r\tau -2\tau /p}\Vert S\Vert _{L^p(K)}^\tau \int _0^{{\check{c}}d(K)/2k} t^{-s\tau +r\tau } \mathrm{d}t \\&\quad + c\sum _{K\in \mathcal {F}} d(K)^{s\tau -2}\Vert S\Vert _{L^p(K)}^\tau \int _0^{{\check{c}}d(K)/2k} t^{-s\tau +1} \mathrm{d}t \\&\quad + c\sum _{K\in \mathcal {F}} d(K)^{s\tau }\Vert S\Vert _{L^p(K)}^\tau \int _{{\check{c}}d(K)/2k}^\infty t^{-s\tau -1} \mathrm{d}t. \end{aligned}$$

Observe that \(-s\tau + r\tau > -1\) is equivalent to \(s/2<r+1/p\) and \(-s\tau + 1 > -1\) is equivalent to \(s<2/\tau = s+2/p\). Therefore, the above integrals are convergent, and taking into account that \(2-2\tau /p-s\tau = 2\tau (1/\tau -1/p-s/2) = 0\), we obtain

$$\begin{aligned} |S|_{B^{s,k}_\tau }^\tau \le c\sum _{K\in \mathcal {F}} \Vert S\Vert _{L^p(K)}^\tau \le cn^{\tau (1/\tau -1/p)}\left( \sum _{K\in \mathcal {F}} \Vert S\Vert _{L^p(K)}^p \right) ^{\tau /p} =cn^{\tau s/2}\Vert S\Vert _{L^p(\Omega )}^\tau , \end{aligned}$$

where we used Hölder’s inequality. This completes the proof of Theorem 4.5. \(\square \)
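The exponent bookkeeping in the final steps of the proof above can be made explicit. The following is a sketch under the standing relation \(1/\tau =s/2+1/p\), writing \(a:={\check{c}}d(K)/2k\) as shorthand (a is notation introduced here for illustration only):

$$\begin{aligned} \int _0^{a} t^{-s\tau +r\tau }\, \mathrm{d}t = \frac{a^{1-s\tau +r\tau }}{1-s\tau +r\tau }, \qquad \int _0^{a} t^{-s\tau +1}\, \mathrm{d}t = \frac{a^{2-s\tau }}{2-s\tau }, \qquad \int _a^\infty t^{-s\tau -1}\, \mathrm{d}t = \frac{a^{-s\tau }}{s\tau }. \end{aligned}$$

Multiplying by the corresponding factors, the three sums carry the powers \(d(K)^{(1-r\tau -2\tau /p)+(1-s\tau +r\tau )}=d(K)^{2-2\tau /p-s\tau }=1\), \(d(K)^{(s\tau -2)+(2-s\tau )}=1\), and \(d(K)^{s\tau -s\tau }=1\), respectively. The final step is Hölder's inequality for sums with exponents \(p/\tau \) and its conjugate (note \(\tau <p\)): with \(a_K:=\Vert S\Vert _{L^p(K)}\),

$$\begin{aligned} \sum _{K\in \mathcal {F}} a_K^\tau \le (\# \mathcal {F})^{1-\tau /p}\Big (\sum _{K\in \mathcal {F}} a_K^p\Big )^{\tau /p}, \qquad 1-\frac{\tau }{p} = \tau \Big (\frac{1}{\tau }-\frac{1}{p}\Big ) = \frac{\tau s}{2}, \end{aligned}$$

where \(\# \mathcal {F}\le cn\) and \(\sum _{K\in \mathcal {F}} a_K^p=\Vert S\Vert _{L^p(\Omega )}^p\) because the rings in \(\mathcal {F}\) have disjoint interiors and cover \(\Omega \).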

4.5 Proof of the Bernstein Estimate (Theorem 4.2) in the Nonnested Case

For the proof of Theorem 4.2, we combine ideas from the proofs of Theorem 3.4 and Theorem 4.5. We shall adhere to a large extent to the notation introduced in the proof of Theorem 3.4 in Sect. 3.3. An important distinction between this proof and the proof of Theorem 3.4 is that the directional derivatives \(D_\nu ^{k-1}S\) of any \(S\in \mathcal {S}(n, k)\) are piecewise constants along the respective straight lines rather than S being a piecewise constant.

We consider the case when \(\Omega \subset \mathbb {R}^2\) is a compact polygonal domain. Assume \(S_1, S_2\in \mathcal {S}(n, k)\), \(n\ge n_0\). Then each \(S_j\) (\(j=1, 2\)) can be represented in the form \( S_j=\sum _{R\in \mathcal {R}_j} P_R {\mathbbm {1}}_{R}, \) where \(\mathcal {R}_j\) is a set of at most n rings in the sense of Definition 3.1 with disjoint interiors and such that \(\Omega = \cup _{R\in \mathcal {R}_j} R\), \(P_R \in \Pi _{k-1}\), and \(S_j\in C^{k-2}(\Omega )\).

Just as in the proof of Theorem 4.5, there exist subdivisions \(\mathcal {F}_1\), \(\mathcal {F}_2\) of the rings from \(\mathcal {R}_1\), \(\mathcal {R}_2\) with the following properties, for \(j=1, 2\):

(a) \(\mathcal {F}_j\) consists of rings in the sense of Definition 3.1 with parameters \(N_0^\star \), \(c_0^\star \), and \(\beta ^\star \) depending only on the structural constants \(N_0\), \(c_0\), and \(\beta \).

(b) \(\cup _{R\in \mathcal {F}_j} R = \Omega \) and \(\# \mathcal {F}_j \le cn\).

(c) There exists a constant \(N_1\) such that for any \(R\in \mathcal {F}_j\), there are at most \(N_1\) rings in \(\mathcal {F}_j\) intersecting R (R has \(\le N_1\) neighbors in \(\mathcal {F}_j\)).

(d) \(S_j\) can be represented in the form \(S_j = \sum _{R\in \mathcal {F}_j} P_R{\mathbbm {1}}_{R}\) with \(P_R\in \Pi _{k-1}\).

Now, just as in the proof of Theorem 3.4, we denote by \(\mathcal {U}\) the collection of all maximal connected sets obtained by intersecting rings from \(\mathcal {F}_1\) and \(\mathcal {F}_2\). By (3.14), there exists a constant \(c>0\) such that \( \# \mathcal {U}\le cn. \)

We claim that there exists a constant \(N_2\) such that for any \(U\in \mathcal {U}\) there are no more than \(N_2\) sets \(U'\in \mathcal {U}\) that intersect U; i.e., U has at most \(N_2\) neighbors in \(\mathcal {U}\). Indeed, let \(U\in \mathcal {U}\) be a maximal connected component of \(R_1\cap R_2\) with \(R_1\in \mathcal {F}_1\) and \(R_2\in \mathcal {F}_2\). Then using the fact that the ring \(R_1\) has finitely many neighbors in \(\mathcal {F}_1\) and \(R_2\) has finitely many neighbors in \(\mathcal {F}_2\), we conclude that U has finitely many neighbors in \(\mathcal {U}\).

Further, we introduce the sets \(\mathcal {A}\) and \(\mathcal {T}\) just as in the proof of Theorem 3.4.

Trapezoids. Our main concern will be in dealing with the trapezoids \(T\in \mathcal {T}\). We next use the fact that any ring from \(\mathcal {F}_j\), \(j=1, 2\), has at most \(N_1\) neighbors in \(\mathcal {F}_j\) to additionally subdivide the trapezoids from \(\mathcal {T}\) into trapezoids whose long sides are sides of good triangles for rings in \(\mathcal {F}_1\) or \(\mathcal {F}_2\).

Consider an arbitrary trapezoid \(T\in \mathcal {T}\). Just as in Sect. 3.3, we may assume that T is a maximal isosceles trapezoid contained in \(\triangle _{E_1}\cap \triangle _{E_2}\), where \(\triangle _{E_j}\) (\(j=1, 2\)) is a good triangle for a ring \(R_j\in \mathcal {F}_j\), and T is positioned so that its vertices are the points

$$\begin{aligned} v_1:=(-\delta _1/2, 0),\;\; v_2:=(\delta _1/2, 0),\;\; v_3:=(\delta _2/2, H),\;\; v_4:=(-\delta _2/2, H), \end{aligned}$$

where \(0\le \delta _2\le \delta _1\) and \(H >\delta _1\). Let \(L_1:=[v_1, v_4]\) and \(L_2:=[v_2, v_3]\) be the two equal (long) legs of T. We assume that \(L_1\subset E_1\) and \(L_2\subset E_2\). See Fig. 8.

By Lemma 4.7, it follows that there exist fewer than \(N_1\) rings \(K_\ell '\in \mathcal {F}_1\), \(\ell =1, \dots ,\nu '\), each of them with an edge or part of an edge contained in \(L_1\). By Definition 3.1, each of these edges (or parts of edges) can be subdivided into at most two segments so that each segment is a side of a good triangle. Denote by \(I_\ell '\), \(\ell =1, \dots ,m'\), these segments, and by \(\triangle _{I_\ell '}\), \(\ell =1, \dots ,m'\), the respective good triangles attached to them. More precisely, \(I_\ell '\) is a side of \(\triangle _{I_\ell '} \subset K_\ell '\), and \(\triangle _{I_\ell '}\) is a good triangle for \(K_\ell '\). Thus we have \(L_1=\cup _{\ell =1}^{m'} I_\ell '\), where the segments \(I_\ell '\), \(\ell =1, \dots ,m'\), have disjoint interiors.

Similarly, there exist segments \(I_\ell ''\), \(\ell =1, \dots ,m''\), and attached to them good triangles \(\triangle _{I_\ell ''}\), \(\ell =1, \dots ,m''\), for rings from \(\mathcal {F}_2\), so that \(L_2=\cup _{\ell =1}^{m''} I_\ell ''\).

Denote by \(v_\ell '\), \(\ell =1, \dots ,m'+1\), the vertices of the triangles \(\triangle _{I_\ell '}\), \(\ell =1, \dots ,m'\), on \(L_1\) so that \(I_\ell '=[v_\ell ', v_{\ell +1}']\), and assume that their orthogonal projections onto the \(x_2\)-axis \(p_\ell '\), \(\ell =1, \dots ,m'+1\), are ordered so that \(0=p_1'< p_2'<\cdots <p_{m'+1}'=H\). In exactly the same way, we define the vertices \(v_\ell ''\), \(\ell =1, \dots ,m''+1\), of the triangles \(\triangle _{I_\ell ''}\) and their projections onto the \(x_2\)-axis \(0=p_1''< p_2''<\cdots <p_{m''+1}''=H\).

For any \(q\in [0, H]\), we let \(\delta (q)\) be the distance between the points where the line with equation \(x_2=q\) intersects \(L_1\) and \(L_2\). Thus \(\delta (0) = \delta _1\) and \(\delta (H) = \delta _2\), and \(\delta (q)\) is linear.
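Since \(\delta \) is linear with the two prescribed endpoint values, it can be written explicitly. The following formula is a direct consequence of the vertex coordinates \(v_1, \dots , v_4\) stated above:

$$\begin{aligned} \delta (q) = \delta _1 + (\delta _2-\delta _1)\frac{q}{H} = \Big (1-\frac{q}{H}\Big )\delta _1 + \frac{q}{H}\,\delta _2, \quad 0\le q\le H. \end{aligned}$$

In particular, \(\delta \) is nonincreasing on [0, H] because \(\delta _2\le \delta _1\).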

Inductively, starting from \(q_1=0\), one can easily subdivide the interval [0, H] by means of points

$$\begin{aligned} 0=q_1< q_2<\cdots < q_{m+1}=H, \quad m \le m'+m'' \le 2N_1, \end{aligned}$$

with the following properties: for \(k=1, \dots , m\), either

(a) \(\delta (q_k)\le q_{k+1}-q_k <2\delta (q_k)\), or

(b) \(q_{k+1}-q_k > \delta (q_k)\) and \((q_k, q_{k+1})\) contains no points \(p_\ell '\) or \(p_\ell ''\).

We use the above points to subdivide the trapezoid T. Let \(T_k\), \(k=1, \dots , m\), be the trapezoid bounded by \(L_1\), \(L_2\), and the lines with equations \(x_2=q_k\) and \(x_2=q_{k+1}\).

We now separate the “bad” from the “good” trapezoids \(T_k\). Namely, if property (a) from above is valid, then \(T_k\) is a ring and we place \(T_k\) in \(\mathcal {A}\); if property (b) is valid, then \(T_k\) is a “bad” trapezoid and we place \(T_k\) in \(\mathcal {T}\). We apply the above procedure to all trapezoids.

Properties of the new trapezoids. We now consider an arbitrary trapezoid T from the above defined \(\mathcal {T}\) (the set of bad trapezoids). We next summarize the properties of T. It will be convenient for us to use the same notation as above as well as in the proof of Theorem 3.4. We assume that T is an isosceles trapezoid contained in \(\triangle _{E_1}\cap \triangle _{E_2}\), where \(\triangle _{E_j}\), \(j=1, 2\), is a good triangle for a ring \(R_j\in \mathcal {F}_j\), and T is positioned so that its vertices are the points

$$\begin{aligned} v_1:=(-\delta _1/2, 0),\;\; v_2:=(\delta _1/2, 0),\;\; v_3:=(\delta _2/2, H),\;\; v_4:=(-\delta _2/2, H), \end{aligned}$$

where \(0\le \delta _2\le \delta _1\) and \(H >\delta _1\). Let \(L_1:=[v_1, v_4]\) and \(L_2:=[v_2, v_3]\) be the two equal (long) sides of T. We assume that \(L_1\subset E_1\) and \(L_2\subset E_2\). See Fig. 9.

Fig. 9. Illustration of Case 4 (b)

As a result of the above subdivision procedure, there exists a triangle \(\triangle _{L_1}\) with a side \(L_1\) such that \(\triangle _{L_1}\) is a good triangle for some ring \({\tilde{R}}_1\in \mathcal {F}_1\) and \(\triangle _{L_1}^\circ \cap \triangle _{E_1}^\circ =\emptyset \). For the same reason, there exists a triangle \(\triangle _{L_2}\) with a side \(L_2\) such that \(\triangle _{L_2}\) is a good triangle for some ring \({\tilde{R}}_2\in \mathcal {F}_2\) and \(\triangle _{L_2}^\circ \cap \triangle _{E_2}^\circ =\emptyset \).

Observe that \(\triangle _{E_1}\) and \(\triangle _{E_2}\) are good triangles, and hence the angles of \(\triangle _{E_j}\) adjacent to \(E_j\) are of size \(\beta ^\star /2\), \(j=1,2\). Likewise, \(\triangle _{L_1}\) and \(\triangle _{L_2}\) are good triangles, and hence the angles of \(\triangle _{L_j}\) adjacent to \(L_j\) are of size \(\beta ^\star /2\), \(j=1,2\). Therefore, we may assume that \(\triangle _{L_1} \subset \triangle _{E_2}\) and \(\triangle _{L_2} \subset \triangle _{E_1}\). Consequently, \(S_1\) is a polynomial of degree \(<k\) on \(\triangle _{L_1}\) and another polynomial of degree \(<k\) on \(\triangle _{L_2}\). By the same token, \(S_2\) is a polynomial of degree \(<k\) on \(\triangle _{L_1}\) and another polynomial of degree \(<k\) on \(\triangle _{L_2}\). We shall assume that \(\triangle _{L_1} \subset A_1\) and \(\triangle _{L_2} \subset A_2\), where \(A_1, A_2\in \mathcal {A}\).

Further, denote by \(D_1\) and \(D_2\) the bottom and top sides of T. We shall denote by \(\mathcal {V}_T=\{v_1, v_2, v_3, v_4\}\) the vertices of T, where \(v_1\) is the point of intersection of \(L_1\) and \(D_1\) and the other vertices are indexed counterclockwise.

We shall use the notation \(\delta _1(T) :=\delta _1\) and \(\delta _2(T) :=\delta _2\). We always assume that \(\delta _1(T) \ge \delta _2(T)\). Clearly, \(d(T)\sim H\); more precisely, \(H< d(T) < H +\delta _1+ \delta _2\).

Observe that by the construction of the sets \(\mathcal {T}\), \(\mathcal {A}\), and (3.14), it follows that \(\mathcal {A}\cup \mathcal {T}\) consists of polygonal sets with disjoint interiors, \(\cup _{A\in \mathcal {A}} A \cup _{T\in \mathcal {T}} T=\Omega \), there exists a constant \(c>0\) such that

$$\begin{aligned} \# \mathcal {A}\le cn, \quad \# \mathcal {T}\le cn, \end{aligned}$$

and there exists a constant \(N_3\) such that each set from \(\mathcal {A}\cup \mathcal {T}\) has at most \(N_3\) neighbors in \(\mathcal {A}\cup \mathcal {T}\).

We summarize the most important properties of the sets from \(\mathcal {T}\) and \(\mathcal {A}\) in the following

Lemma 4.11

The following properties hold for some constant \(0< {\tilde{c}}<1\) depending only on the structural constants \(N_0\), \(c_0\), and \(\beta \) of the setting:

(a) Let \(T\in \mathcal {T}\), and assume the notation related to T from above. If \(x\in L_1\) with \(|x-v_j| \ge \rho \), \(j=1, 4\), then \(B(x, {\tilde{c}}\rho ) \subset \triangle _{L_1}\cup \triangle _{E_1}\). Also, if \(x\in L_2\) with \(|x-v_j| \ge \rho \), \(j=2, 3\), then \(B(x, {\tilde{c}}\rho ) \subset \triangle _{L_2}\cup \triangle _{E_2}\). Furthermore, if \(x\in D_1\) with \(|x-v_j| \ge \rho \), \(j=1, 2\), then \(B(x, {\tilde{c}}\rho ) \subset \triangle _{E_1}\cap \triangle _{E_2}\), and similarly for \(x\in D_2\).

(b) Assume that \(E=[w_1,w_2]\) is an edge shared by two sets \(A, A'\in \mathcal {A}\). Let \(\mathcal {V}_A\) be the set of all vertices on \(\partial A\) (end points of edges) and let \(\mathcal {V}_{A'}\) be the set of all vertices on \(\partial A'\). If \(x\in E\) with \(|x-w_j| \ge \rho \), \(j=1, 2\), for some \(\rho >0\), then

$$\begin{aligned} B(x, {\tilde{c}}\rho ) \subset A\cup A' \cup _{v\in \mathcal {V}_A\cup \mathcal {V}_{A'}} B(v, \rho ). \end{aligned}$$
(4.24)

Proof

Part (a) of this lemma follows readily from the properties of the trapezoids. Part (b) needs clarification. Suppose that for some \(x\in E\) with \(|x-w_j| \ge \rho \), \(j=1, 2\), \(\rho >0\), the inclusion (4.24) is not valid. Then there exists a point y from an edge \(\tilde{E}=[u_1, u_2]\) of, say, \(\partial A\) such that \(|y-x|<\rho \) and \(|y-u_j|\ge \rho \), \(j=1, 2\). A simple geometric argument shows that if the constant \({\tilde{c}}\) is sufficiently small (depending only on the parameter \(\beta \) of the setting), then there exists an isosceles trapezoid \(\check{T} \subset \triangle _{E}\cap \triangle _{\tilde{E}}\) with two legs contained in E and \(\tilde{E}\) such that each leg is longer than its larger base. But then the subdivision of the sets from \(\mathcal {U}\) (see the proof of Theorem 3.4) would have created a trapezoid in \(\mathcal {T}\) that contains part of A. This is a contradiction, which shows that Part (b) holds true.\(\square \)

We have the representation

$$\begin{aligned} S_1(x)-S_2(x) = \sum _{A\in \mathcal {A}} P_A(x){\mathbbm {1}}_A(x) + \sum _{T\in \mathcal {T}} P_T(x){\mathbbm {1}}_T(x), \end{aligned}$$
(4.25)

where \(P_A, P_T\in \Pi _{k-1}\). Note that \(S_1-S_2\in C^{k-2}(\Omega )\).

Let \(0<s/2<k-1+1/p\) and \(\tau \le 1\). Fix \(t>0\), and let \(h\in \mathbb {R}^2\) with norm \(|h|\le t\). Write \(\nu :=|h|^{-1}h\), and assume \(\nu =:(\cos \theta , \sin \theta )\), \(-\pi <\theta \le \pi \).

Since \(S_1, S_2\in C^{k-2}(\Omega )\), we have the following representation of \(\Delta ^{k-1}_hS_j(x)\):

$$\begin{aligned} \Delta ^{k-1}_hS_j(x) = |h|^{k-1}\int _\mathbb {R}D_\nu ^{k-1} S_j\left( x+u\nu \right) M_{k-1}(u) \mathrm{d}u, \end{aligned}$$

where \(M_{k-1}(u)\) is the B-spline with knots \(u_0, u_1, \dots , u_{k-1}\), \(u_\ell := \ell |h|\). In fact, \( M_{k-1}(u)=(k-1)[u_0, \dots , u_{k-1}](\cdot - u)_+^{k-2} \) is the divided difference. As is well known, \(0\le M_{k-1} \le c|h|^{-1}\), \({\text {supp}}M_{k-1} \subset [0, (k-1)|h|]\), and \(\int _\mathbb {R}M_{k-1}(u) \mathrm{d}u =1\). Therefore, by \(\Delta ^k_hS_j(x)=\Delta ^{k-1}_hS_j(x+h)-\Delta ^{k-1}_hS_j(x)\), whenever \([x, x+kh]\subset \Omega \), we arrive at the representation

$$\begin{aligned} \Delta ^k_hS_j(x) = |h|^{k-1}\int _0^{k|h|} D_\nu ^{k-1} S_j\left( x+u\nu \right) M^*_k(u) \mathrm{d}u, \end{aligned}$$
(4.26)

where \(M^*_k(u):= M_{k-1}(u-|h|)-M_{k-1}(u)\).
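The passage from the representation of \(\Delta ^{k-1}_hS_j\) to (4.26) is a change of variables. The following sketch spells it out, using only the facts stated above (\(h=|h|\nu \) and \({\text {supp}}M_{k-1} \subset [0, (k-1)|h|]\)):

$$\begin{aligned} \Delta ^k_hS_j(x)&= |h|^{k-1}\int _\mathbb {R}D_\nu ^{k-1} S_j(x+h+u\nu ) M_{k-1}(u)\, \mathrm{d}u - |h|^{k-1}\int _\mathbb {R}D_\nu ^{k-1} S_j(x+u\nu ) M_{k-1}(u)\, \mathrm{d}u \\&= |h|^{k-1}\int _\mathbb {R}D_\nu ^{k-1} S_j(x+u\nu ) \big [M_{k-1}(u-|h|)-M_{k-1}(u)\big ]\, \mathrm{d}u, \end{aligned}$$

where the first integral was rewritten via the substitution \(u\mapsto u-|h|\); the bracket is supported in \([0, k|h|]\). As a sanity check in the simplest case \(k=2\) (an illustration, not part of the source argument), \(M_1=|h|^{-1}{\mathbbm {1}}_{[0, |h|]}\), and the formula reduces to

$$\begin{aligned} \Delta ^2_hS_j(x) = \int _{|h|}^{2|h|} D_\nu S_j(x+u\nu )\, \mathrm{d}u - \int _0^{|h|} D_\nu S_j(x+u\nu )\, \mathrm{d}u = \big [S_j(x+2h)-S_j(x+h)\big ]-\big [S_j(x+h)-S_j(x)\big ]. \end{aligned}$$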

In what follows, we estimate \(\Vert \Delta _h^k S_1\Vert _{L^\tau (G)}^\tau - \Vert \Delta _h^k S_2\Vert _{L^\tau (G)}^\tau \) for different subsets G of \(\Omega \).

4.6 Case 1

Let \(T \in \mathcal {T}\) be such that \(d(T)> 2kt/{\tilde{c}}\) with \({\tilde{c}}\) the constant from Lemma 4.11. Denote

$$\begin{aligned} T_h:=\{x\in \Omega : [x, x+kh] \subset \Omega \;\;\hbox {and}\;\;[x, x+kh]\cap T\ne \emptyset \}. \end{aligned}$$

We next estimate \(\Vert \Delta _h^k S_1\Vert _{L^\tau (T_h)}^\tau - \Vert \Delta _h^k S_2\Vert _{L^\tau (T_h)}^\tau \).

Assume that \(T \in \mathcal {T}\) is a trapezoid positioned as described above in the paragraph on properties of the new trapezoids. We adhere to the notation introduced there.

In addition, let \(v_4-v_1 =: |v_4-v_1| (\cos \gamma , \sin \gamma )\) with \(\gamma \le \pi /2\); i.e., \(\gamma \) is the angle between \(D_1\) and \(L_1\). Assume that \(\nu =:(\cos \theta , \sin \theta )\) with \(\theta \in [\gamma , \pi ]\). The case \(\theta \in [-\gamma , 0]\) is just the same. The case when \(\theta \in [0, \gamma ]\cup [-\pi , -\gamma ]\) is considered similarly.

We set \(B_v:=B(v, 2kt/{\tilde{c}})\), \(v\in \mathcal {V}_T\). Also, denote

$$\begin{aligned} \mathcal {A}_T^t&:=\{A\in \mathcal {A}: d(A) > 2kt/{\tilde{c}}\quad \hbox {and}\quad A\cap (T+B(0, kt))\ne \emptyset \},\\ {\mathfrak {A}}_T^t&:=\{A\in \mathcal {A}: d(A) \le 2kt/{\tilde{c}}\quad \hbox {and}\quad A\cap (T+B(0, kt))\ne \emptyset \} \end{aligned}$$

and

$$\begin{aligned} \mathcal {T}_T^t:=\{T'\in \mathcal {T}: d(T') > 2kt/{\tilde{c}}\quad \hbox {and}\quad T'\cap (T+B(0, kt))\ne \emptyset \},\\ {\mathfrak {T}}_T^t:=\{T'\in \mathcal {T}: d(T') \le 2kt/{\tilde{c}}\quad \hbox {and}\quad T'\cap (T+B(0, kt))\ne \emptyset \}. \end{aligned}$$

Clearly, \(\# \mathcal {A}_T^t \le c\) and \(\# \mathcal {T}_T^t \le c\) for some constant \(c>0\).

Case 1 (a). If \([x, x+kh]\subset \triangle _{E_1}\), then \(\Delta _h^k S_1(x)=0\) because \(S_1\) is a polynomial of degree \(<k\) on \(\triangle _{E_1}\). Hence no estimate is needed.

Case 1 (b). If \([x, x+kh] \subset \cup _{v\in \mathcal {V}_{T}} B_v\), we estimate \(|\Delta _h^k S_1(x)|\) trivially:

$$\begin{aligned} |\Delta _h^k S_1(x)| \le |\Delta _h^k S_2(x)| + 2^k\sum _{\ell =0}^k|S_1(x+\ell h)-S_2(x+\ell h)|. \end{aligned}$$
(4.27)

Clearly, the contribution of this case to estimating \(\Vert \Delta _h^k S_1\Vert _{L^\tau (T_h)}^\tau - \Vert \Delta _h^k S_2\Vert _{L^\tau (T_h)}^\tau \) is

$$\begin{aligned}&\le c \sum _{v\in \mathcal {V}_T}\sum _{A\in \mathcal {A}_T^t}\Vert S_1-S_2\Vert _{L^\tau (B_v\cap A)}^\tau + c\sum _{v\in \mathcal {V}_T}\sum _{T'\in \mathcal {T}_T^t}\Vert S_1-S_2\Vert _{L^\tau (B_v\cap T')}^\tau \\&\quad + c \sum _{v\in \mathcal {V}_T}\sum _{A\in {\mathfrak {A}}_T^t}\Vert S_1-S_2\Vert _{L^\tau (B_v\cap A)}^\tau + c\sum _{v\in \mathcal {V}_T}\sum _{T'\in {\mathfrak {T}}_T^t}\Vert S_1-S_2\Vert _{L^\tau (B_v\cap T')}^\tau \\&\le \sum _{A\in \mathcal {A}_T^t}ct^2d(A)^{\tau s-2}\Vert S_1-S_2\Vert _{L^p(A)}^\tau + \sum _{T'\in \mathcal {T}_T^t} ct^{1+\tau s/2}d(T')^{\tau s/2-1}\Vert S_1-S_2\Vert _{L^p(T')}^\tau \\&\quad + \sum _{A\in {\mathfrak {A}}_T^t} cd(A)^{\tau s}\Vert S_1-S_2\Vert _{L^p(A)}^\tau + \sum _{T'\in {\mathfrak {T}}_T^t} cd(T')^{\tau s}\Vert S_1-S_2\Vert _{L^p(T')}^\tau . \end{aligned}$$

Here we used the following estimates, which are a consequence of Lemma 4.9:

(1) If \(A\in \mathcal {A}_T^t\), then

$$\begin{aligned} \Vert S_1-S_2\Vert _{L^\tau (B_v\cap A)}^\tau&\le c(|B_v|/|A|)\Vert S_1-S_2\Vert _{L^\tau (A)}^\tau \\&\le ct^2d(A)^{-2}\Vert S_1-S_2\Vert _{L^\tau (A)}^\tau \le ct^2d(A)^{\tau s-2}\Vert S_1-S_2\Vert _{L^p(A)}^\tau . \end{aligned}$$

(2) If \(T'\in \mathcal {T}_T^t\) and \(\delta _1(T') > 2kt/{\tilde{c}}\), then for any \(v\in \mathcal {V}_{T}\), we have

$$\begin{aligned} \Vert S_1-S_2\Vert _{L^\tau (B_v\cap T')}^\tau&\le c(|B_v|/|T'|)\Vert S_1-S_2\Vert _{L^\tau (T')}^\tau \le ct^2|T'|^{\tau s/2-1}\Vert S_1-S_2\Vert _{L^p(T')}^\tau \\&\le ct^2\delta _1(T')^{\tau s/2-1}d(T')^{\tau s/2-1}\Vert S_1-S_2\Vert _{L^p(T')}^\tau \\&\le ct^{1+\tau s/2}d(T')^{\tau s/2-1}\Vert S_1-S_2\Vert _{L^p(T')}^\tau , \end{aligned}$$

where we used that \(\tau s/2 <1\), which is equivalent to \(s<s+2/p\).

(3) If \(T'\in \mathcal {T}_T^t\) and \(\delta _1(T') \le 2kt/{\tilde{c}}\), then for any \(v\in \mathcal {V}_{T}\),

$$\begin{aligned} \Vert S_1-S_2\Vert _{L^\tau (B_v\cap T')}^\tau&\le c(|B_v\cap T'|/|T'|)\Vert S_1-S_2\Vert _{L^\tau (T')}^\tau \\&\le ct\delta _1(T')[\delta _1(T')d(T')]^{-1}\Vert S_1-S_2\Vert _{L^\tau (T')}^\tau \\&\le ctd(T')^{-1}[\delta _1(T')d(T')]^{\tau s/2}\Vert S_1-S_2\Vert _{L^p(T')}^\tau \\&\le ct^{1+\tau s/2}d(T')^{\tau s/2-1}\Vert S_1-S_2\Vert _{L^p(T')}^\tau . \end{aligned}$$

(4) If \(A\in {\mathfrak {A}}_T^t\), then

$$\begin{aligned} \Vert S_1-S_2\Vert _{L^\tau (B_v\cap A)}^\tau&\le \Vert S_1-S_2\Vert _{L^\tau (A)}^\tau \le c|A|^{\tau s/2}\Vert S_1-S_2\Vert _{L^p(A)}^\tau \\&\le cd(A)^{\tau s}\Vert S_1-S_2\Vert _{L^p(A)}^\tau . \end{aligned}$$

(5) If \(T'\in {\mathfrak {T}}_T^t\), then

$$\begin{aligned} \Vert S_1-S_2\Vert _{L^\tau (B_v\cap T')}^\tau&\le \Vert S_1-S_2\Vert _{L^\tau (T')}^\tau \le c|T'|^{\tau s/2}\Vert S_1-S_2\Vert _{L^p(T')}^\tau \\&\le cd(T')^{\tau s}\Vert S_1-S_2\Vert _{L^p(T')}^\tau . \end{aligned}$$
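The last inequality in estimate (2) trades a power of \(\delta _1(T')\) for a power of t. A sketch of that step, using only \(\delta _1(T') > 2kt/{\tilde{c}}\) and the fact that the exponent \(\tau s/2-1\) is negative:

$$\begin{aligned} \delta _1(T')^{\tau s/2-1} \le (2kt/{\tilde{c}})^{\tau s/2-1} = c\,t^{\tau s/2-1}, \quad \hbox {hence}\quad t^2\delta _1(T')^{\tau s/2-1} \le c\,t^{1+\tau s/2}. \end{aligned}$$

In estimate (3) the roles are reversed: there \(\delta _1(T') \le 2kt/{\tilde{c}}\) and the exponent \(\tau s/2\) is positive, so \(\delta _1(T')^{\tau s/2}\le c\,t^{\tau s/2}\).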

Case 1 (c). If \([x, x+kh]\not \subset \cup _{v\in \mathcal {V}_{T}} B_v\) and \([x, x+kh]\) intersects \(D_1\) or \(D_2\), then \(\delta _1> 2kt/{\tilde{c}}>2kt\) or \(\delta _2 >2kt\) and hence \([x, x+kh]\subset \triangle _{E_1}\cap \triangle _{E_2}\), which implies \(\Delta _h^k S_1(x)=0\). No estimate is needed.

Case 1 (d). Let \({I_T^h}\subset T\) be the quadrilateral bounded by the segments \(L_1\), \(L_1-kh\), \(D_1\) and the line with equation \(x=v_2+uh\), \(u\in \mathbb {R}\), where \(v_2\) is the point of intersection of \(L_2\) with \(D_1\), whenever this straight line intersects \(L_1\). If the line \(x=v_2+uh\), \(u\in \mathbb {R}\), does not intersect \(L_1\), then we replace it with the line \(x=v_4+uh\), \(u\in \mathbb {R}\). Furthermore, we subtract \(B_{v_1}\) and \(B_{v_2}\) from \({I_T^h}\).

Set \({J_T^h}:= {I_T^h}+[0, kh]\).

A simple geometric argument shows that \(|{J_T^h}| \le 2\delta _1 kt\).

In estimating \(\Vert \Delta _h^k S_1\Vert _{L^\tau ({I_T^h})}^\tau \) there are two subcases to be considered.

If \(\delta _1(T) \le 2kt/{\tilde{c}}\), we use (4.27) to obtain

$$\begin{aligned} \Vert \Delta _h^k S_1\Vert _{L^\tau ({I_T^h})}^\tau&\le \Vert \Delta _h^k S_2\Vert _{L^\tau ({I_T^h})}^\tau +\Vert S_1-S_2\Vert _{L^\tau ({I_T^h})}^\tau +\Vert S_1-S_2\Vert _{L^\tau ({J_T^h}\cap A_1)}^\tau . \end{aligned}$$

We estimate the above norms much as in Case 1 (b), using Lemma 4.9. We have

$$\begin{aligned}&\Vert S_1-S_2\Vert _{L^\tau ({I_T^h})}^\tau \le c(|{I_T^h}|/|T|)\Vert S_1-S_2\Vert _{L^\tau (T)}^\tau \\&\le ct\delta _1(T)[\delta _1(T)d(T)]^{-1}\Vert S_1-S_2\Vert _{L^\tau (T)}^\tau \le ct d(T)^{-1}|T|^{\tau s/2}\Vert S_1-S_2\Vert _{L^p(T)}^\tau \\&\le ct d(T)^{-1}(\delta _1(T) d(T))^{\tau s/2}\Vert S_1-S_2\Vert _{L^p(T)}^\tau \le ct^{1+\tau s/2} d(T)^{\tau s/2-1}\Vert S_1-S_2\Vert _{L^p(T)}^\tau . \end{aligned}$$

For the second norm, we get

$$\begin{aligned} \Vert S_1-S_2\Vert _{L^\tau ({J_T^h}\cap A_1)}^\tau \le c|{J_T^h}|\Vert S_1-S_2\Vert _{L^\infty (A_1)}^\tau&\le ct^2|A_1|^{-\tau /p} \Vert S_1-S_2\Vert _{L^p(A_1)}^\tau \\ \le ct^2d(A_1)^{-2\tau /p} \Vert S_1-S_2\Vert _{L^p(A_1)}^\tau&= ct^2d(A_1)^{\tau s-2} \Vert S_1-S_2\Vert _{L^p(A_1)}^\tau , \end{aligned}$$

where as before we used the fact that \(2\tau /p=2-\tau s\).
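The two facts behind this chain can be spelled out as a sketch. It uses the bound \(|{J_T^h}| \le 2\delta _1 kt\) stated above, the subcase assumption \(\delta _1(T) \le 2kt/{\tilde{c}}\), and the relation \(1/\tau =s/2+1/p\):

$$\begin{aligned} |{J_T^h}| \le 2\delta _1(T) kt \le 2\cdot \frac{2kt}{{\tilde{c}}}\cdot kt = \frac{4k^2}{{\tilde{c}}}\,t^2, \qquad -\frac{2\tau }{p} = -2\tau \Big (\frac{1}{\tau }-\frac{s}{2}\Big ) = \tau s-2. \end{aligned}$$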

From the above estimates, we infer

$$\begin{aligned} \Vert \Delta _h^k S_1\Vert _{L^\tau ({I_T^h})}^\tau \le \Vert \Delta _h^k S_2\Vert _{L^\tau ({I_T^h})}^\tau&+ ct^{1+\tau s/2} d(T)^{\tau s/2-1}\Vert S_1-S_2\Vert _{L^p(T)}^\tau \\&+ ct^2d(A_1)^{\tau s-2} \Vert S_1-S_2\Vert _{L^p(A_1)}^\tau . \end{aligned}$$

Let \(\delta _1(T) > 2kt/{\tilde{c}}\). We use (4.26) to obtain

$$\begin{aligned} |\Delta _h^k S_1(x)|&\le |\Delta _h^k S_2(x)|+|\Delta _h^k (S_1-S_2)(x)|\\&\le |\Delta _h^k S_2(x)|+ ct^{k-1}\Vert D_\nu ^{k-1}(S_1-S_2)\Vert _{L^\infty ([x, x+kh])}, \end{aligned}$$

implying

$$\begin{aligned} \Vert \Delta _h^k S_1\Vert _{L^\tau ({I_T^h})}^\tau \le \Vert \Delta _h^k S_2\Vert _{L^\tau ({I_T^h})}^\tau&+ c |{I_T^h}|t^{\tau (k-1)}\Vert D_\nu ^{k-1} (S_1-S_2)\Vert _{L^\infty ({I_T^h}\cap T)}^\tau \\&+ c |{I_T^h}|t^{\tau (k-1)}\Vert D_\nu ^{k-1} (S_1-S_2)\Vert _{L^\infty (A_1)}^\tau . \end{aligned}$$

Clearly,

$$\begin{aligned}&\Vert D_\nu ^{k-1} (S_1-S_2)\Vert _{L^\infty ({I_T^h}\cap T)} \le c\delta _1(T)^{-(k-1)}\Vert S_1-S_2\Vert _{L^\infty (T)}\\&\le c\delta _1(T)^{-(k-1)}|T|^{-1/p}\Vert S_1-S_2\Vert _{L^p(T)} \le c\delta _1(T)^{-(k-1)-2/p}\Vert S_1-S_2\Vert _{L^p(T)}, \end{aligned}$$

and

$$\begin{aligned} \Vert D_\nu ^{k-1} (S_1-S_2)\Vert _{L^\infty (A_1)}&\le cd(A_1)^{-(k-1)}\Vert S_1-S_2\Vert _{L^\infty (A_1)}\nonumber \\&\le cd(A_1)^{-(k-1)-2/p}\Vert S_1-S_2\Vert _{L^p(A_1)}. \end{aligned}$$
(4.28)

Therefore,

$$\begin{aligned} \Vert \Delta _h^k S_1\Vert _{L^\tau ({I_T^h})}^\tau&\le \Vert \Delta _h^k S_2\Vert _{L^\tau ({I_T^h})}^\tau + ct^{1+\tau (k-1)}\delta _1(T)^{1-\tau (k-1)-2\tau /p}\Vert S_1-S_2\Vert _{L^p(T)}^\tau \\&\quad + ct^{1+\tau (k-1)}d(A_1)^{1-\tau (k-1)-2\tau /p}\Vert S_1-S_2\Vert _{L^p(A_1)}^\tau . \end{aligned}$$
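The powers of t and \(\delta _1(T)\) in the middle term arise as follows. This is a sketch that uses \(|{I_T^h}| \le |{J_T^h}| \le 2\delta _1(T) kt\), which holds because \({I_T^h}\subset {J_T^h}\) by the definition of \({J_T^h}\):

$$\begin{aligned} |{I_T^h}|\, t^{\tau (k-1)}\big [\delta _1(T)^{-(k-1)-2/p}\big ]^\tau \le c\,t\delta _1(T)\, t^{\tau (k-1)}\delta _1(T)^{-\tau (k-1)-2\tau /p} = c\,t^{1+\tau (k-1)}\delta _1(T)^{1-\tau (k-1)-2\tau /p}. \end{aligned}$$

The term with \(A_1\) is obtained in the same way from (4.28) with \(d(A_1)\) in place of \(\delta _1(T)\); this presumes, as the display above does, that \(\delta _1(T)\le c\,d(A_1)\).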

Case 1 (e) (Main). Let \(T_h^\star \subset T_h\) be the set defined by

$$\begin{aligned} T_h^\star :=\Big \{x\in T_h: [x, x+kh]\cap L_1 \ne \emptyset \quad \hbox {and}\quad [x,x+ kh] \not \subset {I_T^h}\bigcup _{v\in \mathcal {V}_{T}} B_v\Big \}. \end{aligned}$$
(4.29)

We next estimate \(\Vert \Delta _h^k S_1\Vert _{L^\tau (T_h^\star )}^\tau \).

Let \(x\in T_h^\star \). Denote by \(b_1\) and \(b_2\) the points where the line through x and \(x+kh\) intersects \(L_1\) and \(L_2\). Set \(b=b(x):=b_2-b_1\). We associate the segment \([x+b, x+b +kh]\) with \([x, x+kh]\) and \(\Delta ^k_hS_2(x+b)\) with \(\Delta ^k_hS_1(x)\).

Since \(S_1\in \Pi _{k-1}\) on \(\triangle _{E_1}\), we have \(D_\nu ^{k-1}S_1(y) = {\text {constant}}\) on \([b_1, x+b]\), and hence

$$\begin{aligned} D_\nu ^{k-1}S_1(b_1-u\nu )= D_\nu ^{k-1}S_1(b_2-u\nu ) \quad \hbox {for}\quad 0\le u\le |x-b_1|. \end{aligned}$$
(4.30)

Similarly, since \(S_2\in \Pi _k\) on \(\triangle _{E_2}\), we have \(D_\nu ^{k-1}S_2(y) = {\text {constant}}\) on \([x+kh, b_2]\), and hence

$$\begin{aligned} D_\nu ^{k-1}S_2(b_1+u\nu )= D_\nu ^{k-1}S_2(b_2+u\nu ) \quad \hbox {for}\quad 0\le u\le |x+kh-b_1|. \end{aligned}$$
(4.31)

We use (4.26), (4.30), and (4.31) to obtain

$$\begin{aligned} \Delta ^k_hS_1(x)&= |h|^{k-1}\int _{|b_1-x|}^{k|h|} D_\nu ^{k-1} S_1(x+u\nu )M^*_k(u) \mathrm{d}u\\&\quad + |h|^{k-1}\int _0^{|b_1-x|} D_\nu ^{k-1} S_1(x+u\nu )M^*_k(u) \mathrm{d}u\\&= |h|^{k-1}\int _{|b_1-x|}^{k|h|} D_\nu ^{k-1} S_1(x+u\nu )M^*_k(u) \mathrm{d}u\\&\quad + |h|^{k-1}\int _0^{|b_1-x|} D_\nu ^{k-1} S_1(x+b+u\nu )M^*_k(u) \mathrm{d}u \end{aligned}$$

and

$$\begin{aligned} \Delta ^k_hS_2(x+b)&= |h|^{k-1}\int _{|b_1-x|}^{k|h|} D_\nu ^{k-1} S_2(x+b+u\nu )M^*_k(u) \mathrm{d}u\\&\quad + |h|^{k-1}\int _0^{|b_1-x|} D_\nu ^{k-1} S_2(x+b+u\nu )M^*_k(u) \mathrm{d}u\\&= |h|^{k-1}\int _{|b_1-x|}^{k|h|} D_\nu ^{k-1} S_2(x+u\nu )M^*_k(u) \mathrm{d}u\\&\quad + |h|^{k-1}\int _0^{|b_1-x|} D_\nu ^{k-1} S_2(x+b+u\nu )M^*_k(u) \mathrm{d}u. \end{aligned}$$

Therefore,

$$\begin{aligned} \Delta ^k_hS_1(x)&= \Delta ^k_hS_2(x+b) +|h|^{k-1}\int _{|b_1-x|}^{k|h|} D_\nu ^{k-1} [S_1-S_2]\left( x+u\nu \right) M^*_k(u) \mathrm{d}u\\&\quad + |h|^{k-1}\int _0^{|b_1-x|} D_\nu ^{k-1} [S_1-S_2]\left( x+b+u\nu \right) M^*_k(u) \mathrm{d}u, \end{aligned}$$

and hence

$$\begin{aligned} |\Delta ^k_hS_1(x)|&\le |\Delta ^k_hS_2(x+b)| +ct^{k-1}\Vert D_\nu ^{k-1}(S_1-S_2)\Vert _{L^\infty ([b_1, x+kh])}\\&\quad + ct^{k-1}\Vert D_\nu ^{k-1}(S_1-S_2)\Vert _{L^\infty ([x+b, b_2])}. \nonumber \end{aligned}$$
(4.32)

The key here is that \(([b_1, x+kh]\cup [x+b, b_2])\cap T^\circ =\emptyset \).

Let \(T_h^{\star \star }:=\{x+b(x): x\in T_h^{\star }\}\), where \(T_h^{\star }\) is from (4.29) and b(x) is defined thereafter. By (4.32), we get

$$\begin{aligned} \Vert \Delta _h^k S_1\Vert _{L^\tau (T_h^\star )}^\tau&\le \Vert \Delta _h^k S_2\Vert _{L^\tau (T_h^{\star \star })}^\tau + c t d(A_1)t^{\tau (k-1)}\Vert D_\nu ^{k-1} (S_1-S_2)\Vert _{L^\infty (A_1)}^\tau \\&\quad + c t d(A_2)t^{\tau (k-1)}\Vert D_\nu ^{k-1} (S_1-S_2)\Vert _{L^\infty (A_2)}^\tau . \end{aligned}$$

Just as in (4.28), we have

$$\begin{aligned} \Vert D_\nu ^{k-1} (S_1-S_2)\Vert _{L^\infty (A_1)}&\le cd(A_1)^{-(k-1)}\Vert S_1-S_2\Vert _{L^\infty (A_1)}\\&\le cd(A_1)^{-(k-1)-2/p}\Vert S_1-S_2\Vert _{L^p(A_1)}, \end{aligned}$$

and similar estimates hold with \(A_1\) replaced by \(A_2\). We use all of the above to obtain

$$\begin{aligned} \Vert \Delta _h^k S_1\Vert _{L^\tau (T_h^\star )}^\tau&\le \Vert \Delta _h^k S_2\Vert _{L^\tau (T_h^{\star \star })}^\tau + ct^{1+\tau (k-1)}d(A_1)^{1-\tau (k-1)-2\tau /p}\Vert S_1-S_2\Vert _{L^p(A_1)}^\tau \\&\quad + ct^{1+\tau (k-1)}d(A_2)^{1-\tau (k-1)-2\tau /p}\Vert S_1-S_2\Vert _{L^p(A_2)}^\tau . \end{aligned}$$

It is an important observation that no part of \(\Vert \Delta _h^k S_2\Vert _{L^\tau (T_h^{\star \star })}^\tau \) has been used for estimation of quantities \(\Vert \Delta _h^k S_1\Vert _{L^\tau (\cdot )}^\tau \) from previous cases.

Putting all of the above estimates together, we arrive at

$$\begin{aligned} \Vert \Delta _h^k S_1\Vert _{L^\tau (T_h)}^\tau \le \Vert \Delta _h^k S_2\Vert _{L^\tau (T_h)}^\tau +Y_1+Y_2+ Y_3 + Y_4, \end{aligned}$$
(4.33)

where

$$\begin{aligned} Y_1&:= \sum _{A\in \mathcal {A}_T^t} ct^2d(A)^{\tau s-2}\Vert S_1-S_2\Vert _{L^p(A)}^\tau + \sum _{A\in {\mathfrak {A}}_T^t} cd(A)^{\tau s}\Vert S_1-S_2\Vert _{L^p(A)}^\tau ,\\ Y_2&:= ct^{1+\tau (k-1)}d(A_1)^{1-\tau (k-1)-2\tau /p}\Vert S_1-S_2\Vert _{L^p(A_1)}^\tau \\&\quad + ct^{1+\tau (k-1)}d(A_2)^{1-\tau (k-1)-2\tau /p}\Vert S_1-S_2\Vert _{L^p(A_2)}^\tau ,\\ Y_3&:= \sum _{T'\in \mathcal {T}_T^t} ct^{1+\tau s/2}d(T')^{\tau s/2-1}\Vert S_1-S_2\Vert _{L^p(T')}^\tau \\&\quad + \sum _{T'\in {\mathfrak {T}}_T^t} cd(T')^{\tau s}\Vert S_1-S_2\Vert _{L^p(T')}^\tau + ct^{1+\tau s/2} d(T)^{\tau s/2-1}\Vert S_1-S_2\Vert _{L^p(T)}^\tau , \end{aligned}$$

and

$$\begin{aligned} Y_4 := ct^{1+\tau (k-1)}\delta _1(T)^{1-\tau (k-1)-2\tau /p}\Vert S_1-S_2\Vert _{L^p(T)}^\tau , \quad \hbox {if}\;\;\delta _1(T)>2kt/{\tilde{c}}, \end{aligned}$$

otherwise \(Y_4:=0\).

Remark

In all cases considered above except Case 1 (e), we used the simple inequality \(|\Delta ^k_hS_1(x)| \le |\Delta ^k_hS_2(x)| + |\Delta ^k_h(S_1-S_2)(x)|\) to estimate \(\Vert \Delta _h^k S_1\Vert _{L^\tau (G)}^\tau \) for various sets G; this works because these sets are of relatively small measure. As Example 3.3 shows, this approach cannot be used in Case 1 (e), and this is the main difficulty in the proof. The gist of our way around it is to estimate \(|\Delta ^k_hS_1(x)|\) by \(|\Delta ^k_hS_2(x+b)|\) with a suitable shift \(b=b(x)\), where \(|\Delta ^k_hS_2(x+b)|\) is not used to estimate any other term \(|\Delta ^k_hS_1(x')|\) (the correspondence between these quantities is one-to-one).

4.7 Case 2

Let \(\Omega _h^\star \) be the set of all \(x\in \Omega \) such that \([x, x+kh] \subset \Omega \), \([x, x+kh]\cap A \ne \emptyset \) for some \(A\in \mathcal {A}\) with \(d(A) > 2kt/{\tilde{c}}\), and \([x, x+kh] \cap T = \emptyset \) for all \(T\in \mathcal {T}\) with \(d(T) \ge 2kt/{\tilde{c}}\).

Denote by \(\mathcal {V}_A\) the set of all vertices on \(\partial A\), and set \(B_v:=B(v, 4kt/{\tilde{c}})\), \(v\in \mathcal {V}_A\).

We next indicate how we estimate \(|\Delta _h^k S_1(x)|\) in different cases.

Case 2 (a). If \([x, x+kh] \subset A\), then \( \Delta _h^kS_1(x)= \Delta _h^kS_2(x) =0 \) and no estimate is needed.

Case 2 (b). If \([x, x+kh] \subset \cup _{v\in \mathcal {V}_A} B(v, 2kt/{\tilde{c}})\), we estimate \(|\Delta _h^k S_1(x)|\) trivially:

$$\begin{aligned} |\Delta _h^k S_1(x)| \le |\Delta _h^k S_2(x)| + 2^k\sum _{\ell =0}^k|S_1(x+\ell h)-S_2(x+\ell h)|. \end{aligned}$$
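The factor \(2^k\) comes from expanding the difference operator. As a sketch, assuming the standard definition of the kth forward difference (which is not restated in this section), one has

$$\begin{aligned} \Delta _h^k f(x) = \sum _{\ell =0}^k (-1)^{k-\ell }\binom{k}{\ell } f(x+\ell h), \qquad \binom{k}{\ell } \le 2^k, \end{aligned}$$

so that, taking \(f=S_1-S_2\) and using the linearity of \(\Delta _h^k\),

$$\begin{aligned} |\Delta _h^k S_1(x)| \le |\Delta _h^k S_2(x)| + |\Delta _h^k (S_1-S_2)(x)| \le |\Delta _h^k S_2(x)| + 2^k\sum _{\ell =0}^k|(S_1-S_2)(x+\ell h)|. \end{aligned}$$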

Case 2 (c). Let \([x, x+kh]\) intersect an edge \(E=:[w_1, w_2]\) of \(\partial A\) that is shared with some \(A'\in \mathcal {A}\), and let \([x, x+kh] \not \subset \cup _{v\in \mathcal {V}_A} B_v\). Let \(y:= E\cap [x, x+kh]\). Evidently, \(|y-w_j| > kt/{\tilde{c}}\), \(j=1,2\), and in light of Lemma 4.11, we have \([x, x+kh] \subset B(y, kt) \subset A\cup A'\). In this case, we use the inequality

$$\begin{aligned} |\Delta _h^k S_1(x)|&\le |\Delta _h^k S_2(x)|+|\Delta _h^k (S_1-S_2)(x)|\\&\le |\Delta _h^k S_2(x)|+ ct^{k-1}\Vert D_\nu ^{k-1}(S_1-S_2)\Vert _{L^\infty ([x, x+kh])}, \end{aligned}$$

which follows by (4.26).

The case when \([x, x+kh]\) intersects an edge from \(\partial A\) that is shared with some \(T\in \mathcal {T}\) is covered in Case 1 above.

We proceed as in Case 1 and in the proof of Theorem 4.5 to obtain

$$\begin{aligned} \Vert \Delta _h^kS_1\Vert _{L^\tau (\Omega _h^\star )}^\tau \le \Vert \Delta _h^kS_2\Vert _{L^\tau (\Omega _h^\star )}^\tau + Y_1+Y_2, \end{aligned}$$
(4.34)

where

$$\begin{aligned} Y_1&:= \sum _{A\in \mathcal {A}: d(A) > 2kt/{\tilde{c}}} ct^{1+\tau (k-1)} d(A)^{1-\tau (k-1)-2\tau /p}\Vert S_1-S_2\Vert _{L^p(A)}^\tau \\&\quad + \sum _{A\in \mathcal {A}: d(A) > 2kt/{\tilde{c}}} ct^2 d(A)^{\tau s-2}\Vert S_1-S_2\Vert _{L^p(A)}^\tau \end{aligned}$$

and

$$\begin{aligned} Y_2&:= \sum _{A\in \mathcal {A}: d(A) \le 2kt/{\tilde{c}}} cd(A)^{\tau s}\Vert S_1-S_2\Vert _{L^p(A)}^\tau \\&\quad + \sum _{T\in \mathcal {T}: d(T) \le 2kt/{\tilde{c}}} cd(T)^{\tau s}\Vert S_1-S_2\Vert _{L^p(T)}^\tau . \end{aligned}$$

4.8 Case 3

Let \(\Omega _h^{\star \star }\) be the set of all \(x\in \Omega \) such that

$$\begin{aligned}{}[x, x+kh] \subset \cup \{A\in \mathcal {A}: d(A) \le 2kt/{\tilde{c}}\} \cup \{T\in \mathcal {T}: d(T) \le 2kt/{\tilde{c}}\}. \end{aligned}$$

In this case, we estimate \(|\Delta _h^k S_1(x)|\) trivially just as in (4.27). We obtain

$$\begin{aligned} \Vert \Delta _h^kS_1\Vert _{L^\tau (\Omega _h^{\star \star })}^\tau&\le \Vert \Delta _h^kS_2\Vert _{L^\tau (\Omega _h^{\star \star })}^\tau + \sum _{A\in \mathcal {A}: d(A) \le 2kt/{\tilde{c}}}c\Vert S_1-S_2\Vert _{L^\tau (A)}^\tau \\&\quad + \sum _{T\in \mathcal {T}: d(T) \le 2kt/{\tilde{c}}} c\Vert S_1-S_2\Vert _{L^\tau (T)}^\tau \\&\le \Vert \Delta _h^kS_2\Vert _{L^\tau (\Omega _h^{\star \star })}^\tau + \sum _{A\in \mathcal {A}: d(A) \le 2kt/{\tilde{c}}} cd(A)^{\tau s}\Vert S_1-S_2\Vert _{L^p(A)}^\tau \\&\quad + \sum _{T\in \mathcal {T}: d(T) \le 2kt/{\tilde{c}}} cd(T)^{\tau s}\Vert S_1-S_2\Vert _{L^p(T)}^\tau . \end{aligned}$$
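The second inequality above is an instance of Hölder's inequality on each set. A sketch, assuming that \(d(A)\) denotes the diameter of A (so that \(|A|\le c\,d(A)^2\)), that \(\tau <p\), and that \(1/\tau = s/2+1/p\): for \(f:=S_1-S_2\),

$$\begin{aligned} \Vert f\Vert _{L^\tau (A)}^\tau \le |A|^{1-\tau /p}\Vert f\Vert _{L^p(A)}^\tau \le c\,d(A)^{2(1-\tau /p)}\Vert f\Vert _{L^p(A)}^\tau = c\,d(A)^{\tau s}\Vert f\Vert _{L^p(A)}^\tau , \end{aligned}$$

since \(2(1-\tau /p) = 2\tau (1/\tau -1/p) = \tau s\). The same computation applies with A replaced by T.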

Just as in the proof of Theorem 3.4, it is important to note that in the above estimates only finitely many norms may overlap at a time. From above, (4.33), and (4.34), we obtain

$$\begin{aligned} \omega _k(S_1, t)_\tau ^\tau&\le \omega _k(S_2, t)_\tau ^\tau + {\mathbb {A}}_t + {\mathbb {T}}_t, \end{aligned}$$

where

$$\begin{aligned} {\mathbb {A}}_t&:= \sum _{A\in \mathcal {A}: d(A)> 2kt/{\tilde{c}}} ct^{1+\tau (k-1)} d(A)^{1-\tau (k-1)-2\tau /p}\Vert S_1-S_2\Vert _{L^p(A)}^\tau \\&\quad + \sum _{A\in \mathcal {A}: d(A) > 2kt/{\tilde{c}}} ct^2 d(A)^{\tau s-2}\Vert S_1-S_2\Vert _{L^p(A)}^\tau \\&\quad + \sum _{A\in \mathcal {A}: d(A) \le 2kt/{\tilde{c}}} cd(A)^{\tau s}\Vert S_1-S_2\Vert _{L^p(A)}^{\tau }, \end{aligned}$$

and

$$\begin{aligned} {\mathbb {T}}_t&:= \sum _{T\in \mathcal {T}: \delta _1(T)> 2kt/{\tilde{c}}} ct^{1+ \tau (k-1)}\delta _1(T)^{1-\tau (k-1)-2\tau /p}\Vert S_1-S_2\Vert _{L^p(T)}^\tau \\&\quad + \sum _{T\in \mathcal {T}: \delta _2(T)> 2kt/{\tilde{c}}} ct^{1+ \tau (k-1)}\delta _2(T)^{1-\tau (k-1)-2\tau /p}\Vert S_1-S_2\Vert _{L^p(T)}^\tau \\&\quad + \sum _{T\in \mathcal {T}: d(T) > 2kt/{\tilde{c}}} ct^{1+\tau s/2} d(T)^{\tau s/2-1}\Vert S_1-S_2\Vert _{L^p(T)}^\tau \\&\quad + \sum _{T\in \mathcal {T}: d(T) \le 2kt/{\tilde{c}}} cd(T)^{\tau s}\Vert S_1-S_2\Vert _{L^p(T)}^\tau . \end{aligned}$$

We insert this estimate in (2.1) and interchange the order of integration and summation to obtain

$$\begin{aligned} |S_1|_{B^{s,k}_\tau }^\tau \le |S_2|_{B^{s,k}_\tau }^\tau + Z_1 + Z_2, \end{aligned}$$

where

$$\begin{aligned} Z_1&:= c \sum _{A\in \mathcal {A}} d(A)^{1-\tau (k-1)-2\tau /p}\Vert S_1-S_2\Vert _{L^p(A)}^\tau \int _0^{{\tilde{c}}d(A)/2k} t^{-\tau s+\tau (k-1)} \mathrm{d}t \\&\quad + c\sum _{A\in \mathcal {A}} d(A)^{\tau s-2}\Vert S_1-S_2\Vert _{L^p(A)}^\tau \int _0^{{\tilde{c}}d(A)/2k} t^{-\tau s+1} \mathrm{d}t\\&\quad + c\sum _{A\in \mathcal {A}} d(A)^{\tau s}\Vert S_1-S_2\Vert _{L^p(A)}^\tau \int _{{\tilde{c}}d(A)/2k}^\infty t^{-\tau s-1} \mathrm{d}t \end{aligned}$$

and

$$\begin{aligned} Z_2&:= c \sum _{T\in \mathcal {T}} \delta _1(T)^{1-\tau (k-1)-2\tau /p}\Vert S_1-S_2\Vert _{L^p(T)}^\tau \int _0^{{\tilde{c}}\delta _1(T)/2k} t^{-\tau s+ \tau (k-1)} \mathrm{d}t \\&\quad + c \sum _{T\in \mathcal {T}} \delta _2(T)^{1-\tau (k-1)-2\tau /p}\Vert S_1-S_2\Vert _{L^p(T)}^\tau \int _0^{{\tilde{c}}\delta _2(T)/2k} t^{-\tau s+ \tau (k-1)} \mathrm{d}t \\&\quad + c\sum _{T\in \mathcal {T}} d(T)^{\tau s/2-1}\Vert S_1-S_2\Vert _{L^p(T)}^\tau \int _0^{{\tilde{c}}d(T)/2k} t^{-\tau s/2} \mathrm{d}t\\&\quad + c\sum _{T\in \mathcal {T}} d(T)^{s\tau }\Vert S_1-S_2\Vert _{L^p(T)}^\tau \int _{{\tilde{c}}d(T)/2k}^\infty t^{-\tau s-1} \mathrm{d}t. \end{aligned}$$
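Each of these integrals is elementary. As a sketch, let \(a>0\) be a generic scale (standing for \({\tilde{c}}d(A)/2k\), \({\tilde{c}}\delta _j(T)/2k\), or \({\tilde{c}}d(T)/2k\)) and let \(\gamma >-1\) be a generic exponent; both symbols are introduced here for illustration only. Then, for \(\tau s>0\),

$$\begin{aligned} \int _0^a t^{\gamma }\, \mathrm{d}t = \frac{a^{\gamma +1}}{\gamma +1}, \qquad \int _a^\infty t^{-\tau s-1}\, \mathrm{d}t = \frac{a^{-\tau s}}{\tau s}. \end{aligned}$$

For instance, with \(\gamma =-\tau s+\tau (k-1)\), the first term of \(Z_1\) carries, up to a constant depending on k and \({\tilde{c}}\), the factor

$$\begin{aligned} d(A)^{1-\tau (k-1)-2\tau /p}\cdot d(A)^{1-\tau s+\tau (k-1)} = d(A)^{2-2\tau /p-\tau s}, \end{aligned}$$

while the second and third terms carry \(d(A)^{\tau s-2}\cdot d(A)^{2-\tau s}=1\) and \(d(A)^{\tau s}\cdot d(A)^{-\tau s}=1\), respectively. The terms of \(Z_2\) are handled in the same way.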

Observe that \(-\tau s + \tau (k-1) > -1\) is equivalent to \(s/2<k-1+1/p\), which holds true by the hypothesis, and \(-\tau s/2>-1\) is equivalent to \(s<2/\tau = s+2/p\), which is obvious. Therefore, all integrals above are convergent, and taking into account that \(2-2\tau /p-\tau s = 2\tau (1/\tau -1/p-s/2) = 0\), we obtain

$$\begin{aligned} |S_1|_{B^{s,k}_\tau }^\tau&\le |S_2|_{B^{s,k}_\tau }^\tau + c\sum _{A\in \mathcal {A}\cup \mathcal {T}} \Vert S_1-S_2\Vert _{L^p(A)}^\tau \\&\le |S_2|_{B^{s,k}_\tau }^\tau + cn^{\tau (1/\tau -1/p)}\left( \sum _{A\in \mathcal {A}\cup \mathcal {T}} \Vert S_1-S_2\Vert _{L^p(A)}^p\right) ^{\tau /p}\\&=|S_2|_{B^{s,k}_\tau }^\tau + cn^{\tau s/2}\Vert S_1-S_2\Vert _{L^p(\Omega )}^\tau , \end{aligned}$$

where we used Hölder’s inequality. This completes the proof of Theorem 4.2. \(\square \)
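The application of Hölder's inequality in the last display can be spelled out as follows. This is a sketch, writing \(a_A:=\Vert S_1-S_2\Vert _{L^p(A)}\) (notation introduced here for illustration only) and assuming that \(\mathcal {A}\cup \mathcal {T}\) consists of at most cn sets with disjoint interiors contained in \(\Omega \). Applying Hölder's inequality for sums with exponents \(p/\tau \) and \((1-\tau /p)^{-1}\) gives

$$\begin{aligned} \sum _{A\in \mathcal {A}\cup \mathcal {T}} a_A^\tau \le \big (\#(\mathcal {A}\cup \mathcal {T})\big )^{1-\tau /p}\left( \sum _{A\in \mathcal {A}\cup \mathcal {T}} a_A^p\right) ^{\tau /p} \le c\,n^{1-\tau /p}\Vert S_1-S_2\Vert _{L^p(\Omega )}^\tau , \end{aligned}$$

and \(1-\tau /p = \tau (1/\tau -1/p) = \tau s/2\) because \(1/\tau = s/2+1/p\).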