1 Introduction

Estimation of density derivatives plays an important role in the exploration of structures in curves, the comparison of regression curves, the analysis of human growth data, mean shift clustering and hypothesis testing [16]. More precisely, let \((\Omega ,\mathscr {F},P)\) be a probability space and \(X_{1},\ldots ,X_{n}\) be independent and identically distributed (i.i.d.) random variables with an unknown density function f. The goal is to estimate the density derivative \(f^{(d)}\) with \(d\in \mathbb {N}\) from the observed data \(X_{1},\ldots ,X_{n}\).

In particular, the density derivative estimation model reduces to the classical density estimation model when the order \(d=0\). For density estimation, the classical kernel methods give good estimators [9, 19, 21]. Compared with kernel estimators, wavelet estimators perform better because they provide more local information and admit fast wavelet algorithms [5, 15]. For instance, Donoho et al. [6] obtained landmark results in wavelet estimation, establishing an adaptive and optimal estimator (up to a logarithmic factor) for a univariate density function under \(L^p\)-risk (\(1\le p<\infty \)) on Besov spaces.

In contrast to traditional adaptive estimation, Goldenshluger and Lepski [7] constructed a kernel estimator for density functions by a data-driven method, and in 2014 provided \(L^p\)-risk (\(1\le p<\infty \)) estimations over anisotropic Nikol’skii classes. Five years later, Liu and Wu [13] introduced a data-driven wavelet estimator and considered point-wise density estimation under a local anisotropic Hölder condition. Recently, Cao and Zeng [1] investigated adaptive \(L^p\)-risk (\(1\le p<\infty \)) estimations under the independence hypothesis on Besov spaces by using the data-driven wavelet estimator.

Along with density estimation, it is often necessary to estimate the derivatives of the density function. Müller and Gasser [18] discussed kernel estimation of density derivatives under \(L^{2}\)-risk on Sobolev spaces. Then in 1996, Rao [20] explored wavelet density derivative estimation under \(L^{2}\)-risk on Sobolev spaces. Moreover, Rao’s estimates were generalized to unmatched Besov spaces \(B_{r,q}^s\) and \(L^{p}\)-risk (\(1\le p<\infty \)) in Ref. [3]. In 2013, Liu and Wang [12] defined new linear and nonlinear wavelet estimators for density derivatives, and provided the corresponding \(L^{p}\)-risk estimations on Besov spaces.

This paper investigates adaptive wavelet estimation of density derivatives. Based on the classical linear wavelet estimator for density derivatives, we first establish point-wise estimations under the local Hölder condition. Furthermore, motivated by the works of Goldenshluger and Lepski [7] and Cao and Zeng [1], we introduce a data-driven wavelet estimator for adaptivity and prove a point-wise oracle inequality, which does not require any assumption on the underlying function f or \(f^{(d)}\) (except for the restrictions ensuring the existence of the model and of the risk). Finally, by using the point-wise oracle inequality, we give point-wise estimations under the local Hölder condition and \(L^p\)-risk \((1\le p<\infty )\) estimations on Besov spaces, respectively.

1.1 Wavelets and Function Spaces

We begin with an important concept in wavelet analysis in this subsection. A Multiresolution Analysis (MRA, [8, 17]) is a sequence of closed subspaces \(\{V_{j}\}_{j\in \mathbb {Z}}\) of the square integrable function space \(L^{2}(\mathbb {R})\) satisfying the following properties:

  1. (i).

       \(V_{j}\subset V_{j+1}\), \(j\in \mathbb {Z}\);

  2. (ii).

       \(\overline{\bigcup _{j\in \mathbb {Z}} V_{j}}=L^{2}(\mathbb {R})\) (the space \(\bigcup _{j\in \mathbb {Z}} V_{j}\) is dense in \(L^{2}(\mathbb {R})\));

  3. (iii).

       \(f(2\cdot )\in V_{j+1}\) if and only if \(f(\cdot )\in V_{j}\) for each \(j\in \mathbb {Z}\);

  4. (iv).

       There exists \(\varphi \in L^{2}(\mathbb {R})\) (scaling function) such that \(\{\varphi (\cdot -k),~k\in \mathbb {Z}\}\) forms an orthonormal basis of \(V_{0}=\overline{\textrm{span}\{\varphi (\cdot -k),k\in \mathbb {Z}\}}\).

Moreover, a wavelet function \(\psi \) can be derived from the scaling function \(\varphi \) in a simple way such that for fixed \(j_0\in \mathbb {N}\), both \(\{\varphi _{j_0k},\psi _{jk}\}_{j\ge j_0,k\in \mathbb {Z}}\) and \(\{\psi _{jk}\}_{j,k\in \mathbb {Z}}\) are orthonormal bases (wavelet bases) of \(L^{2}(\mathbb {R})\), where \(h_{jk}(\cdot ):=2^{\frac{j}{2}}h(2^{j}\cdot -k)\) for \(h=\varphi \) or \(\psi \). Hence, for each \(f\in L^2(\mathbb {R})\),

$$\begin{aligned} f=\sum _{k\in \mathbb {Z}}s_{j_0k}\varphi _{j_0k}+\sum _{j\ge j_0}\sum _{k\in \mathbb {Z}}d_{jk}\psi _{jk} \end{aligned}$$

holds in \(L^2\)-sense, where \(s_{jk}:=\langle f, \varphi _{jk}\rangle \) and \(d_{jk}:=\langle f,\psi _{jk}\rangle \). When \(\varphi \) is t regular, the above identity holds in \(L^{p}\)-sense \((p\ge 1)\). Here and throughout, a scaling function \(\varphi \) is called t regular [4] (\(t\in \mathbb {N}\)), if \(\varphi \in C^{t}(\mathbb {R})\) and \(|\varphi ^{(r)}(x)|\le C(1+|x|^{2})^{-l}\) for each \(l\in \mathbb {N}~(r=0,1,\ldots ,t)\). For instance, Daubechies’s scaling function \(D_{2N}\) is t regular for large N, and Meyer’s scaling function possesses any order of regularity. Furthermore, it is easy to verify that the regularity of \(\varphi \) implies the regularity of \(\psi \).

As usual, the notation \(P_j\) stands for the orthogonal projection operator from \(L^{2}(\mathbb {R})\) onto the scaling space \(V_{j}\) with the orthonormal basis \(\{\varphi _{jk}\}_{k\in \mathbb {Z}}\). Thus, for each \(f\in L^2(\mathbb {R})\),

$$\begin{aligned} P_{j}f=\sum _{k\in \mathbb {Z}}s_{jk}\varphi _{jk} \end{aligned}$$

with \(s_{jk}:=\langle f,\varphi _{jk}\rangle \). If \(\varphi \) satisfies condition (\(\theta \)), i.e.,

$$\begin{aligned} \Theta _{\varphi }(\cdot ):=\sum _{k\in \mathbb {Z}}|\varphi (\cdot -k)|\in L^{\infty }(\mathbb {R}), \end{aligned}$$

then \(P_{j}f\) is well-defined for \(f\in L^{p}(\mathbb {R})~(1\le p\le \infty )\). Furthermore, Condition (\(\theta \)) follows from the regularity of \(\varphi \).
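As a concrete numerical illustration of the projection \(P_{j}f=\sum _{k}s_{jk}\varphi _{jk}\), the following sketch uses the Haar scaling function \(\varphi =I_{[0,1)}\), for which \(\Theta _{\varphi }\equiv 1\), so condition (\(\theta \)) holds trivially (note that Haar is not t regular for \(t\ge 1\); it serves only to visualize the MRA properties, and all function names are ours):

```python
import numpy as np

# Haar scaling function phi = 1_{[0,1)}; here Theta_phi(x) = sum_k |phi(x-k)| = 1,
# so condition (theta) holds trivially.
def phi_jk(x, j, k):
    # phi_{jk}(x) = 2^{j/2} phi(2^j x - k)
    y = 2 ** j * x - k
    return 2 ** (j / 2) * ((y >= 0) & (y < 1)).astype(float)

def projection(f, j, x, grid):
    """Approximate P_j f(x) = sum_k <f, phi_{jk}> phi_{jk}(x) for f supported on [0, 1]."""
    dx = grid[1] - grid[0]
    out = np.zeros_like(x, dtype=float)
    for k in range(2 ** j):  # only these k meet the support [0, 1]
        s_jk = np.sum(f(grid) * phi_jk(grid, j, k)) * dx  # Riemann sum for <f, phi_{jk}>
        out += s_jk * phi_jk(x, j, k)
    return out

grid = np.linspace(0, 1, 4096, endpoint=False)
f = lambda x: np.sin(2 * np.pi * x)
x = np.array([0.3, 0.7])
approx = projection(f, 6, x, grid)
err = np.max(np.abs(approx - f(x)))  # small: P_j f -> f as j grows (property (ii))
```

At resolution level \(j=6\) the projection already approximates f to within the bin width times \(\Vert f'\Vert _{\infty }\), reflecting the density of \(\bigcup _j V_j\) in \(L^2(\mathbb {R})\).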

As in Refs. [13, 14], we shall investigate point-wise estimations under the local Hölder condition. For a univariate function f, the local Hölder condition of order \(s>0\) at the point \(x_{0}\in \mathbb {R}\) means that for some fixed constant \(L>0\) and all \(x,y\in \Omega _{x_{0}}\) (a neighbourhood of the point \(x_0\)),

$$\begin{aligned} |f^{([s])}(y)-f^{([s])}(x)|\le L|y-x|^{s-[s]}, \end{aligned}$$

where [s] stands for the largest integer strictly smaller than s. The set of all such functions is denoted by \(H^{s}(\Omega _{x_{0}})\). Obviously, \(f\in H^{s+d}(\Omega _{x_{0}})\) if and only if \(f^{(d)}\in H^{s}(\Omega _{x_{0}})\) with \(d\in \mathbb {N}\).
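For example, the function \(f(x)=|x-x_{0}|^{s}\) with \(0<s<1\) satisfies the local Hölder condition of order s at \(x_{0}\) with \(L=1\): since \([s]=0\), the elementary inequality \(\big ||a|^{s}-|b|^{s}\big |\le |a-b|^{s}\) \((0<s\le 1)\) gives

$$\begin{aligned} |f(y)-f(x)|=\big ||y-x_{0}|^{s}-|x-x_{0}|^{s}\big |\le |y-x|^{s}. \end{aligned}$$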

The following lemma is necessary for the point-wise estimations.

Lemma 1.1

[14, 22, 23] Let \(\varphi \in L^{2}(\mathbb {R})\) be a t regular scaling function and \(\psi \) be the corresponding wavelet. If \(f\in H^{s}(\Omega _{x_{0}})\cap L^{2}(\mathbb {R})\) with \(s>0\) and \(t\ge [s]\), then for \(x\in \Omega _{x_{0}}\) and sufficiently large j,

  1. (i).

    \(f(x)=\sum \limits _{k\in \mathbb {Z}}s_{j_{0}k}\varphi _{j_{0}k}(x)+ \sum \limits _{j=j_0}^{\infty }\sum \limits _{k\in \mathbb {Z}}d_{jk}\psi _{jk}(x)\) holds pointwisely;

  2. (ii).

    \(\sup \limits _{x\in \Omega _{x_{0}}}\sup \limits _{f\in H^{s}(\Omega _{x_{0}})\cap L^{2}(\mathbb {R})}|f(x)-P_{j}f(x)|\lesssim 2^{-js}.\)

Here and throughout, \(A\lesssim B\) stands for \(A\le cB\) with some constant \(c>0\); \(A\gtrsim B\) means \(B\lesssim A\); \(A\thicksim B\) denotes both \(A\lesssim B\) and \(A\gtrsim B\).

In this paper, the notation \(H^{s+d}(\Omega _{x_{0}},M)\) with \(d\in \mathbb {N}\) means that

$$\begin{aligned} H^{s+d}(\Omega _{x_{0}},M):= \{f\in H^{s+d}(\Omega _{x_{0}}), \Vert f^{(d)}\Vert _{1}\vee \Vert f^{(d)}\Vert _{\infty }\le M\}, \end{aligned}$$

where M is a positive constant and \(a\vee b:=\max \{a,~b\}\).

On the other hand, the Besov spaces are needed in order to establish \(L^p\)-risk estimations. Let \(W_r^n(\mathbb {R})\) be the Sobolev space with a non-negative integer exponent n,

$$\begin{aligned} W_r^n(\mathbb {R}):=\{f\in L^r(\mathbb {R}),~f^{(n)}\in L^r(\mathbb {R})\}, \end{aligned}$$

and \(\Vert f\Vert _{W_r^n}:=\Vert f\Vert _r+\Vert f^{(n)}\Vert _r.\) Then \(L^r(\mathbb {R})\) can be seen as \(W_r^0(\mathbb {R})\). For \(1\le r,q\le \infty \) and \(s=n+\alpha \) with \(\alpha \in (0,1]\), a Besov space \(B_{r,q}^{s}(\mathbb {R})\) is defined by

$$\begin{aligned} B_{r,q}^{s}(\mathbb {R}):=\{f\in W_r^n(\mathbb {R}),~\Vert t^{-\alpha }\omega _r^2(f^{(n)},t)\Vert _q^*<\infty \} \end{aligned}$$

with the norm \(\Vert f\Vert _{B_{r,q}^{s}}:=\Vert f\Vert _{W_r^n}+\Vert t^{-\alpha }\omega _r^2(f^{(n)},t)\Vert _q^*\). Here, \(\omega _r^2(f,t):=\sup _{|h|\le t}\Vert f(\cdot +2h)-2f(\cdot +h)+f(\cdot )\Vert _r\) denotes the smoothness modulus of f and

$$\begin{aligned}\Vert h\Vert _q^*:=\left\{ \begin{array}{ll} \left( \int _0^{+\infty }|h(t)|^q\frac{dt}{t}\right) ^{\frac{1}{q}}, & \hbox {if }1\le q<\infty ; \\ \mathop {\mathrm {ess~sup}}\limits _{t\in \mathbb {R}}|h(t)|, & \hbox {if }q=\infty . \\ \end{array} \right. \end{aligned}$$

Then for \(f\in L^{r}(\mathbb {R})\), \(f\in W_{r}^{n+d}(\mathbb {R})\) if and only if \(f^{(d)}\in W_{r}^{n}(\mathbb {R})\), since \(f^{(n+d)}\in L^{r}(\mathbb {R})\) implies \(f^{(j)}\in L^{r}(\mathbb {R})~(j=1,2,\ldots ,n+d)\) (see Ref. [8]). Hence, \(f\in B_{r,q}^{s+d}(\mathbb {R})\) if and only if \(f^{(d)}\in B_{r,q}^{s}(\mathbb {R})\).

One advantage of wavelet bases is that they can characterize Besov spaces.

Lemma 1.2

[17] Let \(\varphi \) be t regular with \(t>s>0\) and \(\psi \) be the corresponding wavelet. Then for \(f\in L^{r}(\mathbb {R})\) and \(r,q\in [1,\infty ]\), the following conditions are equivalent:

  1. (i).

    \(f\in B^{s}_{r,q}(\mathbb {R});\)

  2. (ii).

    \(\{2^{js}\Vert P_{j}f-f\Vert _{r}\}_{j\in \mathbb {Z}}\in l^{q}(\mathbb {Z});\)

  3. (iii).

    \(\{2^{j(s-\frac{1}{r}+\frac{1}{2})}\Vert \{d_{j\cdot }\}\Vert _{l^r}\}_{j\in \mathbb {Z}}\in l^{q}(\mathbb {Z}).\)

The Besov norm of f can be given by

$$\begin{aligned} \Vert f\Vert _{B_{r,q}^{s}}:=\left\| \{s_{j_{0}\cdot }\}\right\| _{l^r}+ \left\| \left\{ 2^{j(s-\frac{1}{r}+\frac{1}{2})}\left\| \{d_{j\cdot }\}\right\| _{l^r}\right\} _{j\ge j_{0}}\right\| _{l^q}. \end{aligned}$$

Furthermore, Lemma 1.2 (i) and (ii) show that \(\Vert P_jf-f\Vert _r\lesssim 2^{-js}\) holds for \(f\in B^{s}_{r,q}(\mathbb {R})\). When \(r\le p\), Lemma 1.2 (i) and (iii) imply that with \(s'-\frac{1}{p}=s-\frac{1}{r}>0\),

$$\begin{aligned} B_{r,q}^s(\mathbb {R})\hookrightarrow B_{p,q}^{s'}(\mathbb {R}), \end{aligned}$$

where \(A\hookrightarrow B\) stands for a Banach space A continuously embedded in another Banach space B. All these claims can be found in Refs. [11, 24].

In this paper, the notation \(B_{r,q}^{s+d}(M)\) with \(M>0\) stands for

$$\begin{aligned} B_{r,q}^{s+d}(M): =\{f\in B_{r,q}^{s+d}(\mathbb {R}),~ \Vert f\Vert _{B_{r,q}^{s+d}}\vee \Vert f^{(d)}\Vert _{\infty }\le M\}. \end{aligned}$$

and

$$\begin{aligned} B_{r,q}^{s+d}(M,T): =\{f\in B_{r,q}^{s+d}(M),~\textrm{supp}\,f\subseteq [-T,T]~\text{ with some }~T>0\}. \end{aligned}$$
(1.1)

Moreover, \(L^\infty (M)\) is defined in the same way. On the other hand, it follows from \(f\in B_{r,q}^{s+d}(M)\) that \(f^{(d)}\in B_{r,q}^{s}(\mathbb {R})\) and \(\Vert f^{(d)}\Vert _{B_{r,q}^{s}}\le M\).

1.2 Our Results

As in [3, 20], the linear wavelet estimator for density derivatives is introduced by

$$\begin{aligned} \widehat{f^{(d)}_{j}}(x):=\sum _{k\in \mathbb {Z}}\widehat{\alpha }_{jk}\varphi _{jk}(x), \end{aligned}$$
(1.2)

where \(\widehat{\alpha }_{jk}:=\frac{(-1)^{d}}{n}\sum _{i=1}^{n} [\varphi _{jk}]^{(d)}(X_{i})\) and \(\varphi \) is t regular with \(t\ge d\). Clearly, d-fold integration by parts gives \(E\widehat{\alpha }_{jk}=\alpha _{jk}:=\langle f^{(d)},\varphi _{jk}\rangle \), and hence \(E\widehat{f^{(d)}_{j}}=P_{j}f^{(d)}\).
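To make (1.2) concrete, the following sketch implements the \(d=0\) case with the Haar scaling function (an illustration only: Haar is not smooth enough for \(d\ge 1\), and the sample size and level are arbitrary choices of ours), estimating a uniform density:

```python
import numpy as np

rng = np.random.default_rng(0)

def phi_jk(x, j, k):
    # Haar scaling function phi = 1_{[0,1)}, rescaled: phi_{jk}(x) = 2^{j/2} phi(2^j x - k)
    y = 2 ** j * x - k
    return 2 ** (j / 2) * ((y >= 0) & (y < 1)).astype(float)

def linear_estimator(sample, j, x):
    """Linear wavelet density estimator (d = 0 case of (1.2)):
    hat f_j(x) = sum_k hat alpha_{jk} phi_{jk}(x), with
    hat alpha_{jk} = (1/n) sum_i phi_{jk}(X_i)."""
    est = np.zeros_like(x, dtype=float)
    for k in range(int(np.floor(2 ** j * x.min())) - 1, int(np.ceil(2 ** j * x.max())) + 2):
        alpha_hat = phi_jk(sample, j, k).mean()  # empirical scaling coefficient
        est += alpha_hat * phi_jk(x, j, k)
    return est

sample = rng.uniform(0, 1, 20000)       # true density f = 1 on [0, 1]
x = np.linspace(0.1, 0.9, 9)
est = linear_estimator(sample, 4, x)    # close to 1 at every evaluation point
```

For Haar the estimator reduces to a histogram with bin width \(2^{-j}\), which is why the values concentrate around the true density height 1.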

Next, we are in a position to introduce our results in this paper. The first theorem gives a linear wavelet point-wise estimation for density derivatives under the local Hölder condition.

Theorem 1.1

Let \(\varphi \) be t regular with \(t\ge d\ge 0\) and \(\widehat{f^{(d)}_{j^{*}}}\) be the linear wavelet estimator in (1.2). Then for \(0<s<t\) and \(2^{j^{*}}\thicksim n^{\frac{1}{2s+2d+1}}\),

$$\begin{aligned} \sup \limits _{x\in \Omega _{x_{0}}}\sup \limits _{f\in H^{s+d}(\Omega _{x_{0}},M)\cap L^{\infty }(M)} E\left| \widehat{f^{(d)}_{j^{*}}}(x)-f^{(d)}(x)\right| ^{p}\lesssim n^{-\frac{sp}{2s+2d+1}}. \end{aligned}$$

Remark 1.1

When the order \(d=0\), the density derivative estimation model reduces to the classical density estimation model, and Theorem 1.1 coincides with the one-dimensional case of Theorem 3 in Ref. [13].

Remark 1.2

Note that the parameter j of the linear wavelet estimator in Theorem 1.1 depends on the smoothness index s of the unknown density function f, so the estimator in (1.2) is non-adaptive [6, 10, 11].

Motivated by the works in Refs. [1, 2, 7, 14], we provide a selection rule for the parameter j in (1.2) that depends only on the observed data \(X_{1},\ldots ,X_{n}\); this gives a so-called data-driven and fully adaptive estimator.

Let \(\mathcal {H}:=\left\{ 0,1,\ldots ,\lfloor \frac{1}{2d+1}\log _2{\frac{n}{\ln n}}\rfloor \right\} \) with \(\lfloor a\rfloor \) denoting the integer part of a. Thus, the selection rule of \(j=j_{0}\) in (1.2) is given by

$$\begin{aligned} \widehat{R}_{j}(x):= & \sup _{j'\in \mathcal {H}}\left[ \left| \widehat{f_{j\wedge j'}^{(d)}}(x)-\widehat{f_{j'}^{(d)}}(x)\right| -\tau _{n}(j\wedge j') -\tau _{n}(j')\right] _{+},\end{aligned}$$
(1.3)
$$\begin{aligned} j_{0}= & j_{0}(x)=\mathop {\text {arginf}}_{j\in \mathcal {H}} \left[ \widehat{R}_{j}(x)+2\tau _{n}(j)\right] . \end{aligned}$$
(1.4)

Here and throughout, \(a\wedge b:=\min \{a,~b\}\), \(a_{+}:=\max \{a,~0\}\) and

$$\begin{aligned} \tau _{n}(j):=\left( \frac{\lambda 2^{j(2d+1)}\ln n}{n}\right) ^{\frac{1}{2}}, \end{aligned}$$
(1.5)

where \(\lambda >0\) is a constant to be determined later. Clearly, the selection rule depends only on the observed data \(X_1,\ldots ,X_n\). Thus, the data-driven wavelet estimator is given by

$$\begin{aligned} \widehat{f_{n}^{(d)}}(x):=\widehat{f_{j_{0}}^{(d)}}(x)=\sum _{k\in \mathbb {Z}}\widehat{\alpha }_{j_{0}k} \varphi _{j_{0}k}(x) \end{aligned}$$
(1.6)

with \(j_0\in \mathcal {H}\) being given in (1.4).
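The selection rule (1.3)–(1.4) at a fixed point x can be sketched as follows (a minimal illustration with invented toy inputs: `estimates[j]` stands for \(\widehat{f_{j}^{(d)}}(x)\) and `tau[j]` for \(\tau _{n}(j)\)):

```python
import numpy as np

def select_j(estimates, tau):
    """Selection rule (1.3)-(1.4): for each j, compute
    R_j = sup_{j'} [ |hat f_{min(j,j')} - hat f_{j'}| - tau(min(j,j')) - tau(j') ]_+,
    then return j0 = argmin_j [ R_j + 2 tau(j) ]."""
    J = len(estimates)
    R = np.zeros(J)
    for j in range(J):
        gaps = [abs(estimates[min(j, jp)] - estimates[jp]) - tau[min(j, jp)] - tau[jp]
                for jp in range(J)]
        R[j] = max(max(gaps), 0.0)  # [.]_+ of the supremum over j'
    return int(np.argmin(R + 2 * tau))

# toy check: if every hat f_j(x) agrees, each R_j = 0 and the rule picks the
# smallest threshold; tau_n(j) in (1.5) is increasing in j, so j0 = 0
n, lam, d = 100, 1.0, 0
tau = np.sqrt(lam * 2.0 ** ((2 * d + 1) * np.arange(5)) * np.log(n) / n)
j0 = select_j(np.ones(5), tau)
```

The rule trades the growing stochastic penalty \(\tau _{n}(j)\) against the shrinking bias proxy \(\widehat{R}_{j}(x)\), without ever referencing the smoothness index s.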

To introduce Theorem 1.2, let

$$\begin{aligned} B_{j}(x,f):=|P_jf^{(d)}(x)-f^{(d)}(x)|\quad \text{ and }\quad S_{n}(x,j):=\widehat{f_{j}^{(d)}}(x)-E\widehat{f_{j}^{(d)}}(x) \end{aligned}$$
(1.7)

be the bias and the stochastic error of \(\widehat{f_{j}^{(d)}}\), respectively. Furthermore, we define

$$\begin{aligned} B_{j}^{*}(x,f):=\sup _{j'\in \mathcal {H},~j'\ge j}B_{j'}(x,f)\quad \text{ and }\quad \aleph _{n}(x):=\sup _{j\in \mathcal {H}}\Big [|S_{n}(x,j)|-\tau _{n}(j)\Big ]_{+}, \end{aligned}$$
(1.8)

where \(\tau _{n}(j)\) is given by (1.5).

Then the following point-wise oracle inequality is established, which plays a key role in the proofs of Theorems 1.3–1.4.

Theorem 1.2

For any \(x\in \mathbb {R}\), the estimator \(\widehat{f_{n}^{(d)}}(x)\) in (1.6) satisfies that

$$\begin{aligned} |\widehat{f_{n}^{(d)}}(x)-f^{(d)}(x)|\le \inf _{j\in \mathcal {H}}\left\{ 5B_{j}^{*}(x,f)+5\tau _{n}(j)\right\} +5\aleph _{n}(x), \end{aligned}$$

where \(\tau _{n}(j)\) is given by (1.5) and \(B_{j}^{*}(x,f),~\aleph _{n}(x)\) are determined by (1.8).

Moreover, by using Theorem 1.2, we obtain the adaptive point-wise estimation and \(L^p\)-risk \((1\le p<\infty )\) estimation based on the data-driven estimator in (1.6).

Theorem 1.3

Let \(\varphi \) be t regular with \(t\ge d\ge 0\). Then for \(0<s<t\), the data-driven estimator \(\widehat{f^{(d)}_{n}}\) in (1.6) satisfies

$$\begin{aligned} \sup \limits _{x\in \Omega _{x_{0}}} \sup \limits _{f\in H^{s+d}(\Omega _{x_{0}},M)\cap L^{\infty }(M)} E|\widehat{f^{(d)}_{n}}(x)-f^{(d)}(x)|^{p}\lesssim \Big (\frac{\ln n}{n}\Big )^{\frac{sp}{2s+2d+1}}. \end{aligned}$$

Remark 1.3

The same as Remark 1.1, when \(d=0\), Theorem 1.3 can be reduced to the conclusion of Theorem 4 in one dimension in Ref. [13].

Theorem 1.4

Let \(\varphi \) be t regular with \(t\ge d\ge 0\). Then for \(0<s<t\), \(r,q\in [1,\infty ]\) and \(p\in [1,\infty )\), the data-driven estimator \(\widehat{f_{n}^{(d)}}\) in (1.6) satisfies

$$\begin{aligned} \sup _{f\in B_{r,q}^{s+d}(M,T)\cap L^{\infty }(M)} E\Vert \widehat{f_{n}^{(d)}}I_{[-T,T]}-f^{(d)}\Vert _{p}^{p}\lesssim \Big (\frac{\ln n}{n}\Big )^{\theta p}, \end{aligned}$$

where

$$\begin{aligned} \theta :=\left\{ \begin{array}{rcl} & \frac{s}{2s+2d+1},~~ & \hbox {1}\le p<\frac{2sr}{2d+1}+r; \\ & \frac{sr}{(2d+1)p}, ~~& \hbox {p}\ge \frac{2sr}{2d+1}+r,~s\le \frac{1}{r};\\ & \frac{s-\frac{1}{r}+\frac{1}{p}}{2(s-\frac{1}{r})+2d+1}, ~~ & \hbox {p}\ge \frac{2sr}{2d+1}+r,~s>\frac{1}{r}. \end{array} \right. \end{aligned}$$
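The case split for \(\theta \) can be transcribed directly (a hypothetical helper of ours for reading off the rate; the three branches mirror the display above, and the first and third branches agree at the boundary \(p=\frac{2sr}{2d+1}+r\) when \(s>\frac{1}{r}\)):

```python
def theta(s, r, p, d):
    """Convergence exponent theta in Theorem 1.4."""
    if p < 2 * s * r / (2 * d + 1) + r:
        return s / (2 * s + 2 * d + 1)                   # first branch
    if s <= 1 / r:
        return s * r / ((2 * d + 1) * p)                 # second branch, s <= 1/r
    return (s - 1 / r + 1 / p) / (2 * (s - 1 / r) + 2 * d + 1)  # third branch, s > 1/r

rate = theta(1.0, 2.0, 2.0, 0)  # p = 2 < 6, so theta = s/(2s+2d+1) = 1/3
```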

Remark 1.4

According to Theorems 3.3 and 4.3 in Ref. [12], the convergence rates in Theorem 1.4 are optimal (up to a logarithmic factor) in the case \(s>\frac{1}{r}\). However, the situation remains unclear for \(s\le \frac{1}{r}\). Therefore, one direction of our future work is to determine the optimality of this statistical model in the case \(s\le \frac{1}{r}\).

Remark 1.5

When \(d=0\) and \(s>\frac{1}{r}\), the convergence rate \(\theta =\min \left\{ \frac{s}{2s+1},~\frac{s-\frac{1}{r}+\frac{1}{p}}{2(s-\frac{1}{r})+1}\right\} \) coincides with the works of Donoho et al. in Ref. [6]. In addition, the estimation for the case \(s\le \frac{1}{r}\) is considered in Theorem 1.4.

2 Some Lemmas and Propositions

In this section, we provide some lemmas and propositions which are necessary in the proofs of main results. Rosenthal’s inequality is introduced first.

Rosenthal’s inequality [8].   Let \(p>0\) and \(X_1,X_2,\ldots ,X_n\) be independent random variables satisfying \(EX_i=0\) and \(E|X_i|^p<\infty \) \((i=1,2,\ldots ,n)\). Then there exists \(C(p)>0\) such that

$$\begin{aligned} E\left| \sum \limits _{i=1}^nX_i\right| ^p\le C(p)\left\{ \sum \limits _{i=1}^nE|X_i|^pI_{\{p>2\}}+\left( \sum \limits _{i=1}^n EX_{i}^{2}\right) ^{\frac{p}{2}}\right\} . \end{aligned}$$
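For \(p=2\) the indicator vanishes and Rosenthal's bound reduces to \(E|\sum _{i}X_i|^{2}=\sum _{i}EX_i^{2}\), which in fact holds with equality since the cross terms vanish by independence. A quick Monte Carlo sanity check of this case, with parameters chosen arbitrarily by us:

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials = 50, 50000
# centred uniform variables: E X_i = 0, E X_i^2 = 1/12
X = rng.uniform(-0.5, 0.5, size=(trials, n))
lhs = np.mean(np.sum(X, axis=1) ** 2)   # Monte Carlo estimate of E|sum_i X_i|^2
rhs = n / 12.0                          # sum_i E X_i^2
# lhs and rhs agree up to Monte Carlo error
```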

Next, the following lemma is established, which is important for the proof of Theorem 1.1.

Lemma 2.1

Let \(\varphi \) be t regular with \(t\ge d\) and \(\hat{\alpha }_{jk}\) be defined in (1.2). Then for \(f\in L^{\infty }(M)\) with \(M>0\) and \(2^{j}\le n\),

$$\begin{aligned} E|\hat{\alpha }_{jk}-\alpha _{jk}|^{p}\lesssim n^{-\frac{p}{2}}2^{jdp}, \end{aligned}$$

where the constant in \(``\lesssim "\) only depends on \(\varphi \) and M.

Proof

According to the definition of \(\hat{\alpha }_{jk}\), one has \(E\hat{\alpha }_{jk}=\alpha _{jk}\) and

$$\begin{aligned} E|\hat{\alpha }_{jk}-\alpha _{jk}|^{p} = \frac{1}{n^{p}}E\left| \sum _{i=1}^{n} \Big \{[\varphi _{jk}]^{(d)}(X_{i})-E[\varphi _{jk}]^{(d)}(X_{i})\Big \}\right| ^{p} = \frac{1}{n^{p}}E\left| \sum _{i=1}^{n}\eta _i\right| ^{p},~~~ \end{aligned}$$
(2.1)

where \(\eta _i:=[\varphi _{jk}]^{(d)}(X_{i})-E[\varphi _{jk}]^{(d)}(X_{i})\). Clearly, \(\{\eta _i\}_{i=1}^n\) are i.i.d. samples and \(E\eta _i=0,~i=1,\ldots ,n\).

On the other hand, for \(i=1,\ldots ,n,\)

$$\begin{aligned} E|\eta _i|^{2} \le E\left( [\varphi _{jk}]^{(d)}(X_{i})\right) ^{2} = 2^{j}2^{2jd}\int _{\mathbb {R}}[\varphi ^{(d)}(2^{j}x-k)]^{2}f(x)dx \lesssim 2^{2jd} \end{aligned}$$

and \(\Vert \eta _i\Vert _{\infty } \lesssim \Vert [\varphi _{jk}]^{(d)}\Vert _{\infty } \lesssim 2^{j(\frac{1}{2}+d)} \) by the regularity of \(\varphi \) and \(\Vert f\Vert _{\infty }\lesssim 1\). These with Rosenthal’s inequality and \(2^{j}\le n\) show that

$$\begin{aligned} E\left| \sum _{i=1}^{n}\eta _i\right| ^{p}&\lesssim \sum _{i=1}^{n}E|\eta _i|^{p}I_{\{p>2\}}+ \left( \sum _{i=1}^{n}E\eta _i^{2}\right) ^{\frac{p}{2}}\nonumber \\&\lesssim n^{\frac{p}{2}}2^{jdp}\left[ (n^{-1}2^{j})^{\frac{p}{2}-1}I_{\{p>2\}}+1\right] \lesssim n^{\frac{p}{2}}2^{jdp}. \end{aligned}$$
(2.2)

Finally, the desired conclusion follows from (2.1) and (2.2). The proof is done. \(\square \)

We give the next lemma in order to prove Proposition 2.1.

Lemma 2.2

Let \(K_{j}(v,x):=(-1)^{d}\sum _{k\in \mathbb {Z}}[\varphi _{jk}]^{(d)}(v)\varphi _{jk}(x)\) and \(\varphi \) be t regular with \(t\ge d\ge 0\). Then for \(f\in L^{\infty }(M)\),

$$\begin{aligned} |K_{j}(v,x)|\le M_{1}2^{j(d+1)} ~~~~\text{ and }~~~~ E|K_{j}(X_{1},x)|^{2}\le M_{1}2^{j(2d+1)}, \end{aligned}$$

where \(M_1\ge 1\) is some constant.

Proof

By the definition of \(K_j(v,x)\), one finds easily that

$$\begin{aligned} |K_{j}(v,x)|=\left| 2^{j(d+1)}\sum _{k\in \mathbb {Z}}\varphi ^{(d)}(2^{j}v-k)\varphi (2^{j}x-k)\right| \le \Vert \Theta _{\varphi }\Vert _{\infty }\Vert \varphi ^{(d)}\Vert _{\infty }2^{j(d+1)} \end{aligned}$$
(2.3)

because of the regularity of \(\varphi \). On the other hand,

$$\begin{aligned} \int _\mathbb {R}|K_{j}(v,x)|dv\le & 2^{j(d+1)}\int _\mathbb {R} \sum _{k\in \mathbb {Z}} |\varphi ^{(d)}(2^{j}v-k)||\varphi (2^{j}x-k)| dv\\= & 2^{j(d+1)}\sum _{k\in \mathbb {Z}}|\varphi (2^{j}x-k)| \int _\mathbb {R} |\varphi ^{(d)}(2^{j}v-k)|dv\\\le & \Vert \Theta _{\varphi }\Vert _{\infty }\Vert \varphi ^{(d)}\Vert _{1} 2^{jd}. \end{aligned}$$

Furthermore,

$$\begin{aligned} E|K_{j}(X_{1},x)|^{2}\le & \Vert \Theta _{\varphi }\Vert _{\infty }\Vert \varphi ^{(d)}\Vert _{\infty }\Vert f\Vert _{\infty }2^{j(d+1)} \int _{\mathbb {R}}|K_{j}(v,x)|dv\nonumber \\\le & \Vert \Theta _{\varphi }\Vert _{\infty }^{2} \Vert \varphi ^{(d)}\Vert _{\infty }\Vert \varphi ^{(d)}\Vert _{1}M 2^{j(2d+1)}. \end{aligned}$$
(2.4)

Choosing \(M_1:=\max \{\Vert \Theta _{\varphi }\Vert _{\infty }\Vert \varphi ^{(d)}\Vert _{\infty }, ~\Vert \Theta _{\varphi }\Vert _{\infty }^{2} \Vert \varphi ^{(d)}\Vert _{\infty }\Vert \varphi ^{(d)}\Vert _{1}M,~1\}\), the final conclusions follow from (2.3)–(2.4). \(\square \)

To show Proposition 2.1, we need another well-known inequality.

Bernstein’s inequality [8]. Let \(\eta _{1},\ldots ,\eta _{n}\) be i.i.d. random variables with \(E\eta _{i}=0\), \(E\eta _{i}^{2}\le \sigma ^{2}\) and \(|\eta _{i}|\le M\) \((i=1,2,\ldots ,n)\). Then for any \(\epsilon >0\),

$$\begin{aligned} P\left\{ \left| \frac{1}{n}\sum _{i=1}^n\eta _{i}\right| \ge \epsilon \right\} \le 2\exp \Big \{-\frac{n\epsilon ^{2}}{2(\sigma ^{2}+M\epsilon /3)}\Big \}. \end{aligned}$$
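Bernstein's inequality can be checked numerically (a sketch with arbitrary parameters of our choosing: bounded centred uniforms, so one may take \(M=\frac{1}{2}\) and \(\sigma ^{2}=\frac{1}{12}\)):

```python
import numpy as np

rng = np.random.default_rng(2)
n, trials, eps = 30, 100000, 0.2
M, sigma2 = 0.5, 1.0 / 12.0            # |eta_i| <= M, E eta_i^2 <= sigma2
eta = rng.uniform(-0.5, 0.5, size=(trials, n))
empirical = np.mean(np.abs(eta.mean(axis=1)) >= eps)   # P{ |n^{-1} sum eta_i| >= eps }
bound = 2 * np.exp(-n * eps ** 2 / (2 * (sigma2 + M * eps / 3)))
# empirical <= bound, typically by a wide margin
```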

Now, we introduce the first proposition, which plays an important role in the proofs of Theorems 1.3–1.4.

Proposition 2.1

Let \(f\in L^{\infty }(M)\) and \(\varphi \) be t regular with \(t\ge d\ge 0\). Then for each \(x\in \mathbb {R}\), \(p\in [1,\infty )\) and \(\lambda >6M_{1}^{2}p^{2}\),

$$\begin{aligned} E[\aleph _{n}(x)]^{p}\lesssim \left( \frac{\ln n}{n}\right) ^{\frac{p}{2}}, \end{aligned}$$

where \(\aleph _{n}(x)\) is given by (1.8) and \(M_1\ge 1\) is the constant in Lemma 2.2.

Proof

For each \(j\in \mathcal {H}\), one denotes

$$\begin{aligned} \overline{\tau _{n}(j)}:=\left( \frac{6M_{1}^{2}p2^{j(2d+1)}\lambda _j}{n}\right) ^{\frac{1}{2}}, \end{aligned}$$
(2.5)

where \(\lambda _{j}:=\max \{(2d+1)p j\ln 2,~1\}\). Note that the inequality \(\lambda \ln n \ge 6M_{1}^{2}p\lambda _j\) holds for large n, since \(\lambda >6M_{1}^{2}p^{2}\) and \(j\in \mathcal {H}\). Hence, \(\overline{\tau _{n}(j)}\le \tau _{n}(j)\) thanks to (1.5) and (2.5). Moreover,

$$\begin{aligned} \Big [|S_{n}(x,j)|-\tau _{n}(j)\Big ]_+ \le \Big [|S_{n}(x,j)|-\overline{\tau _{n}(j)}\Big ]_+. \end{aligned}$$
(2.6)

For any \(t\ge 0\),

$$\begin{aligned} P\left\{ \big [|S_{n}(x,j)|-\overline{\tau _{n}(j)}~\big ]_+>t\right\} =P\left\{ |S_{n}(x,j)|-\overline{\tau _{n}(j)}>t\right\} . \end{aligned}$$

Therefore,

$$\begin{aligned} E\Big [|S_{n}(x,j)|-\overline{\tau _{n}(j)}\Big ]_+^{p} =p\int _0^\infty t^{p-1} P\left\{ |S_{n}(x,j)|-\overline{\tau _{n}(j)}>t\right\} dt. \end{aligned}$$

This with variable substitution \(t=\omega \overline{\tau _{n}(j)}\) shows that

$$\begin{aligned} E\Big [|S_{n}(x,j)|-\overline{\tau _{n}(j)}\Big ]_+^{p}&\le p\int _{0}^{\infty }[\omega \overline{\tau _{n}(j)}]^{p-1}P\left\{ |S_{n}(x,j)|> \overline{\tau _{n}(j)}(\omega +1) \right\} \overline{\tau _{n}(j)} d\omega \nonumber \\&= p[\overline{\tau _{n}(j)}]^{p}\int _{0}^{\infty }\omega ^{p-1}P\left\{ |S_{n}(x,j)|> \overline{\tau _{n}(j)}(\omega +1)\right\} d\omega . \end{aligned}$$
(2.7)

On the other hand, it is easy to see that the estimator \(\widehat{f_{j}^{(d)}}(x)\) in (1.6) can be rewritten as

$$\begin{aligned} \widehat{f_{j}^{(d)}}(x)=\frac{1}{n}\sum _{i=1}^{n}K_{j}(X_{i},x), \end{aligned}$$

because \(K_{j}(v,x):=(-1)^{d}\sum _{k\in \mathbb {Z}}[\varphi _{jk}]^{(d)}(v)\varphi _{jk}(x)\) in Lemma 2.2. This with (1.7) and Lemma 2.2 implies that \(S_n(x,j) =\frac{1}{n}\sum _{i=1}^{n}[K_{j}(X_{i},x)-EK_{j}(X_{i},x)]\) and

$$\begin{aligned} |K_{j}(X_{i},x)|\le M_{1}2^{j(d+1)},~~ E|K_{j}(X_{i},x)|^{2}\le M_{1}2^{j(2d+1)}. \end{aligned}$$

Furthermore,

$$\begin{aligned} & P\left\{ |S_{n}(x,j)|> \overline{\tau _{n}(j)}(\omega +1)\right\} \nonumber \\ & \quad \le 2\exp \left\{ -\frac{n[\overline{\tau _{n}(j)}]^{2}(\omega +1)^{2}}{2[M_{1}2^{j(2d+1)}+2M_{1}2^{j(d+1)}\overline{\tau _{n}(j)}(\omega +1)/3]} \right\} \end{aligned}$$
(2.8)

thanks to Bernstein’s inequality.

For \(j\in \mathcal {H}\), \(\overline{\tau _{n}(j)}=\Big (\frac{6M_{1}^{2}p2^{j(2d+1)}\lambda _{j}}{n}\Big )^{\frac{1}{2}} \le 3M_{1}p\) holds for large n. Thus,

$$\begin{aligned} 2[M_{1}2^{j(2d+1)}+2M_{1}2^{j(d+1)}\overline{\tau _{n}(j)}(\omega +1)/3] \le 6M_{1}^{2}p2^{j(2d+1)}(\omega +1) \end{aligned}$$

due to \(M_{1},p\ge 1\) and \(\omega >0\). Substituting this above estimate into (2.8), one obtains that

$$\begin{aligned} P\left\{ |S_{n}(x,j)|> \overline{\tau _{n}(j)}(\omega +1)\right\} \le 2e^{-\lambda _j(\omega +1)}. \end{aligned}$$

Then it follows from \(\lambda _{j}=\max \{(2d+1)p j\ln 2,~1\}\ge 1\) that

$$\begin{aligned} P\left\{ |S_{n}(x,j)|> \overline{\tau _{n}(j)}(\omega +1)\right\} \le 2e^{-\lambda _j\omega }e^{-\lambda _j} \le 2e^{-\omega }e^{-\lambda _j}. \end{aligned}$$

Combining this with (2.7) and \(\overline{\tau _{n}(j)}:=\left( \frac{6M_{1}^{2}p2^{j(2d+1)}\lambda _j}{n}\right) ^{\frac{1}{2}}\), one concludes that

$$\begin{aligned} E\Big [|S_{n}(x,j)|&-\overline{\tau _{n}(j)}\Big ]_+^{p} \lesssim [\overline{\tau _{n}(j)}]^{p} e^{-\lambda _j}\int _{0}^{\infty }\omega ^{p-1}e^{-\omega }d\omega \lesssim \left( \frac{6M_{1}^{2}p2^{j(2d+1)}\lambda _{j}}{n}\right) ^{\frac{p}{2}}e^{-\lambda _j}. \end{aligned}$$

Hence, according to \(\lambda _{j}\lesssim \ln n\) and \(e^{-\lambda _{j}}\le 2^{-(2d+1)p j}\), one knows

$$\begin{aligned} \sum _{j\in \mathcal {H}}E\Big [|S_{n}(x,j)|-\overline{\tau _{n}(j)}\Big ]_+^{p} \lesssim \sum _{j\in \mathcal {H}}\left( \frac{\ln n}{n}\right) ^{\frac{p}{2}} 2^{(d+\frac{1}{2})p j}2^{-(2d+1)p j} \lesssim \left( \frac{\ln n}{n}\right) ^{\frac{p}{2}}. \end{aligned}$$

This with (1.8) and (2.6) leads to

$$\begin{aligned} E[\aleph _{n}(x)]^{p} \lesssim E\sup _{j\in \mathcal {H}}\Big [|S_{n}(x,j)|-\overline{\tau _{n}(j)}\Big ]_+^{p} \lesssim \sum _{j\in \mathcal {H}}E\Big [|S_{n}(x,j)|-\overline{\tau _{n}(j)}\Big ]_+^{p} \lesssim \left( \frac{\ln n}{n}\right) ^{\frac{p}{2}}, \end{aligned}$$

which completes the proof. \(\square \)

To introduce Proposition 2.2, we also need the following notations:

$$\begin{aligned} \mathfrak {M}(x,f):= & \inf _{j\in \mathcal {H}}\{B_{j}^{*}(x,f)+\tau _{n}(j)\},\end{aligned}$$
(2.9)
$$\begin{aligned} \Lambda _{m}:= & \{x\in [-T,T],~2^{m}\delta _n<\mathfrak {M}(x,f)\le 2^{m+1}\delta _n\}, \end{aligned}$$
(2.10)

where \(\delta _n=(\frac{C\ln n}{n})^{\frac{s}{2s+2d+1}}\), \(C>1\) is some constant and \(T>0\) is defined by (1.1).

Note that \(\mathfrak {M}(x,f)\le c_0:=\sup _x\mathfrak {M}(x,f)<\infty \), provided that \(\varphi \) is t regular and \(\Vert f^{(d)}\Vert _{\infty }\lesssim 1\). Then there exists

$$\begin{aligned} m_2:=\min \{m\in \mathbb {Z},~2^{m}\delta _n\ge c_0\} \end{aligned}$$
(2.11)

such that \(\Lambda _{m}=\emptyset \) for each \(m>m_{2}\). Obviously, \(m_{2}>0\) for large n.

Next, another useful proposition is provided which is one of the main ingredients in the proof of Theorem 1.4.

Proposition 2.2

Let \(f\in B_{r,q}^{s+d}(M)\) and \(\varphi \) be t regular with \(t\ge d\ge 0\). Then for \(m\in \mathbb {Z}\) satisfying \(0\le m\le m_2\) and each \(p\in [1,\infty )\),

$$\begin{aligned} Q_m:=\int _{\Lambda _{m}}[\mathfrak {M}(x,f)]^pdx\lesssim 2^{m(p-r-\frac{2sr}{2d+1})}\delta _n^{p}; \end{aligned}$$

Moreover, if \(s>\frac{1}{r}\) and \(r\le p\), then with \(s':=s-\frac{1}{r}+\frac{1}{p}\),

$$\begin{aligned} Q_m=\int _{\Lambda _{m}}[\mathfrak {M}(x,f)]^pdx\lesssim 2^{-\frac{2ms'p}{2d+1}}\delta _n^{\frac{s'}{s}p}, \end{aligned}$$

where \(\mathfrak {M}(x,f)\) and \(\Lambda _m\) are defined in (2.9)–(2.10) respectively.

Proof

The proof is similar to the second part of Proposition 3.2 in Ref. [2]; here, we provide only the main steps.

Take \(j_2\) satisfying \(c_12^{\frac{2m}{2d+1}}\delta _n^{-\frac{1}{s}}\le 2^{j_{2}}\le c_22^{\frac{2m}{2d+1}}\delta _n^{-\frac{1}{s}}\), where two positive constants \(c_1,c_2\) satisfy \( (2\,M)^{\frac{1}{s}}I_{\{r=\infty \}}<c_1<c_2< \min \left\{ \frac{C}{4c_0^{2}},~\frac{C}{4\lambda }\right\} ^{\frac{1}{2d+1}}.\) Then \(j_{2}\in \mathcal {H}\) and \(\tau _{n}(j_2)\le 2^{m-1}\delta _n\) for large n and \(0<m\le m_2\).

Clearly, by \(\Lambda _m=\{x\in [-T,T],~2^{m}\delta _n <\mathfrak {M}(x,f)\le 2^{m+1}\delta _n\}\),

$$\begin{aligned} Q_m=\int _{\Lambda _m}[\mathfrak {M}(x,f)]^pdx\le (2^{m+1}\delta _n)^p|\Lambda _m|, \end{aligned}$$
(2.12)

where \(|\Lambda _m|\) stands for the Lebesgue measure of the set \(\Lambda _m\). On the other hand,

$$\begin{aligned} |\Lambda _m| \le |\{x\in [-T,T],~B_{j_{2}}^{*}(x,f)>2^{m-1}\delta _n\}|. \end{aligned}$$
(2.13)

When \(1\le r<\infty \), according to Chebyshev’s inequality, (1.8), (2.13) and \(f\in B_{r,q}^{s+d}(M)\), one has

$$\begin{aligned} |\Lambda _m|\le & \sum _{j\in \mathcal {H},j\ge j_{2}}|\{x\in [-T,T],~B_{j}(x,f)>2^{m-1}\delta _n\}|\nonumber \\\le & \sum _{j\in \mathcal {H},j\ge j_{2}}\frac{\Vert B_{j}(\cdot ,f)\Vert _r^r}{(2^{m-1}\delta _n)^r} \lesssim 2^{-mr}\delta _n^{-r}2^{-j_{2}sr}. \end{aligned}$$
(2.14)

Substituting (2.14) into (2.12), one obtains that

$$\begin{aligned} Q_m\lesssim (2^{m+1}\delta _n)^{p}2^{-mr}\delta _n^{-r}2^{-j_{2}sr}\lesssim 2^{m(p-r)}\delta _n^{p-r}2^{-{j_2}sr}\lesssim 2^{m(p-r-\frac{2sr}{2d+1})}\delta _n^{p}\quad \end{aligned}$$

due to \(2^{j_{2}}\thicksim 2^{\frac{2m}{2d+1}}\delta _{n}^{-\frac{1}{s}}\).

For the case \(r=\infty \), it follows from \(f\in B_{r,q}^{s+d}(M)\) and \(m>0\) that \( B_{j_{2}}^{*}(x,f)= \sup _{j'\ge j_2}B_{j'}(x,f)\le M 2^{-j_{2}s}\le Mc_1^{-s} 2^{-\frac{2ms}{2d+1}}\delta _n \le 2^{m-1}\delta _n\) thanks to the choice of \(2^{j_{2}}\ge c_12^{\frac{2m}{2d+1}}\delta _{n}^{-\frac{1}{s}}\) with \(c_1>(2M)^{\frac{1}{s}}\). Thus, \(|\Lambda _{m}|=0\) because of (2.13). Furthermore, \( Q_m\le (2^{m+1}\delta _n)^p|\Lambda _{m}|=0 \) by (2.12).

Finally, one discusses the case of \(s>\frac{1}{r}\) and \(r\le p\). Note that \(f^{(d)}\in B_{r,q}^{s}\hookrightarrow B_{p,q}^{s'}\) with \(s'=s-\frac{1}{r}+\frac{1}{p}\). Similar to (2.14),

$$\begin{aligned} |\Lambda _m| \le \sum _{j\in \mathcal {H},j\ge j_{2}}\frac{\Vert B_{j}(\cdot ,f)\Vert _p^p}{(2^{m-1}\delta _n)^p} \lesssim 2^{-mp}\delta _n^{-p}2^{-j_{2}s'p}. \end{aligned}$$

This with (2.12) and \(2^{j_{2}}\thicksim 2^{\frac{2m}{2d+1}}\delta _{n}^{-\frac{1}{s}}\) implies that

$$\begin{aligned} Q_m \lesssim (2^{m+1}\delta _n)^{p}2^{-mp}\delta _n^{-p}2^{-j_{2}s'p} \lesssim 2^{-j_2s'p} \lesssim 2^{-\frac{2ms'p}{2d+1}}\delta _n^{\frac{s'}{s}p}. \end{aligned}$$

The proof is done. \(\square \)
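
As a sanity check on the two exponent substitutions above (the case \(1\le r<\infty \) and the case \(s>\frac{1}{r}\), \(r\le p\)), the arithmetic can be verified symbolically. The sympy sketch below treats \(s,r,p,d,m\) and \(\delta _n\) as formal positive quantities and sets \(2^{j_{2}}\) exactly to \(2^{\frac{2m}{2d+1}}\delta _{n}^{-\frac{1}{s}}\) (the \(\thicksim \) only changes constants); it is illustrative, not part of the proof.

```python
import sympy as sp

s, r, p, d, m, delta = sp.symbols('s r p d m delta', positive=True)
two_j2 = 2**(2*m/(2*d + 1)) * delta**(-1/s)   # the choice 2^{j_2} ~ 2^{2m/(2d+1)} delta_n^{-1/s}

def same(expr, claimed):
    # compare two products of powers via logarithms (all bases are positive)
    return sp.simplify(sp.expand_log(sp.log(expr/claimed), force=True)) == 0

# case 1 <= r < infty: 2^{m(p-r)} delta^{p-r} 2^{-j_2 s r} = 2^{m(p-r-2sr/(2d+1))} delta^p
assert same(2**(m*(p - r)) * delta**(p - r) * two_j2**(-s*r),
            2**(m*(p - r - 2*s*r/(2*d + 1))) * delta**p)

# case s > 1/r, r <= p, with s' = s - 1/r + 1/p:
# (2^{m+1} delta)^p 2^{-mp} delta^{-p} 2^{-j_2 s'p} = 2^p 2^{-2ms'p/(2d+1)} delta^{s'p/s}
sprime = s - 1/r + 1/p
assert same((2**(m + 1)*delta)**p * 2**(-m*p) * delta**(-p) * two_j2**(-sprime*p),
            2**p * 2**(-2*m*sprime*p/(2*d + 1)) * delta**(sprime*p/s))
```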

3 Proofs of Theorems 1.1–1.4

This section is devoted to the proofs of Theorems 1.1–1.4.

Proof of Theorem 1.1

By the definition of \(\widehat{f^{(d)}_{j^{*}}}(x)\) and \(E\hat{\alpha }_{j^{*}k}=\alpha _{j^{*}k}\), it is clear that

$$\begin{aligned} E\Big |\widehat{f^{(d)}_{j^{*}}}(x)-E\widehat{f^{(d)}_{j^{*}}}(x)\Big |^{p} =E\left| \sum \limits _k(\hat{\alpha }_{j^{*}k}-\alpha _{j^{*}k}) \varphi _{j^{*}k}(x)\right| ^{p}. \end{aligned}$$

Moreover, it follows from the Hölder inequality with \(\frac{1}{p}+\frac{1}{p'}=1\) \((p>1)\) that

$$\begin{aligned}&\quad E\Big |\widehat{f^{(d)}_{j^{*}}}(x)-E\widehat{f^{(d)}_{j^{*}}}(x)\Big |^{p} \le E\sum \limits _k|\hat{\alpha }_{j^{*}k}-\alpha _{j^{*}k}|^{p}|\varphi _{j^{*}k}(x)| \left[ \sum \limits _k|\varphi _{j^{*}k}(x)|\right] ^{\frac{p}{p'}}\nonumber \\&\le n^{-\frac{p}{2}}2^{j^{*}dp}\left[ \sum \limits _k|\varphi _{j^{*}k}(x)|\right] ^{1+\frac{p}{p'}} = n^{-\frac{p}{2}}2^{j^{*}dp}\left[ \sum \limits _k|\varphi _{j^{*}k}(x)|\right] ^{p} \lesssim 2^{j^{*}p(d+\frac{1}{2})}n^{-\frac{p}{2}} \end{aligned}$$
(3.1)

thanks to Lemma 2.1. When \(p=1\), the above estimate can be obtained directly without using the Hölder inequality.

On the other hand, Lemma 1.1 leads to \( \sup \limits _{x\in \Omega _{x_{0}}}\sup \limits _{f\in H^{s+d}(\Omega _{x_{0}},M)} \Big |E\widehat{f^{(d)}_{j^{*}}}(x)-f^{(d)}(x)\Big |^{p}\lesssim 2^{-j^{*}sp}. \) This with (3.1) and \(2^{j^{*}}\thicksim n^{\frac{1}{2s+2d+1}}\) shows

$$\begin{aligned} & \sup \limits _{x\in \Omega _{x_{0}}} \sup \limits _{f\in H^{s+d}(\Omega _{x_{0}},M)\cap L^{\infty }(M)} E\Big |\widehat{f^{(d)}_{j^{*}}}(x)-f^{(d)}(x)\Big |^{p}\\ & \quad \lesssim \sup \limits _{x\in \Omega _{x_{0}}}\sup \limits _{f\in H^{s+d}(\Omega _{x_{0}},M)\cap L^{\infty }(M)} \Big [ E\left| \widehat{f^{(d)}_{j^{*}}}(x)-E\widehat{f^{(d)}_{j^{*}}}(x)\right| ^{p} +\left| E\widehat{f^{(d)}_{j^{*}}}(x)-f^{(d)}(x)\right| ^{p} \Big ]\\ & \quad \lesssim 2^{j^{*}p(d+\frac{1}{2})}n^{-\frac{p}{2}}+2^{-j^{*}sp}\lesssim n^{-\frac{sp}{2s+2d+1}}. \end{aligned}$$
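
The balance in the last line can be double-checked symbolically: with \(2^{j^{*}}\) set exactly to \(n^{\frac{1}{2s+2d+1}}\) (an assumption absorbing the \(\thicksim \) into constants), both terms reduce to \(n^{-\frac{sp}{2s+2d+1}}\). The sympy sketch below is illustrative only.

```python
import sympy as sp

s, p, d, n = sp.symbols('s p d n', positive=True)
two_jstar = n**(1/(2*s + 2*d + 1))          # the choice 2^{j*} ~ n^{1/(2s+2d+1)}

variance_term = two_jstar**(p*(d + sp.Rational(1, 2))) * n**(-p/2)   # 2^{j*p(d+1/2)} n^{-p/2}
bias_term = two_jstar**(-s*p)                                        # 2^{-j*sp}
target = n**(-s*p/(2*s + 2*d + 1))

# both the variance and the bias term reduce to the claimed rate
for term in (variance_term, bias_term):
    assert sp.simplify(sp.expand_log(sp.log(term/target), force=True)) == 0
```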

The proof is completed. \(\square \)

Proof of Theorem 1.2

According to (1.3) and (1.5), one obtains that

$$\begin{aligned} \Big |\widehat{f_{j\wedge j_0}^{(d)}}(x)-\widehat{f_{j_{0}}^{(d)}}(x)\Big | \le \widehat{R}_{j}(x)+\tau _{n}(j\wedge j_0) +\tau _{n}(j_{0}) \le \widehat{R}_{j}(x)+2\tau _{n}(j_{0}). \end{aligned}$$
(3.2)

The same argument as in (3.2) implies

$$\begin{aligned} \Big |\widehat{f_{j_{0}\wedge j}^{(d)}}(x)-\widehat{f_{j}^{(d)}}(x)\Big | \le \widehat{R}_{j_{0}}(x)+2\tau _{n}(j). \end{aligned}$$
(3.3)

Moreover, combining (3.2) and (3.3), one concludes

$$\begin{aligned} \Big |\widehat{f_{j_{0}\wedge j}^{(d)}}(x)-\widehat{f_{j_{0}}^{(d)}}(x)\Big |+\Big |\widehat{f_{j_{0}\wedge j}^{(d)}}(x)-\widehat{f_{j}^{(d)}}(x)\Big | \le 2\widehat{R}_{j}(x)+4\tau _{n}(j) \end{aligned}$$
(3.4)

due to \(\widehat{f_{j_{0}\wedge j}^{(d)}}=\widehat{f_{j\wedge j_{0}}^{(d)}}\) and the selection of \(j_0\) in (1.4).

Clearly, by (1.8), \(|S_n(x,j)|\le [|S_n(x,j)|-\tau _{n}(j)]_++\tau _{n}(j)\le \aleph _{n}(x)+\tau _{n}(j).\) This with (1.7) and (1.8) shows that

$$\begin{aligned} \Big |\widehat{f_{j}^{(d)}}(x)-f^{(d)}(x)\Big |\le B_{j}(x,f)+|S_{n}(x,j)|\le B_{j}^{*}(x,f)+\aleph _{n}(x)+\tau _{n}(j). \end{aligned}$$
(3.5)

On the other hand, by using (1.3) and (1.7),

$$\begin{aligned} \widehat{R}_{j}(x)= & \sup _{j'\in \mathcal {H}}\Big [|\widehat{f_{j\wedge j'}^{(d)}}(x)-\widehat{f_{j'}^{(d)}}(x)|-\tau _{n}(j\wedge j')-\tau _{n}(j')\Big ]_{+}\\\le & \sup _{j'\in \mathcal {H}}\Big [|E\widehat{f_{j\wedge j'}^{(d)}}(x)-E\widehat{f_{j'}^{(d)}}(x)|\\ & \quad +|S_{n}(x,j\wedge j')|-\tau _{n}(j\wedge j')+|S_{n}(x,j')|-\tau _{n}(j')\Big ]_{+}. \end{aligned}$$

This with \(\sup _{j'\in \mathcal {H}} |E\widehat{f_{j\wedge j'}^{(d)}}(x)-E\widehat{f_{j'}^{(d)}}(x)| \le \sup _{\{j'\in \mathcal {H},~j'\ge j\}}\{B_{j\wedge j'}(x,f)+B_{j'}(x,f)\}\) and (1.8) leads to

$$\begin{aligned} \widehat{R}_{j}(x)\le 2B_{j}^{*}(x,f)+2\aleph _{n}(x). \end{aligned}$$
(3.6)

Hence, it follows from (3.4)–(3.6) that

$$\begin{aligned} \Big |\widehat{f_{j_{0}}^{(d)}}(x)-f^{(d)}(x)\Big |\le & \Big |\widehat{f_{j_{0}\wedge j}^{(d)}}(x)-\widehat{f_{j_{0}}^{(d)}}(x)\Big | +\Big |\widehat{f_{j_{0}\wedge j}^{(d)}}(x)-\widehat{f_{j}^{(d)}}(x)\Big | +\Big |\widehat{f_{j}^{(d)}}(x)-f^{(d)}(x)\Big |\\\le & 5B_{j}^{*}(x,f)+5\aleph _{n}(x)+5\tau _{n}(j) \end{aligned}$$

holds for each \(j\in \mathcal {H}\). Furthermore,

$$\begin{aligned} \Big |\widehat{f_{n}^{(d)}}(x)-f^{(d)}(x)\Big | =\Big |\widehat{f_{j_0}^{(d)}}(x)-f^{(d)}(x)\Big |\le \inf _{j\in \mathcal {H}}\left\{ 5B_{j}^{*}(x,f)+5\tau _{n}(j)\right\} +5\aleph _{n}(x) \end{aligned}$$

thanks to \(\widehat{f_{n}^{(d)}}(x)=\widehat{f_{j_0}^{(d)}}(x)\) in (1.6). Hence, Theorem 1.2 is proved. \(\square \)
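
The data-driven selection behind (1.3)–(1.6) can be sketched numerically at a fixed point x. In the sketch below, `estimates[j]` stands for \(\widehat{f_{j}^{(d)}}(x)\) over a finite grid \(\mathcal {H}=\{0,\ldots ,J-1\}\) and `tau[j]` for \(\tau _{n}(j)\); the array layout, the function name, and the toy numbers in the usage note are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def select_level(estimates, tau):
    """Goldenshluger-Lepski type selection of j_0 as in (1.4), at a fixed x.

    estimates[j] stands for the estimate f_j^{(d)}(x) over a finite grid
    H = {0, ..., J-1}; tau[j] stands for tau_n(j); min(j, jp) plays the
    role of the level j ∧ j'.  The array layout is an assumption here.
    """
    J = len(estimates)
    R = np.empty(J)
    for j in range(J):
        # R_j(x) = sup_{j'} [ |f_{j ∧ j'} - f_{j'}| - tau(j ∧ j') - tau(j') ]_+
        vals = [abs(estimates[min(j, jp)] - estimates[jp])
                - tau[min(j, jp)] - tau[jp] for jp in range(J)]
        R[j] = max(0.0, max(vals))
    # (1.4): j_0 minimizes R_j + 2 tau_n(j)
    return int(np.argmin(R + 2*tau))
```

For instance, with the toy values `estimates = [0.0, 0.9, 1.0, 1.02]` and `tau = [0.05, 0.1, 0.2, 0.4]`, the rule discards the unstable level 0 and selects level 1: beyond it, the gains no longer outweigh the growing threshold \(\tau _{n}(j)\).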

Proof of Theorem 1.3

Take \(j_{1}\) satisfying \(2^{j_{1}}\thicksim (\frac{n}{\ln n})^{\frac{1}{2s+2d+1}}\). Then \(j_{1}\in \mathcal {H}\) for large n and \(s>0\). Moreover, Theorem 1.2 yields that

$$\begin{aligned} E\Big |\widehat{f_{n}^{(d)}}(x)-f^{(d)}(x)\Big |^{p} \lesssim [B_{j_{1}}^{*}(x,f)]^{p}+[\tau _{n}(j_{1})]^{p}+E[\aleph _{n}(x)]^{p} \end{aligned}$$
(3.7)

holds for any \(x\in \Omega _{x_{0}}\).

By (1.5) and the given choice \(2^{j_{1}}\thicksim (\frac{n}{\ln n})^{\frac{1}{2s+2d+1}}\), one easily finds

$$\begin{aligned} [\tau _{n}(j_{1})]^{p}+E[\aleph _{n}(x)]^{p}\lesssim \Big (\frac{\ln n}{n}\Big )^{\frac{sp}{2s+2d+1}} \end{aligned}$$
(3.8)

due to Proposition 2.1. On the other hand, (1.7)–(1.8) and Lemma 1.1 lead to

$$\begin{aligned} [B_{j_{1}}^{*}(x,f)]^{p}: =\left[ \sup _{j'\in \mathcal {H},~j'\ge j_{1}}B_{j'}(x,f)\right] ^{p} =\left[ \sup _{j'\in \mathcal {H},~j'\ge j_{1}}|P_{j'}f^{(d)}(x)-f^{(d)}(x)|\right] ^{p} \lesssim 2^{-j_{1}sp} \end{aligned}$$

holds for any \(x\in \Omega _{x_{0}}\) and \(f\in H^{s+d}(\Omega _{x_{0}},M)\). This with the choice \(2^{j_{1}}\thicksim (\frac{n}{\ln n})^{\frac{1}{2s+2d+1}}\) implies that

$$\begin{aligned} \sup \limits _{x\in \Omega _{x_{0}}} \sup \limits _{f\in H^{s+d}(\Omega _{x_{0}},M)} [B_{j_{1}}^{*}(x,f)]^{p} \lesssim 2^{-j_{1}sp} \lesssim \Big (\frac{\ln n}{n}\Big )^{\frac{sp}{2s+2d+1}}. \end{aligned}$$
(3.9)

Finally, the desired conclusion follows from (3.7)–(3.9). The proof is finished. \(\square \)

Proof of Theorem 1.4

Recall that \(\Lambda _{m}=\{x\in [-T,T],~2^{m}\delta _n<\mathfrak {M}(x,f)\le 2^{m+1}\delta _n\}\) due to (2.10). Define \(\Lambda _{0}^{-}:=\{x\in [-T,T],~\mathfrak {M}(x,f)\le \delta _{n}\}\) with \(\delta _{n}=(\frac{C\ln n}{n})^{\frac{s}{2s+2d+1}}\). Then for each \(p\in [1,\infty )\),

$$\begin{aligned} E\Vert \widehat{f_{n}^{(d)}}I_{[-T,T]}-f^{(d)}\Vert _{p}^{p}= & E\int _{-T}^{T}\Big |\widehat{f_{n}^{(d)}}(x)-f^{(d)}(x)\Big |^{p}dx\nonumber \\\lesssim & \int _{-T}^{T}[\mathfrak {M}(x,f)]^{p}dx+\int _{-T}^{T}E[\aleph _{n}(x)]^{p}dx\nonumber \\\lesssim & \int _{\Lambda _{0}^{-}}[\mathfrak {M}(x,f)]^{p}dx+\nonumber \\ & \sum _{m=0}^{m_{2}}\int _{\Lambda _{m}}[\mathfrak {M}(x,f)]^{p}dx +\Big (\frac{\ln n}{n}\Big )^{\frac{p}{2}}\nonumber \\\lesssim & \sum _{m=0}^{m_{2}}Q_{m}+ \delta _n^{p} \end{aligned}$$
(3.10)

thanks to \(\mathrm{supp}\,f\subset [-T,T]\), Theorem 1.2, (2.9) and Proposition 2.1.

To complete the proof, one divides (3.10) into three cases. Recall that \(2^{m_{2}}\thicksim \delta _n^{-1}\) and \(\delta _n\thicksim (\frac{\ln n}{n})^{\frac{s}{2s+2d+1}}\) by (2.10)–(2.11). By Proposition 2.2, the following estimates are established.

(i). For \(1\le p<\frac{2sr}{2d+1}+r\),

$$\begin{aligned} \sum _{m=0}^{m_{2}}Q_{m}+ \delta _n^{p} \lesssim \delta _n^{p} \lesssim \Big (\frac{\ln n}{n}\Big )^{\frac{sp}{2s+2d+1}}. \end{aligned}$$
(3.11)

(ii). For \(p\ge \frac{2sr}{2d+1}+r\),

$$\begin{aligned} \sum _{m=0}^{m_{2}}Q_{m}+ \delta _n^{p} \lesssim 2^{m_{2}(p-r-\frac{2sr}{2d+1})}\delta _n^{p}+ \delta _n^{p} \lesssim \Big (\frac{\ln n}{n}\Big )^{\frac{sr}{2d+1}}. \end{aligned}$$
(3.12)
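
The exponent in (3.12) can be traced as follows: since \(2^{m_{2}}\thicksim \delta _n^{-1}\), the bound \(2^{m_{2}(p-r-\frac{2sr}{2d+1})}\delta _n^{p}\) carries the \(\delta _n\)-exponent \(r+\frac{2sr}{2d+1}\), and multiplying by \(\frac{s}{2s+2d+1}\) gives \(\frac{sr}{2d+1}\). A symbolic check (illustrative only):

```python
import sympy as sp

s, r, p, d = sp.symbols('s r p d', positive=True)

# exponent of delta_n in 2^{m_2(p-r-2sr/(2d+1))} delta_n^p when 2^{m_2} ~ delta_n^{-1}
delta_exp = -(p - r - 2*s*r/(2*d + 1)) + p

# delta_n ~ (ln n / n)^{s/(2s+2d+1)}: resulting exponent of ln n / n
lnn_exp = delta_exp * s/(2*s + 2*d + 1)

assert sp.simplify(lnn_exp - s*r/(2*d + 1)) == 0
```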

(iii). For the case \(p\ge \frac{2sr}{2d+1}+r\) and \(s>\frac{1}{r}\), take \(m_1\in \mathbb {Z}\) satisfying

$$\begin{aligned} 2^{m_{1}}\thicksim \delta _n^{\frac{s'p(\frac{1}{s}-\frac{1}{s'})}{(\frac{2s'}{2d+1}+1)p-\frac{2sr}{2d+1}-r}}. \end{aligned}$$
(3.13)

Clearly, \(0<m_1<m_2\) due to \(r<p,~p\ge \frac{2sr}{2d+1}+r\) and \(s>\frac{1}{r}\). Therefore,

$$\begin{aligned} \sum _{m=0}^{m_{2}}Q_{m}+ \delta _n^{p}\le & \sum _{m=0}^{m_1}Q_{m}+\sum _{m=m_1} ^{m_{2}}Q_{m}+ \delta _n^{p}\\\lesssim & 2^{m_{1}(p-r-\frac{2sr}{2d+1})}\delta _n^{p}+ 2^{-\frac{2m_{1}s'p}{2d+1}}\delta _n^{\frac{s'}{s}p}+ \delta _n^{p}. \end{aligned}$$

This with (3.13), \(\delta _n\thicksim (\frac{\ln n}{n})^{\frac{s}{2s+2d+1}}\) and \(s'=s-\frac{1}{r}+\frac{1}{p}\) tells that

$$\begin{aligned} \sum _{m=0}^{m_{2}}Q_{m}+ \delta _n^{p} \lesssim \Big (\frac{\ln n}{n}\Big )^{\frac{s'p}{2(s-\frac{1}{r})+2d+1}}. \end{aligned}$$
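
The last display can be double-checked symbolically: with \(s'=s-\frac{1}{r}+\frac{1}{p}\) and \(2^{m_{1}}\) as in (3.13), the two competing terms carry the same \(\delta _n\)-exponent by construction, and with \(\delta _n\thicksim (\frac{\ln n}{n})^{\frac{s}{2s+2d+1}}\) both reduce to the claimed rate. The sympy sketch below is illustrative only.

```python
import sympy as sp

s, r, p, d = sp.symbols('s r p d', positive=True)
sprime = s - 1/r + 1/p                       # s' = s - 1/r + 1/p

# choice (3.13): 2^{m_1} ~ delta_n^{theta}
theta = sprime*p*(1/s - 1/sprime) / ((2*sprime/(2*d + 1) + 1)*p - 2*s*r/(2*d + 1) - r)

# delta_n-exponents of the two competing terms
exp_A = theta*(p - r - 2*s*r/(2*d + 1)) + p        # 2^{m_1(p-r-2sr/(2d+1))} delta_n^p
exp_B = -2*theta*sprime*p/(2*d + 1) + sprime*p/s   # 2^{-2m_1 s'p/(2d+1)} delta_n^{s'p/s}

# with delta_n ~ (ln n/n)^{s/(2s+2d+1)}, both give the claimed rate exponent
claimed = sprime*p / (2*(s - 1/r) + 2*d + 1)
for e in (exp_A, exp_B):
    assert sp.simplify(e*s/(2*s + 2*d + 1) - claimed) == 0
```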

Finally, the desired conclusion follows from (3.10)–(3.12) and the estimate in case (iii), which completes the proof. \(\square \)