1 Introduction

Density estimation has a long history [1, 3]. In 1996, Donoho et al. [4] used a non-linear wavelet estimator to establish an adaptive estimate, optimal up to a logarithmic factor, for compactly supported density functions on \({\mathbb {R}}^{1}\) under \(L^{p}\) risk (\(1\le p<\infty \)) over Besov spaces.

It is quite remarkable that if the assumption of compact support on the underlying density is dropped, then the minimax behavior becomes completely different. In particular, Kerkyacharian and Picard [10] defined a linear estimator based on an orthogonal scaling function in 1992 and discussed the convergence rates of the \(L^{p}\) risk for \(1\le p<\infty \) over one-dimensional Besov spaces. Although their density functions need not have compact support, that estimator is non-adaptive and requires an additional condition (Condition N) for \(1\le p<2\).

What about adaptive estimation for densities without compact support? Juditsky and Lambert-Lacroix [9] studied the optimal convergence rates of the \(L^{p}\) risk (\(1\le p<\infty \)) by using a biorthogonal wavelet estimator for density functions in one-dimensional Hölder spaces. Seven years later, the \(L^{2}\) risk estimation in one-dimensional Besov spaces was investigated [18]. In 2014, Goldenshluger and Lepski [5] addressed this problem on \({\mathbb {R}}^{d}\) with \(L^{p}\) risk (\(1< p<\infty \)) over anisotropic Nikol’skii classes. They constructed an adaptive estimator based on a data-driven selection rule applied to a fixed family of kernel estimators, and showed that the minimax behavior splits into four different regions of convergence rates, which are (nearly) optimal.

Compared with kernel estimators, wavelet estimators provide more local information, which is effective for estimating density functions with cusps, because wavelets enjoy time-frequency localization and multiresolution properties (see [2, 12,13,14]). Recently, this fact has been confirmed by numerical experiments, reported in tables and figures in many papers on both density [7, 22] and regression estimation [6, 11]. Moreover, the fast wavelet transform, built on filter banks and the pyramid algorithm, is important in many practical fields (see [15, 20, 21]).

In the present paper, we use an orthonormal scaling function to construct a data-driven estimator on isotropic Besov spaces and obtain the same upper bounds as Goldenshluger and Lepski [5]. Compared with their work, our auxiliary estimators are more concise. Furthermore, motivated by the work of Rebelles [19], we provide better convergence rates for density functions satisfying an independence hypothesis. It should be pointed out that this estimation effectively mitigates the curse of dimensionality.

1.1 Wavelets and Besov Spaces

We begin with a classical concept in wavelet analysis. A multiresolution analysis (MRA, [17]) is a sequence of closed subspaces \(\{V_{j}\}_{j\in {\mathbb {Z}}}\) of the square integrable function space \(L^{2}({\mathbb {R}}^{d})\) satisfying the following properties:

  (i) \(V_{j}\subset V_{j+1}\), \(j\in {\mathbb {Z}}\);

  (ii) \(\overline{\bigcup _{j\in {\mathbb {Z}}} V_{j}}=L^{2}({\mathbb {R}}^{d})\) (the space \(\bigcup _{j\in {\mathbb {Z}}} V_{j}\) is dense in \(L^{2}({\mathbb {R}}^{d})\));

  (iii) \(f(2\cdot )\in V_{j+1}\) if and only if \(f(\cdot )\in V_{j}\) for each \(j\in {\mathbb {Z}}\);

  (iv) There exists \(\varphi \in L^{2}({\mathbb {R}}^{d})\) (scaling function) such that \(\{\varphi (\cdot -k),~k\in {\mathbb {Z}}^{d}\}\) forms an orthonormal basis of \(V_{0}=\overline{\mathrm {span}\{\varphi (\cdot -k),~k\in {\mathbb {Z}}^d\}}\).

When \(d=1\), a wavelet function \(\psi \) can be constructed from the scaling function \(\varphi \) in a simple way such that \(\{2^{j/2}\psi (2^{j}\cdot -k),~j,k\in {\mathbb {Z}}\}\) constitutes an orthonormal basis (wavelet basis) of \(L^{2}({\mathbb {R}})\). Examples include the Daubechies wavelets [8], which are compactly supported in the time domain. For \(d\ge 2\), the tensor product method gives an MRA \(\{V_{j}\}\) of \(L^{2}({\mathbb {R}}^{d})\) from a one-dimensional MRA. In fact, with a tensor-product scaling function \(\varphi \), one finds \(2^{d}-1\) wavelet functions \(\psi ^{\ell }~(\ell =1,2,\cdots ,2^d-1)\) such that

$$\begin{aligned} \{2^{jd/2}\psi ^{\ell }(2^{j}\cdot -k),~~j\in {\mathbb {Z}},~k\in {\mathbb {Z}}^{d},~\ell =1,2,\cdots ,2^d-1\} \end{aligned}$$

constitutes an orthonormal basis (wavelet basis) of \(L^{2}({\mathbb {R}}^{d})\).
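For instance, when \(d=2\), writing \(\varphi _{1}\) for a one-dimensional scaling function and \(\psi _{1}\) for its wavelet, the tensor-product construction gives \(\varphi (x_{1},x_{2})=\varphi _{1}(x_{1})\varphi _{1}(x_{2})\) and the \(2^{2}-1=3\) wavelet functions

$$\begin{aligned} \psi ^{1}(x_{1},x_{2})=\varphi _{1}(x_{1})\psi _{1}(x_{2}),\quad \psi ^{2}(x_{1},x_{2})=\psi _{1}(x_{1})\varphi _{1}(x_{2}),\quad \psi ^{3}(x_{1},x_{2})=\psi _{1}(x_{1})\psi _{1}(x_{2}). \end{aligned}$$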

Let \(P_{j}\) be the orthogonal projection operator from \(L^{2}({\mathbb {R}}^{d})\) onto the scaling space \(V_{j}\) with the orthonormal basis \(\{\varphi _{jk}(\cdot )=2^{jd/2}\varphi (2^{j}\cdot -k),~k\in {\mathbb {Z}}^{d}\}\). Then for each \(f\in L^{2}({\mathbb {R}}^{d})\),

$$\begin{aligned} P_{j}f=\sum _{k\in {\mathbb {Z}}^{d}}\alpha _{jk}\varphi _{jk} \end{aligned}$$
(1.1)

with \(\alpha _{jk}{:}{=}\langle f,\varphi _{jk}\rangle \). In particular, when a scaling function \(\varphi \) is m-regular, the identity (1.1) holds in \(L^p({\mathbb {R}}^{d})\) for \(p\ge 1\) [8]. Here and after, m-regular means that \(\varphi \in C^{m}({\mathbb {R}}^{d})\) and \(|D^{\alpha }\varphi (x)|\le c_l(1+|x|^{2})^{-\frac{l}{2}}\) \((|\alpha |=0,1,\ldots ,m)\) for each \(l\in {\mathbb {Z}}\) and some independent positive constants \(c_l\). The Daubechies scaling function \(\underbrace{D_{2N}\times \cdots \times D_{2N}}_{d~\text {times}}\) with \(N>m+d\) is an example, and the tensor product of \(D_{2N}\) with large N is used throughout this paper.

One of the advantages of wavelet bases is that they characterize Besov spaces, which contain Hölder and \(L^{2}\)-Sobolev spaces as special cases. The next lemma provides an equivalent definition.

Lemma 1.1

([17]) Let \(\varphi \) be m-regular, \(\psi ^{\ell }~(\ell =1, 2, \cdots ,2^{d}-1)\) be the corresponding wavelets and \(f\in L^{r}({\mathbb {R}}^{d})\). If \(\alpha _{jk}{:}{=}\langle f,\varphi _{jk}\rangle \), \(\beta _{jk}^{\ell }{:}{=}\langle f,\psi _{jk}^{\ell }\rangle \), \(r,q\in [1,\infty ]\) and \(0<s<m\), then the following assertions are equivalent:

  (i) \(f\in B^{s}_{r,q}({\mathbb {R}}^{d})\);

  (ii) \(\{2^{js}\Vert P_{j}f-f\Vert _{r}\}\in l_{q}\);

  (iii) \(\{2^{j(s-\frac{d}{r}+\frac{d}{2})}\Vert \beta _{j\cdot }\Vert _{l_r}\}\in l_{q}\).

The Besov norm of f can be defined by

$$\begin{aligned} \Vert f\Vert _{B^{s}_{r,q}}{:}{=}\Vert \alpha _{j_{0}\cdot }\Vert _{l_r}+ \Vert (2^{j(s-\frac{d}{r}+\frac{d}{2})}\Vert \beta _{j\cdot }\Vert _{l_r})_{j\ge j_{0}}\Vert _{l_q}, \end{aligned}$$

where \(\Vert \alpha _{j_{0}\cdot }\Vert ^r_{l_r}{:}{=}\sum \limits _{k\in {\mathbb {Z}}^d}|\alpha _{j_0k}|^r\) and \(\Vert \beta _{j\cdot }\Vert _{l_r}^{r}=\sum \limits _{\ell =1}^{2^d-1}\sum \limits _{k\in {\mathbb {Z}}^{d}}| \beta ^{\ell }_{jk}|^{r}.\)

Moreover, Lemma 1.1 (i) and (ii) show that \(\Vert P_jf-f\Vert _r\lesssim 2^{-js}\) holds for \(f\in B^{s}_{r,q}({\mathbb {R}}^{d})\). Here and throughout, the notation \(A\lesssim B\) denotes \(A\le cB\) with some fixed and independent constant \(c>0\); \(A\gtrsim B\) means \(B\lesssim A\); \(A\thicksim B\) stands for both \(A\lesssim B\) and \(A\gtrsim B\).

When \(r\le p\), Lemma 1.1 (i) and (iii) imply that with \(s'-\frac{d}{p}=s-\frac{d}{r}>0\),

$$\begin{aligned} B_{r,q}^s({\mathbb {R}}^d)\hookrightarrow B_{p,q}^{s'}({\mathbb {R}}^d), \end{aligned}$$

where \(A\hookrightarrow B\) stands for a Banach space A continuously embedded in another Banach space B. All these claims can be found in Ref. [23].
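For example, taking \(d=1\), \(r=2\), \(s=2\) and \(p=4\) gives \(s'=s-\frac{d}{r}+\frac{d}{p}=2-\frac{1}{2}+\frac{1}{4}=\frac{7}{4}\), so \(B_{2,q}^{2}({\mathbb {R}})\hookrightarrow B_{4,q}^{7/4}({\mathbb {R}})\) for every \(q\in [1,\infty ]\).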

1.2 Wavelet Estimator and Selection Rule

It is well-known that the classical linear wavelet estimator is given by

$$\begin{aligned} {\widehat{f}}_{j}(x)=\sum _k{\widehat{\alpha }}_{jk}\varphi _{jk}(x) \end{aligned}$$

with \({\widehat{\alpha }}_{jk}{:}{=}\frac{1}{n}\sum _{i=1}^{n}\varphi _{jk}(X_{i})\). Moreover, the parameter \(j{:}{=}j(n)\) goes to infinity as the sample size \(n\rightarrow \infty \). In general, it depends on the regularity index s of the unknown density function f, so the estimator is non-adaptive [8, 10]. In this subsection, we give a selection rule for the parameter j that depends only on the observations \(X_{1},\cdots ,X_{n}\), i.e., a data-driven version.
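As a concrete illustration (a minimal sketch of our own, not part of the paper's construction), the following Python code evaluates \({\widehat{f}}_{j}\) for \(d=1\) with the Haar scaling function \(\varphi =I_{[0,1)}\), for which \(\varphi _{jk}(x)=2^{j/2}I_{[0,1)}(2^{j}x-k)\) and only finitely many empirical coefficients are non-zero; the paper itself works with smooth Daubechies scaling functions, whose coefficients are computed analogously.

import numpy as np

def haar_linear_estimator(samples, j):
    # Linear estimator (d = 1, Haar): f_hat_j(x) = sum_k alpha_hat_{jk} phi_{jk}(x)
    # with phi_{jk}(x) = 2^{j/2} 1{floor(2^j x) = k} and
    # alpha_hat_{jk} = (1/n) sum_i phi_{jk}(X_i).
    n = len(samples)
    ks, counts = np.unique(np.floor(2.0**j * samples).astype(int), return_counts=True)
    alpha = dict(zip(ks, 2.0**(j / 2.0) * counts / n))
    def f_hat(x):
        return 2.0**(j / 2.0) * alpha.get(int(np.floor(2.0**j * x)), 0.0)
    return f_hat

rng = np.random.default_rng(0)
f_hat_4 = haar_linear_estimator(rng.standard_normal(10_000), j=4)
print(f_hat_4(0.0))  # close to 1/sqrt(2*pi) = 0.3989... for large n

For the Haar choice, \({\widehat{f}}_{j}\) is simply a histogram with bin width \(2^{-j}\), which makes the dependence of the (non-adaptive) estimator on the level j transparent.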

Let \({\mathcal {H}}{:}{=}\left\{ 0,1,\cdots ,\lfloor \frac{1}{d}\log _2{\frac{n}{\ln n}}\rfloor \right\} \), where \(\lfloor a\rfloor \) denotes the largest integer smaller than or equal to a, and let

$$\begin{aligned} \xi _{n}(x,j){:}{=}{\widehat{f}}_{j}(x)-E{\widehat{f}}_{j}(x) \end{aligned}$$
(1.2)

be the stochastic error of \({\widehat{f}}_{j}\). The most important step of the selection rule is to find a function \(U_n(x,j)\) such that the moments of random variables

$$\begin{aligned} v(x){:}{=}\sup _{j\in {\mathcal {H}}}\Big [|\xi _{n}(x,j)|-U_n(x,j)\Big ]_{+}\end{aligned}$$
(1.3)

are “small” for each \(x\in {\mathbb {R}}^{d}\), where \(a_{+}{:}{=}\max \{a,0\}\). According to Bernstein’s inequality in Sect. 3, the function \(U_n(x,j)\) can be defined by

$$\begin{aligned} U_{n}(x,j){:}{=}\sqrt{\frac{\lambda 2^{jd}\ln n}{n}\sigma _j(x)}+\frac{\lambda 2^{jd}\ln n}{n} \end{aligned}$$
(1.4)

with some constant \(\lambda >(5p+6)\Vert \Phi \Vert _{\infty }\). Moreover, this special choice of \(\lambda \) is used in Proposition 3.1, (3.15) and (4.1). Here and throughout,

$$\begin{aligned} \sigma _j(x){:}{=}\int _{{\mathbb {R}}^d}\Phi _{j}(x-t)f(t)dt=\int _{{\mathbb {R}}^d}2^{jd}\Phi [2^j(x-t)]f(t)dt \end{aligned}$$
(1.5)

with \(\Phi \in C_0({\mathbb {R}}^{d})\) satisfying \(\Phi \ge 0\) and

$$\begin{aligned} \left| \sum _k\varphi (x-k)\varphi (y-k)\right| \le \Phi (x-y), \end{aligned}$$
(1.6)

where \(C_0({\mathbb {R}}^{d})\) stands for the set of all compactly supported and continuous functions. Clearly, \(\sigma _j\in L^{1}({\mathbb {R}}^{d})\cap L^{\infty }({\mathbb {R}}^{d})\) holds for each \(j\in {\mathcal {H}}\), if \(f\in L^{\infty }({\mathbb {R}}^{d})\).

Note that \(U_n(x,j)\) depends on the unknown density function f. Hence, we use an empirical counterpart \({\widehat{U}}_n(x,j)\) instead, i.e.,

$$\begin{aligned} {\widehat{U}}_{n}(x,j){:}{=}3\sqrt{\frac{\lambda 2^{jd}\ln n}{n}{\widehat{\sigma }}_j(x)}+\frac{3\lambda 2^{jd}\ln n}{n}, \end{aligned}$$
(1.7)

where \({\widehat{\sigma }}_{j}(x){:}{=}\frac{1}{n}\sum _{i=1}^n\Phi _{j}(x-X_{i})\). Then it is easy to see that \(E{\widehat{\sigma }}_{j}(x)=\sigma _{j}(x)\).
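Indeed, since \(X_{1},\cdots ,X_{n}\) are i.i.d. with density f,

$$\begin{aligned} E{\widehat{\sigma }}_{j}(x)=\frac{1}{n}\sum _{i=1}^nE\Phi _{j}(x-X_{i})=\int _{{\mathbb {R}}^d}\Phi _{j}(x-t)f(t)dt=\sigma _{j}(x). \end{aligned}$$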

Now the selection rule for j can be stated as follows. For any \(x\in {\mathbb {R}}^{d}\), let

$$\begin{aligned}&{\widehat{R}}_{j}(x){:}{=}\sup _{j'\in {\mathcal {H}}}\left[ |{\widehat{f}}_{j\wedge j'}(x)-{\widehat{f}}_{j'}(x)|-{\widehat{U}}_{n}(x,j\wedge j') -{\widehat{U}}_{n}(x,j')\right] _{+},\nonumber \\&{\widehat{U}}_{n}^{*}(x,j){:}{=}\sup _{j'\in {\mathcal {H}}:j'\le j}{\widehat{U}}_{n}(x,j'). \end{aligned}$$
(1.8)

Here and after, \(a\wedge b{:}{=}\min \{a,b\}\) and \(a\vee b{:}{=}\max \{a,b\}\). Compared with the work of Goldenshluger and Lepski [5], our auxiliary estimator \({\widehat{f}}_{j\wedge j'}\) is more concise. Then the selection of \(j_{0}\) is given by

$$\begin{aligned} j_{0}=j_{0}(x)={\text {arginf}}_{j\in {\mathcal {H}}} \left[ {\widehat{R}}_{j}(x)+2{\widehat{U}}_{n}^{*}(x,j)\right] . \end{aligned}$$
(1.9)

Obviously, it depends only on the observed data \(X_1,\cdots ,X_n\) for any \(x\in {\mathbb {R}}^{d}\).

With \({\widehat{\alpha }}_{jk}=\frac{1}{n}\sum _{i=1}^{n}\varphi _{jk}(X_i)\) and \(j_0\) given in (1.9), the data-driven wavelet estimator is defined by

$$\begin{aligned} {\widehat{f}}_{n,d}(x){:}{=}{\widehat{f}}_{j_{0}}(x)=\sum _k{\widehat{\alpha }}_{j_{0}k} \varphi _{j_{0}k}(x). \end{aligned}$$
(1.10)

Moreover, the estimator \({\widehat{f}}_{n,d}(x)\) is a Borel function, thanks to the discreteness of \({\mathcal {H}}\) and the continuity of \(\sum _k\varphi (x-k)\varphi (y-k)\) with \(\varphi =\underbrace{D_{2N}\times \cdots \times D_{2N}}_{d~\text {times}}\) for large N.
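To make the rule concrete, here is a minimal sketch of (1.7)–(1.10) at a single point x, again for \(d=1\) with the Haar scaling function and the dominating function \(\Phi =I_{(-1,1)}\) in (1.6); these choices, together with the value of lam, are illustrative assumptions (the paper uses smooth Daubechies scaling functions and requires \(\lambda >(5p+6)\Vert \Phi \Vert _{\infty }\)).

import numpy as np

def data_driven_estimate(samples, x, lam=10.0):
    # Sketch of the data-driven estimator (1.10) at a point x (d = 1, Haar).
    n = len(samples)
    H = range(int(np.log2(n / np.log(n))) + 1)          # the level grid H
    def f_hat(j):                                       # linear estimator at x
        return 2.0**j * np.mean(np.floor(2.0**j * samples) == np.floor(2.0**j * x))
    def U_hat(j):                                       # empirical threshold (1.7)
        sigma_hat = 2.0**j * np.mean(np.abs(samples - x) < 2.0**-j)  # sigma_hat_j(x)
        t = lam * 2.0**j * np.log(n) / n
        return 3.0 * np.sqrt(t * sigma_hat) + 3.0 * t
    f = {j: f_hat(j) for j in H}
    U = {j: U_hat(j) for j in H}
    def R_hat(j):                                       # the quantity (1.8)
        return max(max(abs(f[min(j, jp)] - f[jp]) - U[min(j, jp)] - U[jp], 0.0)
                   for jp in H)
    U_star = {j: max(U[jp] for jp in H if jp <= j) for j in H}
    j0 = min(H, key=lambda j: R_hat(j) + 2.0 * U_star[j])   # the rule (1.9)
    return f[j0]

rng = np.random.default_rng(0)
print(data_driven_estimate(rng.standard_normal(2000), x=0.0))

The rule inspects all pairs of levels: \({\widehat{R}}_{j}\) measures how much the estimator at level j disagrees with finer levels beyond what the thresholds allow, while \(2{\widehat{U}}_{n}^{*}\) penalizes large levels, mimicking the usual bias-variance trade-off without knowledge of s.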

1.3 Main Results

In this subsection we state the main theorems of this paper and discuss their relations to other work. For \(M>0\), the notation \(B_{r,q}^{s}(M)\) stands for a Besov ball, i.e.,

$$\begin{aligned} B_{r,q}^{s}(M){:}{=}\{f\in B_{r,q}^{s}({\mathbb {R}}^{d}),~f~\text {is a density function and}~\Vert f\Vert _{B_{r,q}^{s}}\le M\}. \end{aligned}$$

Moreover, \(L^\infty (M)\) is defined in the same way. Then the following theorem holds.

Theorem 1.1

Let \(0<s<m\) and \(r,q\in [1,\infty ]\). Then for \(p\in (1,\infty )\), the estimator \({\widehat{f}}_{n,d}\) in (1.10) satisfies

$$\begin{aligned} \sup _{f\in B_{r,q}^{s}(M)\cap L^{\infty }(M)}E\Vert {\widehat{f}}_{n,d}-f\Vert _{p}^{p}\lesssim \alpha _n(p,d)\Big (\frac{\ln n}{n}\Big )^{\beta (p,d) p}, \end{aligned}$$

where

$$\begin{aligned} \alpha _{n}(p,d){:}{=}\left\{ \begin{array}{rcl} \ln n,&{} ~~ &{} {p\le \frac{2sr+dr}{sr+d};}\\ 1,&{} ~~ &{} {otherwise,} \end{array} \right. \end{aligned}$$
(1.11)

and

$$\begin{aligned} \beta (p,d){:}{=}\left\{ \begin{array}{rcl} &{}\frac{s(1-\frac{1}{p})}{s+d-\frac{d}{r}}, ~~ &{}{p\le \frac{2sr+dr}{sr+d};} \\ &{}\frac{s}{2s+d},~~ &{}{\frac{2sr+dr}{sr+d}< p<\frac{2sr}{d}+r;} \\ &{}\frac{sr}{dp}, ~~&{}{p\ge \frac{2sr}{d}+r,~s\le \frac{d}{r};} \\ &{}\frac{s-\frac{d}{r}+\frac{d}{p}}{2(s-\frac{d}{r})+d}, ~~ &{}{p\ge \frac{2sr}{d}+r,~s>\frac{d}{r}}. \end{array} \right. \end{aligned}$$
(1.12)

Remark 1.1

When \(q=\infty \), the Besov space \(B_{r,\infty }^s({\mathbb {R}}^d)\) coincides with the Nikol’skii class \({\mathcal {N}}_r(s,{\mathbb {R}}^d)\). Then, according to Theorem 3 of Goldenshluger and Lepski [5], the above upper bound is optimal up to a logarithmic factor, since the corresponding lower bound holds for all possible estimators, including both kernel and wavelet ones.

Remark 1.2

For the case \(s>\frac{d}{r}\), the condition \(L^{\infty }(M)\) is not necessary, because \(B_{r,q}^{s}({\mathbb {R}}^d)\subset L^{\infty }({\mathbb {R}}^d)\) in this case [8]. On the other hand, the convergence rates in (1.11)–(1.12) with \(d=1\) and \(p=2\) coincide with Theorem 3 of Reynaud-Bouret et al. [18]; if \(d=1\) and \(r=q=\infty \), then \(B_{\infty ,\infty }^s({\mathbb {R}})=H^s({\mathbb {R}})\) and Theorem 4 of Juditsky and Lambert-Lacroix [9] follows from the above theorem directly.

A closer look shows that the convergence exponents \(\beta (p,d)\) in Theorem 1.1 tend to zero as the dimension \(d\longrightarrow \infty \). Motivated by the work of Rebelles [19], we reduce the influence of the dimension and improve the convergence rates of Theorem 1.1 under an independence hypothesis on the density functions.

As in Ref. [19], denote \({\mathcal {I}}_d{:}{=}\{1,\cdots ,d\}\). For a partition \({\mathcal {P}}\) of \({\mathcal {I}}_d\), a density function f has the independence structure \({\mathcal {P}}\) if

$$\begin{aligned} f(x)=\prod \limits _{I\in {\mathcal {P}}}f_{|I|}(x_{I}) \end{aligned}$$
(1.13)

with \(I=\{l_1,\cdots ,l_{|I|}\}\in {\mathcal {P}}\) and \(1\le l_1<\cdots <l_{|I|}\le d\). Here, \(x_I{:}{=}(x_{l_1},\cdots ,x_{l_{|I|}})\in {\mathbb {R}}^{|I|}\) and |I| denotes the cardinality of I. On the other hand, \( f\in B_{r,q}^{s}({\mathbb {R}}^{d},{\mathcal {P}}) \) if and only if \(f_{|I|}\in B_{r,q}^{s}({\mathbb {R}}^{|I|})\) for each \(I\in {\mathcal {P}}\); \(f\in L^{\infty }({\mathbb {R}}^{d},{\mathcal {P}})\) means \(f_{|I|}\in L^{\infty }({\mathbb {R}}^{|I|})\) for each \(I\in {\mathcal {P}}\). Furthermore, the following notations are needed:

$$\begin{aligned}&B_{r,q}^{s}(M,{\mathcal {P}}){:}{=}\{f\in B_{r,q}^{s}({\mathbb {R}}^{d},{\mathcal {P}}),~\Vert f_{|I|}\Vert _{B_{r,q}^{s}}\le M~\text{ for } \text{ any }~I\in {\mathcal {P}}\};\\&L^{\infty }(M,{\mathcal {P}}){:}{=}\{f\in L^{\infty }({\mathbb {R}}^{d},{\mathcal {P}}),~f_{|I|}\in L^{\infty }(M)~\text{ for } \text{ any }~I\in {\mathcal {P}}\}. \end{aligned}$$

For \(f_{|I|}\in B_{r,q}^{s}({\mathbb {R}}^{|I|})\), the corresponding wavelet estimator \({\widehat{f}}_{n,|I|}(x_{I})\) is given by (1.10). Then the estimator \({\widehat{f}}_{n,{\mathcal {P}}}\) for \(f\in B_{r,q}^{s}({\mathbb {R}}^{d},{\mathcal {P}})\) is defined by

$$\begin{aligned} {\widehat{f}}_{n,{\mathcal {P}}}(x)=\prod _{I\in {\mathcal {P}}}{\widehat{f}}_{n,|I|}(x_{I}). \end{aligned}$$
(1.14)
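A minimal sketch of (1.14) for the fully factorized partition \({\mathcal {P}}=\{\{1\},\ldots ,\{d\}\}\), reusing the one-dimensional routine data_driven_estimate from the sketch in Sect. 1.2 (an illustration under that assumption; a general block \(I\) requires an |I|-dimensional version):

import numpy as np

def product_estimate(samples, x):
    # Estimator (1.14) when P = {{1},...,{d}}: estimate each one-dimensional
    # marginal by the data-driven rule and multiply the results.
    d = samples.shape[1]
    return float(np.prod([data_driven_estimate(samples[:, i], x[i])
                          for i in range(d)]))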

We are now in a position to state the most important result of this paper.

Theorem 1.2

Let \(0<s<m\) and \(r,q\in [1,\infty ]\). For any \(p\in (1,\infty )\),

$$\begin{aligned} \sup _{f\in B_{r,q}^{s}(M,{\mathcal {P}})\cap L^{\infty }(M,{\mathcal {P}})}E\Vert {\widehat{f}}_{n,{\mathcal {P}}}-f\Vert _{p}^{p}\lesssim \max _{I\in {\mathcal {P}}}\alpha _n(p,|I|)\Big (\frac{\ln n}{n}\Big )^{\beta (p,|I|) p}, \end{aligned}$$

where \(\alpha _n(p,|I|)\) and \(\beta (p,|I|)\) can be found in (1.11) and (1.12) respectively.

Remark 1.3

When \({\mathcal {P}}=\{\{1,\ldots ,d\}\}\), \(|I|=d\) and the result of Theorem 1.1 follows directly from Theorem 1.2. For the other extreme case \({\mathcal {P}}=\{\{1\},\ldots ,\{d\}\}\), the convergence order does not depend on the dimension d: the influence of the dimension on the accuracy of estimation vanishes, because \(|I|=1\) in this case.

2 Oracle Inequality

In this section, we shall introduce a point-wise oracle inequality, which is one of the main ingredients of the later proofs. Let us begin with the following lemma.

Lemma 2.1

Let \({\mathcal {X}}_{j}(x)=\Big [|{\widehat{\sigma }}_{j}(x)-\sigma _{j}(x)|-U_{n}(x,j)\Big ]_+\) with \(j\in {\mathcal {H}}\). Then

$$\begin{aligned} \Big [{\widehat{U}}_{n}(x,j)-13U_{n}(x,j)\Big ]_+\le 2{\mathcal {X}}_{j}(x)\quad \text{ and }\quad \Big [U_{n}(x,j)-{\widehat{U}}_{n}(x,j)\Big ]_+\le {\mathcal {X}}_{j}(x), \end{aligned}$$

where \(U_{n}(x,j)\) and \({\widehat{U}}_{n}(x,j)\) are given by (1.4) and (1.7) respectively.

Proof

Define \({\mathcal {H}}_{0}{:}{=}\{j\in {\mathcal {H}},~\sigma _{j}(x)\ge 4\lambda 2^{jd}\frac{\ln n}{n}\}\). According to the definition of \({\mathcal {X}}_{j}(x)\),

$$\begin{aligned} |{\widehat{\sigma }}_{j}(x)-\sigma _{j}(x)|\le {\mathcal {X}}_{j}(x)+U_{n}(x,j). \end{aligned}$$

This with (1.4) and (1.7) leads to

$$\begin{aligned} |{\widehat{U}}_{n}(x,j)-3U_{n}(x,j)|= & {} \left| 3\sqrt{\frac{\lambda 2^{jd}\ln n}{n}}\left[ \sqrt{{\widehat{\sigma }}_{j}(x)}-\sqrt{\sigma _{j}(x)}\right] \right| \\= & {} \left| 3\sqrt{\frac{\lambda 2^{jd}\ln n}{n}}\frac{{\widehat{\sigma }}_{j}(x)-\sigma _{j}(x)}{\sqrt{{\widehat{\sigma }}_{j}(x)}+ \sqrt{\sigma _{j}(x)}}\right| \\\le & {} 3\sqrt{\frac{\lambda 2^{jd}\ln n}{n}}\frac{{\mathcal {X}}_{j}(x)+U_{n}(x,j)}{\sqrt{\sigma _{j}(x)}}.\end{aligned}$$

Then for any \(j\in {\mathcal {H}}_{0}\), the above inequality reduces to

$$\begin{aligned} |{\widehat{U}}_{n}(x,j)-3U_{n}(x,j)| \le \frac{3}{2}\sqrt{\sigma _{j}(x)}\frac{{\mathcal {X}}_{j}(x)+U_{n}(x,j)}{\sqrt{\sigma _{j}(x)}}\le \frac{3}{2}{\mathcal {X}}_{j}(x)+\frac{3}{2}U_{n}(x,j). \end{aligned}$$

Hence,

$$\begin{aligned} {\widehat{U}}_{n}(x,j)-3U_{n}(x,j)\le \frac{3}{2}{\mathcal {X}}_{j}(x)+\frac{3}{2}U_{n}(x,j)\end{aligned}$$

and

$$\begin{aligned} 3U_{n}(x,j)-{\widehat{U}}_{n}(x,j)\le \frac{3}{2}{\mathcal {X}}_{j}(x)+\frac{3}{2}U_{n}(x,j). \end{aligned}$$

Furthermore, by a simple calculation, one obtains that

$$\begin{aligned} \Big [{\widehat{U}}_{n}(x,j)-13U_{n}(x,j)\Big ]_+\le \Big [{\widehat{U}}_{n}(x,j)-\frac{9}{2}U_{n}(x,j)\Big ]_+\le \frac{3}{2}{\mathcal {X}}_{j}(x)\le 2{\mathcal {X}}_j(x)\end{aligned}$$

and

$$\begin{aligned} \Big [U_{n}(x,j)-{\widehat{U}}_{n}(x,j)\Big ]_+\le \Big [U_{n}(x,j)-\frac{2}{3}{\widehat{U}}_{n}(x,j)\Big ]_+\le {\mathcal {X}}_{j}(x). \end{aligned}$$

The desired conclusion is established for the case of \(j\in {\mathcal {H}}_{0}\).

It remains to show the case of \(j\in {\mathcal {H}}_{1}{:}{=}{\mathcal {H}}\backslash {\mathcal {H}}_{0}\). Clearly,

$$\begin{aligned} U_{n}(x,j)= \sqrt{\frac{\lambda 2^{jd}\ln n}{n}\sigma _{j}(x)}+\frac{\lambda 2^{jd}\ln n}{n}\le \frac{3\lambda 2^{jd}\ln n}{n}\end{aligned}$$
(2.1)

due to (1.4) and \(j\in {\mathcal {H}}_{1}\). This with \({\widehat{U}}_{n}(x,j)\ge \frac{3\lambda 2^{jd}\ln n}{n}\) in (1.7) implies

$$\begin{aligned} \Big [U_{n}(x,j)-{\widehat{U}}_{n}(x,j)\Big ]_+=0. \end{aligned}$$
(2.2)

On the other hand, according to the definition of \({\mathcal {X}}_{j}(x)\),

$$\begin{aligned} {\widehat{\sigma }}_{j}(x)\le \sigma _{j}(x)+{\mathcal {X}}_{j}(x)+U_{n}(x,j)\le \frac{7\lambda 2^{jd}\ln n}{n} +{\mathcal {X}}_{j}(x) \end{aligned}$$

thanks to \(j\in {\mathcal {H}}_{1}\) and (2.1). This with \(\sqrt{a+b}\le \sqrt{a}+\sqrt{b}\) shows that

$$\begin{aligned} {\widehat{U}}_{n}(x,j):= & {} 3\sqrt{\frac{\lambda 2^{jd}\ln n}{n}{\widehat{\sigma }}_{j}(x)}+\frac{3\lambda 2^{jd}\ln n}{n}\\\le & {} 3\sqrt{\frac{\lambda 2^{jd}\ln n}{n}{\mathcal {X}}_{j}(x)}+(3\sqrt{7}+3)\frac{\lambda 2^{jd}\ln n}{n}. \end{aligned}$$

Combining it with \(\sqrt{ab}\le \frac{a+b}{2}\) and \(U_{n}(x,j)\ge \frac{\lambda 2^{jd}\ln n}{n}\) in (1.4), one knows

$$\begin{aligned} {\widehat{U}}_{n}(x,j)\le \frac{3}{2}{\mathcal {X}}_{j}(x)+(3\sqrt{7}+\frac{9}{2})\frac{\lambda 2^{jd}\ln n}{n} \le \frac{3}{2}{\mathcal {X}}_{j}(x)+13U_{n}(x,j). \end{aligned}$$

Then it follows that

$$\begin{aligned} \Big [{\widehat{U}}_{n}(x,j)-13U_{n}(x,j)\Big ]_+\le 2{\mathcal {X}}_{j}(x). \end{aligned}$$
(2.3)

Hence, the lemma also holds for the case of \(j\in {\mathcal {H}}_{1}\) thanks to (2.2) and (2.3). The proof is done. \(\square \)

To state the point-wise oracle inequality, let \(B_j(x,f)\) be the bias of the estimator \(\widehat{f}_{j}(x)\), i.e.,

$$\begin{aligned} B_{j}(x,f){:}{=}|E{\widehat{f}}_{j}(x)-f(x)|=|P_jf(x)-f(x)|, \end{aligned}$$
(2.4)

and define

$$\begin{aligned} B_{j}^{*}(x,f){:}{=}\sup _{j'\in {\mathcal {H}},~j'\ge j}B_{j'}(x,f)\quad \text{ and }\quad U_{n}^{*}(x,j){:}{=}\sup _{j'\in {\mathcal {H}},~j'\le j}U_{n}(x,j'), \end{aligned}$$
(2.5)

where \(P_j\) and \(U_{n}(x,j)\) are given by (1.1) and (1.4) respectively.

The following oracle inequality is the main result of this section.

Theorem 2.1

For any \(x\in {\mathbb {R}}^{d}\), the estimator \({\widehat{f}}_{n,d}(x)\) in (1.10) satisfies that

$$\begin{aligned} \left| {\widehat{f}}_{n,d}(x)-f(x)\right| \le \inf _{j\in {\mathcal {H}}}\left\{ 5B_{j}^{*}(x,f)+53U_{n}^{*}(x,j)\right\} +5v(x)+12\omega (x), \end{aligned}$$

where v(x) is defined in (1.3) and

$$\begin{aligned} \omega (x){:}{=}\sup _{j\in {\mathcal {H}}}{\mathcal {X}}_j(x). \end{aligned}$$
(2.6)

Proof

It follows from the definition of \({\widehat{R}}_{j}(x)\) in (1.8) that

$$\begin{aligned} |{\widehat{f}}_{j\wedge j_0}(x)-{\widehat{f}}_{j_{0}}(x)|\le & {} {\widehat{R}}_{j}(x)+{\widehat{U}}_{n}(x,j\wedge j_0) +{\widehat{U}}_{n}(x,j_{0})\nonumber \\\le & {} {\widehat{R}}_{j}(x)+2{\widehat{U}}_{n}^{*}(x,j_{0}) \end{aligned}$$
(2.7)

thanks to (1.8). The same argument as in (2.7) shows

$$\begin{aligned} |{\widehat{f}}_{j_{0}\wedge j}(x)-{\widehat{f}}_{j}(x)|\le {\widehat{R}}_{j_{0}}(x)+2{\widehat{U}}_{n}^{*}(x,j). \end{aligned}$$
(2.8)

Then combining (1.9) with (2.7)–(2.8), one obtains that

$$\begin{aligned} |{\widehat{f}}_{j_0}(x)-f(x)|\le & {} |{\widehat{f}}_{j_{0}\wedge j}(x)-{\widehat{f}}_{j_{0}}(x)|+|{\widehat{f}}_{j_{0}\wedge j}(x)-{\widehat{f}}_{j}(x)|+|{\widehat{f}}_{j}(x)-f(x)|\nonumber \\\le & {} 2{\widehat{R}}_{j}(x)+4{\widehat{U}}_{n}^{*}(x,j)+|{\widehat{f}}_{j}(x)-f(x)|. \end{aligned}$$
(2.9)

Clearly, by (1.3),

$$\begin{aligned} |\xi _n(x,j)|\le \Big [|\xi _n(x,j)|-U_n(x,j)\Big ]_++U_n(x,j)\le v(x)+U_n(x,j). \end{aligned}$$

Moreover, it follows from (2.5) that

$$\begin{aligned} |{\widehat{f}}_{j}(x)-f(x)|\le B_{j}(x,f)+|\xi _{n}(x,j)|\le B_{j}^{*}(x,f)+v(x)+U_{n}^{*}(x,j). \end{aligned}$$
(2.10)

On the other hand, according to (1.2) and (2.4),

$$\begin{aligned} {\widehat{R}}_{j}(x)= & {} \sup _{j'\in {\mathcal {H}}}\Big [|{\widehat{f}}_{j\wedge j'}(x)-{\widehat{f}}_{j'}(x)|-{\widehat{U}}_{n}(x,j\wedge j')-{\widehat{U}}_{n}(x,j')\Big ]_{+}\\\le & {} \sup _{j'\in {\mathcal {H}}}\Big [|E{\widehat{f}}_{j\wedge j'}(x)-E{\widehat{f}}_{j'}(x)|+|\xi _{n}(x,j\wedge j')|-U_{n}(x,j\wedge j')+|\xi _{n}(x,j')|\\&\qquad -U_{n}(x,j')+U_{n}(x,j\wedge j') -{\widehat{U}}_{n}(x,j\wedge j')+U_{n}(x,j')-{\widehat{U}}_{n}(x,j')\Big ]_{+}. \end{aligned}$$

This with \(\sup _{j'\in {\mathcal {H}}}|E{\widehat{f}}_{j\wedge j'}(x)-E{\widehat{f}}_{j'}(x)| \le \sup _{\{j'\in {\mathcal {H}},~j'\ge j\}}\{B_{j\wedge j'}(x,f)+B_{j'}(x,f)\}\) leads to

$$\begin{aligned} {\widehat{R}}_{j}(x)\le 2B_{j}^{*}(x,f)+2v(x)+2\omega (x) \end{aligned}$$
(2.11)

because of (2.5)–(2.6) and the second inequality of Lemma 2.1. Hence, it follows from (2.9)–(2.11) that

$$\begin{aligned} |{\widehat{f}}_{j_{0}}(x)-f(x)|\le 5B_{j}^{*}(x,f)+5v(x)+4\omega (x)+4{\widehat{U}}_n^{*}(x,j)+U_n^{*}(x,j).\qquad \end{aligned}$$
(2.12)

Note the fact that \(\big [\sup _\alpha F_\alpha -\sup _\alpha G_\alpha \big ]_+\le \sup _\alpha \big [F_\alpha -G_\alpha \big ]_+\). Then

$$\begin{aligned} {\widehat{U}}^{*}_n(x,j)-13U_n^{*}(x,j)\le \Big [{\widehat{U}}^{*}_n(x,j)-13U_n^{*}(x,j)\Big ]_{+}\le 2\sup _{j\in {\mathcal {H}}}{\mathcal {X}}_{j}(x)=2\omega (x) \end{aligned}$$

thanks to the first inequality of Lemma 2.1 and (2.6). Therefore, \({\widehat{U}}^{*}_n(x,j)\le 13U_n^{*}(x,j)+2\omega (x)\). This with (2.12) shows that

$$\begin{aligned} |{\widehat{f}}_{j_{0}}(x)-f(x)|\le 5B_{j}^{*}(x,f)+53U_n^{*}(x,j)+5v(x)+12\omega (x) \end{aligned}$$

holds for any \(j\in {\mathcal {H}}\). Furthermore,

$$\begin{aligned}&|{\widehat{f}}_{n,d}(x)-f(x)|=|{\widehat{f}}_{j_0}(x)-f(x)|\\&\quad \le \inf _{j\in {\mathcal {H}}}\left\{ 5B_{j}^{*}(x,f)+53U_{n}^{*}(x,j)\right\} +5v(x)+12\omega (x) \end{aligned}$$

due to \({\widehat{f}}_{n,d}(x)={\widehat{f}}_{j_0}(x)\) in (1.10), which finishes the proof. \(\square \)

3 Two Propositions

This section is devoted to proving two necessary propositions. The following classical inequality is needed for the proof of Proposition 3.1.

Bernstein’s inequality ([16]).  Let \(Y_{1},\cdots ,Y_{n}\) be i.i.d. random variables with \(EY_{i}^{2}\le \sigma ^{2}\) and \(|Y_{i}|\le M\) \((i=1,2,\cdots ,n)\). Then for any \(x>0\),

$$\begin{aligned} P\left\{ \left| \frac{1}{n}\sum _{i=1}^n(Y_i-EY_i)\right| \ge \sqrt{\frac{2\sigma ^2x}{n}} +\frac{4Mx}{3n}\right\} \le 2e^{-x}. \end{aligned}$$
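As a quick numerical sanity check (with illustrative parameter values of our own choosing, not taken from the paper), one can verify the bound empirically for Bernoulli variables, for which \(M=1\) and \(EY_{i}^{2}=p_{0}\le \sigma ^{2}\):

import numpy as np

rng = np.random.default_rng(1)
n, x, p0, M = 1000, 3.0, 0.3, 1.0
thresh = np.sqrt(2.0 * p0 * x / n) + 4.0 * M * x / (3.0 * n)
# A sum of n i.i.d. Bernoulli(p0) variables is Binomial(n, p0).
means = rng.binomial(n, p0, size=10_000) / n
print(np.mean(np.abs(means - p0) >= thresh), 2.0 * np.exp(-x))
# the empirical tail frequency stays below the Bernstein bound 2*exp(-x)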

Now, we state the first proposition, which plays an important role in the proof of the second one.

Proposition 3.1

Let v(x) and \(\omega (x)\) be given by (1.3) and (2.6) respectively. Then for each \(\gamma >0\), there exists \(\lambda >(5\gamma +6)\Phi _{\infty }\) such that

$$\begin{aligned} \int _{{\mathbb {R}}^d} E[v(x)]^{\gamma }dx\lesssim n^{-\frac{\gamma }{2}}\quad \text{ and }\quad \int _{{\mathbb {R}}^d} E[\omega (x)]^{\gamma }dx\lesssim n^{-\frac{\gamma }{2}}, \end{aligned}$$

where \(\Phi _{\infty }=\Vert \Phi \Vert _{\infty }\) and \(\Phi \) is defined in (1.5).

Proof

According to the definitions of v(x) and \(\omega (x)\), one only needs to prove the first inequality; the second is similar. We show \(\int _{{\mathbb {R}}^d} E[v(x)]^{\gamma }dx\lesssim n^{-\frac{\gamma }{2}}\) in two steps.

Step 1.    Define \(F(x){:}{=}f*I_{[-1,~1]^{d}}(x)\) and

$$\begin{aligned} \overline{U_{n}}(x,j){:}{=}\sqrt{\frac{\Phi _{\infty }2^{jd+1}\sigma _{j}(x)}{n}\lambda _j}+ \frac{\Phi _{\infty }2^{jd+2}}{3n}\lambda _j, \end{aligned}$$
(3.1)

where \(\lambda _{j}=\max \big \{\frac{1}{4},~(\gamma +1)jd\ln 2+\ln (F^{-1}(x)\wedge n^{l})\big \}\) with \(l=\frac{3\gamma }{2}+2\).

Note that \(\lambda \ln n\ge 2\Phi _{\infty }\lambda _j\) follows from \(\lambda >(5\gamma +6)\Phi _{\infty }\) and \((\gamma +1)jd\ln 2+\ln (F^{-1}(x)\wedge n^{l}) \le [(\gamma +1)+l]\ln n\) with \(j\in {\mathcal {H}}\). Then \(\overline{U_{n}}(x,j)\le U_{n}(x,j)\) due to (1.4) and (3.1). Furthermore,

$$\begin{aligned} \Big [|\xi _{n}(x,j)|-U_{n}(x,j)\Big ]_+\le \Big [|\xi _{n}(x,j)|-\overline{U_{n}}(x,j)\Big ]_+. \end{aligned}$$
(3.2)

For each \(t\ge 0\),

$$\begin{aligned} P\left\{ \big [|\xi _{n}(x,j)|-\overline{U_{n}}(x,j)\big ]_+>t\right\} =P\left\{ |\xi _{n}(x,j)|-\overline{U_{n}}(x,j)>t\right\} . \end{aligned}$$

Hence,

$$\begin{aligned} E\Big [|\xi _{n}(x,j)|-\overline{U_{n}}(x,j)\Big ]_+^{\gamma }=\gamma \int _0^\infty t^{\gamma -1} P\left\{ |\xi _{n}(x,j)|-\overline{U_{n}}(x,j)>t\right\} dt. \end{aligned}$$

This with variable substitution \(t=v\omega \) and \(\omega {:}{=}\sqrt{\frac{\Phi _{\infty }2^{jd+1}\sigma _{j}(x)}{n}}+ \frac{\Phi _{\infty }2^{jd+2}}{3n}\) shows

$$\begin{aligned}&E\Big [|\xi _{n}(x,j)|-\overline{U_{n}}(x,j)\Big ]_+^{\gamma } \le \gamma \int _{0}^{\infty }(v\omega )^{\gamma -1}\times \nonumber \\&P \left\{ |\xi _{n}(x,j)|> \sqrt{\frac{\Phi _{\infty }2^{jd+1}\sigma _{j}(x)}{n}}(\sqrt{v+\lambda _j})+\frac{\Phi _{\infty }2^{jd+2}}{3n} (v+\lambda _j)\right\} \omega dv \end{aligned}$$
(3.3)

thanks to \(v+\sqrt{\lambda _j}\ge \sqrt{v+\lambda _j}\) and \(\lambda _j\ge \frac{1}{4}\).

On the other hand,

$$\begin{aligned} \xi _n(x,j){:}{=}{\widehat{f}}_{j}(x)-E{\widehat{f}}_{j}(x) =\frac{1}{n}\sum _{i=1}^{n}[K_{j}(x,X_i)-EK_{j}(x,X_i)] \end{aligned}$$

with \(K(x,y)=\sum _k\varphi (x-k)\varphi (y-k)\) and \(K_{j}(x,y){:}{=}2^{jd}K(2^{j}x,2^{j}y)\). Then by (1.6),

$$\begin{aligned} |K_{j}(x,X_i)|\le 2^{jd}\Phi _{\infty } \quad \text{ and }\quad EK_{j}^{2}(x,X_i)\le 2^{jd}\Phi _{\infty }\sigma _{j}(x). \end{aligned}$$

Combining these with Bernstein’s inequality, one concludes that

$$\begin{aligned} P\left\{ |\xi _{n}(x,j)|> \sqrt{\frac{\Phi _{\infty }2^{jd+1}\sigma _{j}(x)}{n}}(\sqrt{v+\lambda _j})+ \frac{\Phi _{\infty }2^{jd+2}}{3n} (v+\lambda _j)\right\} \le 2e^{-(v+\lambda _j)}. \end{aligned}$$

This with (3.3) implies that

$$\begin{aligned}&E\Big [|\xi _{n}(x,j)|-\overline{U_{n}}(x,j)\Big ]_+^{\gamma }\\&\quad \le 2\gamma \omega ^{\gamma }\int _{0}^{\infty }v^{\gamma -1}e^{-(v+\lambda _j)}dv =2\gamma \omega ^{\gamma }e^{-\lambda _j}\int _{0}^{\infty }v^{\gamma -1}e^{-v}dv\\&\quad =2\gamma \Gamma (\gamma )\omega ^{\gamma }e^{-\lambda _j}=2\Gamma (\gamma +1) \left[ \sqrt{\frac{\Phi _{\infty }2^{jd+1}\sigma _{j}(x)}{n}}+ \frac{\Phi _{\infty }2^{jd+2}}{3n}\right] ^{\gamma } e^{-\lambda _j} \end{aligned}$$

due to \(\omega {:}{=}\sqrt{\frac{\Phi _{\infty }2^{jd+1}\sigma _{j}(x)}{n}}+ \frac{\Phi _{\infty }2^{jd+2}}{3n}\). Note that \(\sigma _{j}(x)=\int _{{\mathbb {R}}^d}\Phi _{j}(t-x)f(t)dt\lesssim 2^{jd}\) and \(e^{-\lambda _{j}}\le 2^{-jd(\gamma +1)}[F(x)\vee n^{-l}]\). Then

$$\begin{aligned} \sum _{j\in {\mathcal {H}}}E\Big [|\xi _{n}(x,j)|-\overline{U_{n}}(x,j)\Big ]_+^{\gamma }\lesssim & {} \sum _{j\in {\mathcal {H}}} \Big (\frac{2^{jd}}{\sqrt{n}}\Big )^{\gamma }2^{-jd(\gamma +1)}[F(x)\vee n^{-l}]\\\lesssim & {} n^{-\frac{\gamma }{2}}[F(x)\vee n^{-l}]. \end{aligned}$$

It follows from (1.3) and (3.2) that

$$\begin{aligned} E[v(x)]^{\gamma } \le \sum _{j\in {\mathcal {H}}}E\Big [|\xi _{n}(x,j)|-\overline{U_{n}}(x,j)\Big ]_+^{\gamma } \lesssim n^{-\frac{\gamma }{2}}[F(x)\vee n^{-l}]. \end{aligned}$$
(3.4)

Step 2.   The second step is devoted to proving \(\int _{{\mathbb {R}}^d} E[v(x)]^{\gamma }dx\lesssim n^{-\frac{\gamma }{2}}\) by using Step 1. Denote

$$\begin{aligned} T_{1}{:}{=}\left\{ x\in {\mathbb {R}}^{d},~F(x)>n^{-l}\right\} \quad \text{ and }\quad T_{2}={\mathbb {R}}^{d}\backslash T_1. \end{aligned}$$

Then with (3.4), one obtains

$$\begin{aligned} \int _{T_1}E[v(x)]^{\gamma }dx\lesssim n^{-\frac{\gamma }{2}}\int _{{\mathbb {R}}^d} F(x)dx\lesssim n^{-\frac{\gamma }{2}} \end{aligned}$$
(3.5)

thanks to \(F(x){:}{=}f*I_{[-1,~1]^{d}}(x)\in L^1({\mathbb {R}}^d)\).

Next, the main work is to prove \(\int _{T_2} E[v(x)]^{\gamma }dx\lesssim n^{-\frac{\gamma }{2}}\). Define

$$\begin{aligned} U(x){:}{=}\prod _{i=1}^d[x_i-1,~x_i+1],\qquad {\widehat{D}}(x){:}{=}\left\{ \sum _{i=1}^nI\{X_{i}\in U(x)\}<2\right\} \end{aligned}$$

and \(\overline{{\widehat{D}}(x)}=[{\widehat{D}}(x)]^c\), where \(A^{c}\) means the complement of the set A.

Without loss of generality, \(\mathrm {supp}~\Phi \subseteq [-1,~1]^{d}\) is assumed in this paper. Then

$$\begin{aligned} E|K_j(x,X_i)|\le & {} \int _{{\mathbb {R}}^d}\Phi _j(x-t)f(t)dt \le 2^{jd}\int _{U(x)}\Phi (2^j(x-t))f(t)dt\\\le & {} 2^{jd}\Phi _\infty F(x) \end{aligned}$$

because of (1.6) and \(F(x)=\int _{{\mathbb {R}}^d} I_{U(x)}(t)f(t)dt\). Moreover,

$$\begin{aligned} |\xi _n(x,j)|I_{\{ {\widehat{D}}(x)\}}\le & {} \frac{1}{n}\sum _{i=1}^n\big [|K_j(x,X_i)|+E|K_j(x,X_i)|\big ]I_{\{ {\widehat{D}}(x)\}}\\\le & {} \Phi _\infty 2^{jd}[n^{-1}+F(x)]. \end{aligned}$$

By \(l\ge 1\) and \(\lambda>(5\gamma +6)\Phi _{\infty }>2\Phi _\infty \), for each \(x\in T_2\),

$$\begin{aligned} |\xi _n(x,j)|I_{\{ {\widehat{D}}(x)\}}\le \Phi _\infty 2^{jd}(n^{-1}+n^{-l})\le \Phi _\infty 2^{jd+1}n^{-1}<U_{n}(x,j), \end{aligned}$$

which implies that \( \sup _{j\in {\mathcal {H}}}\big [|\xi _n(x,j)|-U_n(x,j)\big ]_+\cdot I_{\{ {\widehat{D}}(x)\}}=0 \) holds for \(x\in T_{2}\). Hence,

$$\begin{aligned} \int _{T_2}E[v(x)]^{\gamma }I_{\{{\widehat{D}}(x)\}}dx=0. \end{aligned}$$
(3.6)

It remains to estimate \(\int _{T_2}E[v(x)]^{\gamma }I_{\big \{\overline{{\widehat{D}}(x)}\big \}}dx\). Note that \(|\xi _n(x,j)|\lesssim \Vert K_j\Vert _{_{\infty }}\lesssim 2^{jd}\le n\) follows from \(j\in {\mathcal {H}}\). Then with \(v(x){:}{=}\sup _{j\in {\mathcal {H}}}\big [|\xi _n(x,j)|-U_n(x,j)\big ]_+\),

$$\begin{aligned} \int _{T_2}E[v(x)]^{\gamma }I_{\big \{ \overline{{\widehat{D}}(x)}\big \}}dx&\le \int _{T_2}E\left[ \sup _{j\in {\mathcal {H}}}|\xi _n(x,j)|\right] ^{\gamma }I_{\big \{\overline{{\widehat{D}}(x)}\big \}}dx\nonumber \\&\quad \lesssim n^\gamma \int _{T_2}P\left\{ \overline{{\widehat{D}}(x)}\right\} dx. \end{aligned}$$
(3.7)

According to Markov’s inequality, for each \(z>0\),

$$\begin{aligned} P\left\{ \overline{{\widehat{D}}(x)}\right\} =P\left\{ \sum _{i=1}^nI\{X_{i}\in U(x)\}\ge 2\right\} \le \frac{E[\exp (z\sum _{i=1}^nI\{X_{i}\in U(x)\})]}{e^{2z}}. \end{aligned}$$

On the other hand,

$$\begin{aligned} E\left[ \exp \left( z\sum _{i=1}^nI\{X_{i}\in U(x)\}\right) \right]\le & {} \left[ \int _{t\in U(x)}e^zf(t)dt+\int _{t\notin U(x)}f(t)dt\right] ^n\\= & {} [e^zF(x)+1-F(x)]^n. \end{aligned}$$

These with \((t+1)^n\le e^{nt}\) imply that

$$\begin{aligned} P\left\{ \overline{{\widehat{D}}(x)}\right\} \le e^{-2z}[(e^z-1)F(x)+1]^n\le \exp \{-2z+(e^z-1)nF(x)\}. \end{aligned}$$
(3.8)

Put \(z=\ln 2-\ln (nF(x))\). Then \(z>0\) by \(l\ge 1\) and \(F(x)\le n^{-l}\) in \(T_2\). Furthermore, (3.8) reduces to

$$\begin{aligned} P\left\{ \overline{{\widehat{D}}(x)}\right\} \lesssim n^2F^2(x)e^{-nF(x)}\lesssim n^2F^2(x)\lesssim n^{2-l}F(x) \end{aligned}$$

thanks to \(0\le nF(x)\le n^{-l+1}\) with \(x\in T_2\). This with (3.7) leads to

$$\begin{aligned} \int _{T_2}E[v(x)]^{\gamma }I_{\big \{\overline{{\widehat{D}}(x)}\big \}}dx\lesssim n^{\gamma +2-l}\int _{{\mathbb {R}}^d} F(x)dx \lesssim n^{\gamma +2-l}\lesssim n^{-\frac{\gamma }{2}} \end{aligned}$$
(3.9)

because of \(F\in L^1({\mathbb {R}}^d)\) and \(l=\frac{3\gamma }{2}+2\).

Finally, the desired conclusion follows from (3.5), (3.6) and (3.9). The proof is completed.\(\square \)

Before giving another proposition, we need three more notations. Define

$$\begin{aligned}&U_{f}(x){:}{=}\inf _{j\in {\mathcal {H}}}\{B_{j}^{*}(x,f)+U_{n}^{*}(x,j)\}, \end{aligned}$$
(3.10)
$$\begin{aligned}&\Omega _{m}{:}{=}\{x\in {\mathbb {R}}^{d},~2^{m}\delta _n<U_{f}(x)\le 2^{m+1}\delta _n\}, \end{aligned}$$
(3.11)
$$\begin{aligned}&\Omega _{m_{0}}^{-}{:}{=}\{x\in {\mathbb {R}}^{d},~U_{f}(x)\le 2^{m_{0}}\delta _n\}, \end{aligned}$$
(3.12)

where \(\delta _n=(\frac{C\ln n}{n})^{\frac{s}{2s+d}}\) and \(m_{0}\in {\mathbb {Z}}\) satisfies \(c' \delta _n^{\frac{sr+d}{sr+dr-d}}\le 2^{m_{0}}\le c'' \delta _n^{\frac{sr+d}{sr+dr-d}}\) with some constants \(1<c'<c''<\infty \) and \(C>0\).

Note that \(U_{f}(x)\le c_0{:}{=}\sup _xU_{f}(x)\). Then there exists

$$\begin{aligned} m_2{:}{=}\min \{m\in {\mathbb {Z}},~2^{m}\delta _n\ge c_0\} \end{aligned}$$
(3.13)

such that \(\Omega _{m}=\emptyset \) for each \(m>m_{2}\). Clearly, \(m_{0}<0<m_{2}\) for large n.

Proposition 3.2

Denote

$$\begin{aligned} J_{m_{0}}^{-}{:}{=}E\int _{\Omega _{m_{0}}^{-}}|{\widehat{f}}_{n,d}(x)-f(x)|^pdx \quad \text{ and }\quad J_m{:}{=}E\int _{\Omega _{m}}[U_f(x)]^pdx. \end{aligned}$$

Then the following statements hold:

(1). For each \(p>1\),

$$\begin{aligned} J_{m_{0}}^{-}\lesssim (\ln n) (2^{m_{0}}\delta _n)^{p-1}+n^{-\frac{p}{2}}; \end{aligned}$$

(2). Let \(f\in B_{r,q}^{s}(M)\cap L^{\infty }(M)\) and \(m\in {\mathbb {Z}}\) satisfy \(m_{0}\le m\le 0\). Then

$$\begin{aligned} J_{m}\lesssim 2^{m(p-\frac{2sr+dr}{sr+d})}\delta _n^{p}; \end{aligned}$$

(3). Let \(f\in B_{r,q}^{s}(M)\cap L^{\infty }(M)\) and \(m\in {\mathbb {Z}}\) satisfy \(0\le m\le m_2\). Then

$$\begin{aligned} J_{m}\lesssim 2^{m(p-r-\frac{2sr}{d})}\delta _n^{p}; \end{aligned}$$

Moreover, if \(s>\frac{d}{r}\) and \(r\le p\), then with \(s'{:}{=}s-\frac{d}{r}+\frac{d}{p}\),

$$\begin{aligned} J_{m}\lesssim 2^{-\frac{2ms'p}{d}}\delta _n^{\frac{s'}{s}p}. \end{aligned}$$

Proof

(1). According to Theorem 2.1,

$$\begin{aligned} |{\widehat{f}}_{n,d}(x)-f(x)|\lesssim U_{f}(x)+\Delta (x), \end{aligned}$$

where \(\Delta (x)=v(x)+\omega (x)\) and \(U_f(x)\) is given by (3.10). Then for each \(p>1\),

$$\begin{aligned}&J_{m_{0}}^-=E\int _{\Omega _{m_{0}}^-}|{\widehat{f}}_{n,d}(x)-f(x)|^pdx\\&\quad \lesssim E\int _{\Omega _{m_{0}}^-}[U_{f}(x)+\Delta (x)]^{p-1}|{\widehat{f}}_{n,d}(x)-f(x)|dx. \end{aligned}$$

Moreover, \(U_{f}(x)\le 2^{m_{0}}\delta _n\) follows from (3.12). Hence,

$$\begin{aligned} J_{m_{0}}^- \lesssim (2^{m_{0}}\delta _n)^{p-1}E\Vert {\widehat{f}}_{n,d}-f\Vert _1 +E\int _{\Omega _{m_{0}}^-}[\Delta (x)]^{p-1}[2^{m_{0}}\delta _n+\Delta (x)]dx.\nonumber \\ \end{aligned}$$
(3.14)

On the other hand, \( |{\widehat{f}}_{n,d}(x)|\le \frac{1}{n}\sum _{i=1}^n\Phi _{j_{0}}(x-X_i) \) due to \({\widehat{f}}_{n,d}(x)=\sum _k{\widehat{\alpha }}_{j_{0}k}\varphi _{j_{0}k}(x)\) and \(|\sum _k\varphi (x-k)\varphi (y-k)|\le \Phi (x-y)\). Then

$$\begin{aligned}&\Vert {\widehat{f}}_{n,d}\Vert _{1}\le \frac{1}{n}\sum _{i=1}^n\int _{{\mathbb {R}}^d}\Phi _{j_{0}}(x-X_i)dx\nonumber \\ {}&=\frac{1}{n}\sum _{i=1}^n\int _{{\cup }_{j\in {\mathcal {H}}}~~ \{x,~j_0(x)=j\}}\Phi _{j}(x-X_i)dx \lesssim \Vert \Phi \Vert _{1}\ln n \end{aligned}$$

because \({\mathcal {H}}\) is a discrete set whose cardinality is \(\lesssim \ln n\). Therefore,

$$\begin{aligned} \Vert {\widehat{f}}_{n,d}-f\Vert _{1}\le \Vert {\widehat{f}}_{n,d}\Vert _1+\Vert f\Vert _{1}\lesssim \ln n. \end{aligned}$$

This with (3.14) and Proposition 3.1 leads to

$$\begin{aligned} J_{m_{0}}^-\lesssim (\ln n) (2^{m_{0}}\delta _n)^{p-1}+2^{m_{0}}\delta _n n^{-\frac{p-1}{2}}+n^{-\frac{p}{2}}.\end{aligned}$$
(3.15)

It follows from \(2^{m_0}\thicksim \delta _n^{\frac{sr+d}{sr+dr-d}}\) that \(2^{m_0}\delta _n n^{-\frac{p-1}{2}}\lesssim n^{-\frac{p}{2}}\) holds for \(sr-dr+d>0\) and \(2^{m_0}\delta _n n^{-\frac{p-1}{2}}\lesssim (2^{m_{0}}\delta _n)^{p-1}\) holds for \(sr-dr+d\le 0\) and \(p>1\). Combining these with (3.15), one concludes that

$$\begin{aligned} J_{m_{0}}^-\lesssim (\ln n) (2^{m_{0}}\delta _n)^{p-1}+n^{-\frac{p}{2}}, \end{aligned}$$

which is the first desired conclusion.

(2). Clearly, by \(\Omega _m=\{x\in {\mathbb {R}}^{d},~2^{m}\delta _n <U_{f}(x)\le 2^{m+1}\delta _n\}\),

$$\begin{aligned} J_m=\int _{\Omega _m}[U_{f}(x)]^pdx\le (2^{m+1}\delta _n)^p|\Omega _m|, \end{aligned}$$
(3.16)

where \(|\Omega _m|\) stands for the Lebesgue measure of the set \(\Omega _m\). On the other hand, (3.10) tells that \(U_{f}(x)=\inf _{j\in {\mathcal {H}}}\{B_{j}^{*}(x,f)+U_{n}^{*}(x,j)\}\). Then for each \(j\in {\mathcal {H}}\),

$$\begin{aligned} |\Omega _m|\le & {} |\{x\in {\mathbb {R}}^d,~U_{n}^{*}(x,j)>2^{m-1}\delta _n\}|\nonumber \\&\quad +\sum _{j'\in {\mathcal {H}},~j'\ge j}|\{x\in {\mathbb {R}}^d,~B_{j'}(x,f)>2^{m-1}\delta _n\}|\nonumber \\&{:}{=}J_m^1(j)+J_m^2(j), \end{aligned}$$
(3.17)

since \(B_{j}^{*}(x,f)=\sup _{j'\in {\mathcal {H}},~j'\ge j}B_{j'}(x,f)\). Moreover, combining (3.16) with (3.17), one obtains

$$\begin{aligned} J_m\le (2^{m+1}\delta _n)^p[J_m^1(j)+J_m^2(j)]. \end{aligned}$$
(3.18)

If \(1\le r<\infty \), by using Chebyshev’s inequality and \(f\in B_{r,q}^s(M)\),

$$\begin{aligned} J_m^2(j)\le & {} \frac{\sum _{j'\in {\mathcal {H}},~j'\ge j}\Vert B_{j'}(\cdot ,f)\Vert _r^r}{(2^{m-1}\delta _n)^r} \lesssim 2^{-mr}\delta _n^{-r}\sum _{j'\in {\mathcal {H}},~j'\ge j}2^{-j'sr}\nonumber \\\lesssim & {} 2^{-mr}\delta _n^{-r}2^{-jsr}. \end{aligned}$$
(3.19)

To estimate \(J_m^1(j)\), one chooses \(j_{1}\in {\mathbb {Z}}\) satisfying

$$\begin{aligned} c_12^{\frac{md(2-r)}{sr+d}}\delta _n^{-\frac{d}{s}}\le 2^{j_{1}d}\le c_22^{\frac{md(2-r)}{sr+d}}\delta _n^{-\frac{d}{s}} \end{aligned}$$

with two constants \(c_2>c_1>1\). Thus, \(j_{1}\in {\mathcal {H}}\) for \(m_0\le m\le 0\) and large n. In fact, if \(r>2\), then

$$\begin{aligned} 1<c_1\delta _n^{-\frac{d}{s}}\le 2^{j_1d}\le c_22^{\frac{m_0d(2-r)}{sr+d}}\delta _n^{-\frac{d}{s}}\le c_2c'^{\frac{d(2-r)}{sr+d}}\delta _n^{-(\frac{d}{s}+\frac{d(r-2)}{sr+dr-d})}<\frac{n}{\ln n}\nonumber \\ \end{aligned}$$
(3.20)

thanks to the choice of \(2^{m_0}\) and \(\frac{s}{2s+d}(\frac{d}{s}+\frac{d(r-2)}{sr+dr-d})<1\). If \(1\le r\le 2\), then

$$\begin{aligned} 1<c_1c'^{\frac{d(2-r)}{sr+d}}\delta _n^{-(\frac{d}{s}+\frac{d(r-2)}{sr+dr-d})}\le c_12^{\frac{m_0d(2-r)}{sr+d}}\delta _n^{-\frac{d}{s}} \le 2^{j_1d}\le c_2\delta _n^{-\frac{d}{s}}<\frac{n}{\ln n}\nonumber \\ \end{aligned}$$
(3.21)

due to the choice of \(2^{m_0}\), \(\frac{d}{s}+\frac{d(r-2)}{sr+dr-d}>0\) and \(c_1,c'>1\). Hence, \(j_{1}\in {\mathcal {H}}\) follows from (3.20) and (3.21).

Recall that \( c' \delta _n^{\frac{sr+d}{sr+dr-d}}\le 2^{m_{0}}\le c'' \delta _n^{\frac{sr+d}{sr+dr-d}}\) and \(\delta _n=(\frac{C\ln n}{n})^{\frac{s}{2s+d}}\). Then by choosing C such that \(\max \{1,~(2M)^{\frac{d}{s}}\}<c_1<c_2<\frac{C}{4\lambda }\),

$$\begin{aligned} \lambda 2^{j_1d}\frac{\ln n}{n}\le c_2\lambda \frac{\ln n}{n}2^{\frac{md(2-r)}{sr+d}}\delta _n^{-\frac{d}{s}}=c_2\lambda C^{-1}2^{\frac{md(2-r)}{sr+d}} \delta _n^{2}<2^{m-2}\delta _n\end{aligned}$$
(3.22)

because of \(m\ge m_0\) and \(c'>1\).

Furthermore, according to the definition of \(U_{n}^{*}(x,j)\) and \(j_1\in {\mathcal {H}}\), one obtains that

$$\begin{aligned} J_m^1(j_1)\le & {} \left| \left\{ x\in {\mathbb {R}}^d,~\sup _{j'\le j_1}\sqrt{\frac{\lambda 2^{j'd}\ln n}{n}\sigma _{j'}(x)}+\frac{\lambda 2^{j_1d}\ln n}{n}>2^{m-1}\delta _n\right\} \right| \\\le & {} \sum _{j'\le j_1}\left| \left\{ x\in {\mathbb {R}}^d,~\sqrt{\frac{\lambda 2^{j'd}\ln n}{n}\sigma _{j'}(x)}>2^{m-2}\delta _n\right\} \right| \\= & {} \sum _{j'\le j_1}\left| \left\{ x\in {\mathbb {R}}^d,~\sigma _{j'}(x)>2^{2m-4}\delta _n^2\lambda ^{-1}2^{-j'd}\frac{n}{\ln n}\right\} \right| , \end{aligned}$$

where (3.22) is used in the second inequality. Moreover, it follows from \(\Vert \sigma _j\Vert _1\lesssim 1\) and (3.22) that

$$\begin{aligned} J_m^1(j_1)\le \Big (2^{2m-4}\delta _n^2\lambda ^{-1}\frac{n}{\ln n}\Big )^{-1}\sum _{j'\le j_1}\Vert \sigma _{j'}\Vert _{1}2^{j'd}\lesssim 2^{j_1d}2^{-2m}\delta _n^{-2}\frac{\ln n}{n}.\qquad \end{aligned}$$
(3.23)

For the case \(1\le r<\infty \), combining (3.18) with (3.23) and (3.19), one knows that

$$\begin{aligned}&J_m\le (2^{m+1}\delta _n)^p[J_m^1(j_1)+J_m^2(j_1)]\\&\quad \lesssim 2^{mp}\delta _n^p\Big (2^{-mr}\delta _n^{-r}2^{-j_1sr}+2^{j_1d}2^{-2m}\delta _n^{-2}\frac{\ln n}{n}\Big ). \end{aligned}$$

This with the choice of \({j_1}\) yields

$$\begin{aligned} J_m\lesssim 2^{m(p-\frac{2sr+dr}{sr+d})}\delta _n^{p}. \end{aligned}$$

If \(r=\infty \), then \(c_1(2^{m}\delta _n)^{-\frac{d}{s}}\le 2^{j_{1}d}\le c_2(2^{m}\delta _n)^{-\frac{d}{s}}\) also due to the choice of \({j_1}\). Moreover, \(f\in B_{\infty ,q}^{s}\subseteq B_{\infty ,\infty }^{s}\) follows from \(l^q\hookrightarrow l^\infty \). Then

$$\begin{aligned} \sup _{j'\ge j_1}B_{j'}(x,f)\le \sup _{j'\ge j_1}\Vert B_{j'}(\cdot ,f)\Vert _{\infty }\le M 2^{-j_{1}s}\le Mc_1^{-\frac{s}{d}} 2^m\delta _n \le 2^{m-1}\delta _n\nonumber \\ \end{aligned}$$
(3.24)

by choosing \(c_1\ge \max \{1,~(2M)^{\frac{d}{s}}\}\). Therefore, in view of (3.17),

$$\begin{aligned} J_m^2(j_1)=0. \end{aligned}$$

This with (3.18) and (3.23) shows

$$\begin{aligned} J_m\le (2^{m+1}\delta _n)^p[J_m^1(j_1)+J_m^2(j_1)]\lesssim 2^{mp}\delta _n^p2^{j_1d}2^{-2m}\delta _n^{-2}\frac{\ln n}{n}\lesssim 2^{m(p-2-\frac{d}{s})}\delta _n^{p}. \end{aligned}$$

This completes the proof of the second assertion.

(3). Take \(j_2\) satisfying \(c_32^{2m}\delta _n^{-\frac{d}{s}}\le 2^{j_{2}d}\le c_42^{2m}\delta _n^{-\frac{d}{s}}\). Then, since \(\sigma _{j}(x)=\int _{{\mathbb {R}}^d}\Phi _{j}(t-x)f(t)dt<L\), there exist two positive constants

$$\begin{aligned} \max \{1,(2M)^{\frac{d}{s}}\}<c_3<c_4<\min \left\{ \frac{C}{4c_0^{2}},~C(2\sqrt{\lambda L}+2\lambda )^{-2}\right\} \end{aligned}$$

such that \(j_{2}\in {\mathcal {H}}\) and \(U_{n}^{*}(x,j_2)\le 2^{m-1}\delta _n\) for \(0<m\le m_2\). In fact, (3.13) tells that \(2^{m_{2}}\le 2c_0\delta _n^{-1}\). Then due to \(c_4<\frac{C}{4c_0^{2}}\),

$$\begin{aligned} 1<c_3\delta _n^{-\frac{d}{s}}\le 2^{j_{2}d}\le c_42^{2m_2}\delta _n^{-\frac{d}{s}}\le 4c_4c_0^2 \delta _n^{-(\frac{d}{s}+2)}<\frac{n}{\ln n}. \end{aligned}$$

Hence, \(j_{2}\in {\mathcal {H}}\). On the other hand, according to \(j_2\in {\mathcal {H}}\) and \(c_4<C(2\sqrt{\lambda L}+2\lambda )^{-2}\),

$$\begin{aligned} U_{n}^{*}(x,j_2)= & {} \sup _{j'\le j_2}\left\{ \sqrt{\frac{\lambda 2^{j'd}\ln n}{n}\sigma _{j'}(x)}+\frac{\lambda 2^{j'd}\ln n}{n}\right\} \le (\sqrt{\lambda L}+\lambda )\sqrt{\frac{2^{j_2d}\ln n}{n}}\\\le & {} (\sqrt{\lambda L}+\lambda ) \sqrt{c_42^{2m}\delta _n^{-\frac{d}{s}}\frac{\ln n}{n}}\le (\sqrt{\lambda L}+\lambda ) \sqrt{c_4/C}2^{m}\delta _n\le 2^{m-1}\delta _n. \end{aligned}$$

This with (3.17) implies

$$\begin{aligned} J_{m}^{1}(j_{2})=0. \end{aligned}$$
(3.25)

When \(1\le r<\infty \), substituting (3.19) and (3.25) into (3.18), one obtains that

$$\begin{aligned} J_m\le (2^{m+1}\delta _n)^{p}[J_{m}^{1}(j_{2})+J_{m}^{2}(j_{2})]\lesssim 2^{m(p-r)}\delta _n^{p-r}2^{-{j_2}sr}\lesssim 2^{m(p-r-\frac{2sr}{d})}\delta _n^{p}. \end{aligned}$$

For the case \(r=\infty \), it follows from (3.24) and \(0<m\le m_2\) that

$$\begin{aligned} \sup _{j'\ge j_2}B_{j'}(x,f)\le M 2^{-j_{2}s}\le Mc_3^{^{-\frac{s}{d}}} 2^{-\frac{2ms}{d}}\delta _n \le 2^{m-1}\delta _n \end{aligned}$$

due to the choice of \(c_3\). Thus, \(J_m^2(j_2)=0\) because of (3.17). This with (3.18) and (3.25) leads to

$$\begin{aligned} J_m\le (2^{m+1}\delta _n)^p[J_m^1(j_2)+J_m^2(j_2)]=0. \end{aligned}$$

To finish the proof of the proposition, we consider the case \(s>\frac{d}{r}\) and \(r\le p\). Note that \(B_{r,q}^{s}\subseteq B_{p,q}^{s'}\) with \(s'=s-\frac{d}{r}+\frac{d}{p}\). Similar to (3.19),

$$\begin{aligned} J_m^2(j)\le & {} \frac{\sum _{j'\in {\mathcal {H}},~j'\ge j}\Vert B_{j'}(\cdot ,f)\Vert _p^p}{(2^{m-1}\delta _n)^p} \lesssim 2^{-mp}\delta _n^{-p}\sum _{j'\in {\mathcal {H}},~j'\ge j}2^{-j's'p}\\\lesssim & {} 2^{-mp}\delta _n^{-p}2^{-js'p}. \end{aligned}$$

Substituting the above estimate and (3.25) into (3.18), one concludes that

$$\begin{aligned} J_m\le (2^{m+1}\delta _n)^{p}[J_{m}^{1}(j_{2})+J_{m}^{2}(j_{2})]\lesssim (2^{m}\delta _n)^{p}2^{-mp}\delta _n^{-p}2^{-j_2s'p} \lesssim 2^{-\frac{2ms'p}{d}}\delta _n^{\frac{s'}{s}p} \end{aligned}$$

thanks to \(2^{j_2d}\thicksim 2^{2m}\delta _n^{-\frac{d}{s}}\). The proof is done. \(\square \)

Remark 3.1

A careful check of the above proofs shows that the constant C in \(\delta _{n}=(\frac{C\ln n}{n})^{\frac{s}{2s+d}}\) should be chosen large enough to ensure the existence of the constants \(c_{1},c_{2},c_{3},c_{4}\). In particular, when \(r\ne 1\) and \(r\ne \infty \), we can choose \(C=1\) (i.e., \(\delta _{n}=(\frac{\ln n}{n})^{\frac{s}{2s+d}}\)), because the lower bound \(\max \{1,~(2M)^{\frac{d}{s}}\}\) on the constants \(c_1,c_3\) is unnecessary for \(1< r<\infty \).

4 Proofs

Now, we are ready to prove Theorems 1.1 and 1.2.

4.1 Proof of Theorem 1.1

Proof

According to Theorem 2.1, one obtains that

$$\begin{aligned} |{\widehat{f}}_{n,d}(x)-f(x)|\lesssim U_{f}(x)+v(x)+\omega (x), \end{aligned}$$

where \(U_{f}(x)\) is given by (3.10). This with Proposition 3.1 implies

$$\begin{aligned} E\Vert {\widehat{f}}_{n,d}-f\Vert _{p}^{p}= & {} E\int _{\Omega _{m_{0}}^{-}}|{\widehat{f}}_{n,d}(x)-f(x)|^{p}dx+ \sum _{m=m_{0}}^{\infty }E\int _{\Omega _{m}}|{\widehat{f}}_{n,d}(x)-f(x)|^{p}dx\nonumber \\\lesssim & {} E\int _{\Omega _{m_{0}}^{-}}|{\widehat{f}}_{n,d}(x)-f(x)|^{p}dx+ \sum _{m=m_{0}}^{m_{2}}E\int _{\Omega _{m}}[U_{f}(x)]^{p}dx+n^{-\frac{p}{2}}\nonumber \\= & {} J_{m_{0}}^{-}+\sum _{m=m_{0}}^{m_{2}}J_{m}+n^{-\frac{p}{2}}. \end{aligned}$$
(4.1)

Here, \(J_{m_{0}}^{-}\) and \(J_{m}\) are defined in Proposition 3.2.

To complete the proof, one divides (4.1) into four regions. Recall that \(2^{m_{0}}\thicksim \delta _n^{\frac{sr+d}{sr+dr-d}},~2^{m_{2}}\thicksim \delta _n^{-1}\) and \(\delta _n\thicksim (\frac{\ln n}{n})^{\frac{s}{2s+d}}\) by (3.12)–(3.13). Then with Proposition 3.2, the following estimates are established.

(i). For \(p\le \frac{2sr+dr}{sr+d}\),

$$\begin{aligned} J_{m_{0}}^{-}+\sum _{m=m_{0}}^{m_{2}}J_{m}\le & {} J_{m_{0}}^{-}+\sum _{m=m_{0}}^{0}J_{m}+\sum _{m=0}^{m_{2}}J_{m}\nonumber \\\lesssim & {} (\ln n)(2^{m_{0}}\delta _n)^{p-1}+2^{m_{0}(p-\frac{2sr+dr}{sr+d})}\delta _n^{p}+\delta _n^{p}+n^{-\frac{p}{2}}\nonumber \\\lesssim & {} (\ln n)\Big (\frac{\ln n}{n}\Big )^{\frac{s(p-1)}{s+d-\frac{d}{r}}}. \end{aligned}$$
(4.2)

Next, we treat the remaining regions, using the fact that \((2^{m_0}\delta _n)^{p-1}<\delta _n^{p}\) whenever \(p>\frac{2sr+dr}{sr+d}\).

(ii). For \(\frac{2sr+dr}{sr+d}< p<\frac{2sr}{d}+r\),

$$\begin{aligned} J_{m_{0}}^{-}+\sum _{m=m_{0}}^{m_{2}}J_{m}\le & {} J_{m_{0}}^{-}+\sum _{m=m_{0}}^{0}J_{m}+\sum _{m=0}^{m_{2}}J_{m}\nonumber \\\lesssim & {} (\ln n)(2^{m_{0}}\delta _n)^{p-1}+\delta _n^{p}+\delta _n^{p}+n^{-\frac{p}{2}}\nonumber \\\lesssim & {} \Big (\frac{\ln n}{n}\Big )^{\frac{sp}{2s+d}}. \end{aligned}$$
(4.3)

(iii). For \(p\ge \frac{2sr}{d}+r\),

$$\begin{aligned} J_{m_{0}}^{-}+\sum _{m=m_{0}}^{m_{2}}J_{m}\le & {} J_{m_{0}}^{-}+\sum _{m=m_{0}}^{0}J_{m}+\sum _{m=0}^{m_{2}}J_{m}\nonumber \\\lesssim & {} (\ln n)(2^{m_{0}}\delta _n)^{p-1}+\delta _n^{p}+2^{m_{2}(p-r-\frac{2sr}{d})}\delta _n^{p} +n^{-\frac{p}{2}}\nonumber \\\lesssim & {} \Big (\frac{\ln n}{n}\Big )^{\frac{sr}{d}}. \end{aligned}$$
(4.4)

(iv). For the case \(p\ge \frac{2sr}{d}+r\) and \(s>\frac{d}{r}\), take \(m_1\in {\mathbb {Z}}\) satisfying

$$\begin{aligned} 2^{m_{1}}\thicksim \delta _n^{\frac{s'p(\frac{1}{s}-\frac{1}{s'})}{(\frac{2s'}{d}+1)p-\frac{2sr}{d}-r}} \end{aligned}$$

by balancing \(2^{m_{1}(p-r-\frac{2sr}{d})}\delta _n^{p}\) and \(2^{-\frac{2m_{1}s'p}{d}}\delta _n^{\frac{s'}{s}p}\). Then it follows from \(r<p\) and \(s>\frac{d}{r}\) that \(0<m_1<m_2\). Hence,

$$\begin{aligned} J_{m_{0}}^{-}+\sum _{m=m_{0}}^{m_{2}}J_{m}\le & {} J_{m_{0}}^{-}+\sum _{m=m_{0}}^{0}J_{m}+\sum _{m=0}^{m_1}J_{m}+\sum _{m=m_1} ^{m_{2}}J_{m}\\\lesssim & {} (\ln n)(2^{m_{0}}\delta _n)^{p-1}+\delta _n^{p}+2^{m_{1}(p-r-\frac{2sr}{d})}\delta _n^{p}\\&\quad + 2^{-\frac{2m_{1}s'p}{d}}\delta _n^{\frac{s'}{s}p}+n^{-\frac{p}{2}}. \end{aligned}$$

Then due to the choice of \(2^{m_1}\), \(\delta _n\thicksim (\frac{\ln n}{n})^{\frac{s}{2s+d}}\) and \(s'=s-\frac{d}{r}+\frac{d}{p}\), the above inequality reduces to

$$\begin{aligned} J_{m_{0}}^{-}+\sum _{m=m_{0}}^{m_{2}}J_{m}\lesssim \Big (\frac{\ln n}{n}\Big )^{\frac{s'p}{2(s-\frac{d}{r})+d}}. \end{aligned}$$
(4.5)

The proof of Theorem 1.1 is finished thanks to (4.1)–(4.5). \(\square \)

4.2 Proof of Theorem 1.2

Proof

It is easy to show that

$$\begin{aligned} \left| \prod _{i=1}^ma_i-\prod _{i=1}^mb_i\right| \le m\max _{i\in \{1,\cdots ,m\}}{\{|a_i|^{m-1},~|b_i|^{m-1}\}}\cdot \max _{i\in \{1,\cdots ,m\}}|a_i-b_i|. \end{aligned}$$
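Indeed, this follows from the telescoping identity

$$\begin{aligned} \prod _{i=1}^ma_i-\prod _{i=1}^mb_i=\sum _{k=1}^{m}\Big (\prod _{i<k}a_i\Big )(a_k-b_k)\Big (\prod _{i>k}b_i\Big ), \end{aligned}$$

since each product \(\prod _{i<k}a_i\prod _{i>k}b_i\) contains \(m-1\) factors, each bounded by \(\max _{i}\{|a_i|,~|b_i|\}\).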

This bound with (1.13) and (1.14) leads to

$$\begin{aligned}&|{\widehat{f}}_{n,{\mathcal {P}}}(x)-f(x)|\nonumber \\&\quad \lesssim \max _{I\in {\mathcal {P}}}\left\{ |{\widehat{f}}_{n,|I|}(x_{I})|^{|{\mathcal {P}}|-1},~|f_{|I|}(x_{I})|^{|{\mathcal {P}}|-1}\right\} \cdot \max _{I\in {\mathcal {P}}}|{\widehat{f}}_{n,|I|}(x_{I})-f_{|I|}(x_{I})|.\nonumber \\ \end{aligned}$$
(4.6)

Obviously, \(|{\widehat{f}}_{n,|I|}(x_{I})|\le |{\widehat{f}}_{n,|I|}(x_{I})-f_{|I|}(x_I)|+|f_{|I|}(x_{I})|\) and

$$\begin{aligned} |{\widehat{f}}_{n,|I|}(x_{I})-f_{|I|}(x_{I})|^{(|{\mathcal {P}}|-1)p}\le & {} \left[ |{\widehat{f}}_{n,|I|}(x_{I})-f_{|I|}(x_{I})|+1\right] ^{(d-1)p}\nonumber \\\lesssim & {} |{\widehat{f}}_{n,|I|}(x_{I})-f_{|I|}(x_{I})|^{(d-1)p}+1. \end{aligned}$$
(4.7)

On the other hand, \(|f_{|I|}(x_{I})|\lesssim 1\) follows from \(f_{|I|}\in L^{\infty }(M)\). Combining this with (4.6) and (4.7), one concludes that

$$\begin{aligned}&|{\widehat{f}}_{n,{\mathcal {P}}}(x)-f(x)|^p\\&\quad \lesssim \left[ \max _{I\in {\mathcal {P}}}|{\widehat{f}}_{n,|I|}(x_{I})-f_{|I|}(x_{I})|^{(d-1)p}+1\right] \cdot \max _{I\in {\mathcal {P}}}|{\widehat{f}}_{n,|I|} (x_{I})-f_{|I|}(x_{I})|^{p}. \end{aligned}$$

Note that \(|{\widehat{f}}_{n,|I|}(x_{I})-f_{|I|}(x_{I})|^{(d-1)p}\) and \(|{\widehat{f}}_{n,|I|}(x_{I})-f_{|I|}(x_{I})|^{p}\) attain their maxima over \(I\in {\mathcal {P}}\) at the same I. Therefore,

$$\begin{aligned} |{\widehat{f}}_{n,{\mathcal {P}}}(x)-f(x)|^p\lesssim \max _{I\in {\mathcal {P}}}|{\widehat{f}}_{n,|I|}(x_{I})-f_{|I|}(x_{I})|^{dp}+ \max _{I\in {\mathcal {P}}}|{\widehat{f}}_{n,|I|} (x_{I})-f_{|I|}(x_{I})|^{p}, \end{aligned}$$

which implies that

$$\begin{aligned} E\Vert {\widehat{f}}_{n,{\mathcal {P}}}-f\Vert _p^p\lesssim \max _{I\in {\mathcal {P}}}\left\{ E\Vert {\widehat{f}}_{n,|I|}-f_{|I|}\Vert _{pd}^{pd}+E\Vert {\widehat{f}}_{n,|I|}-f_{|I|}\Vert _p^p \right\} .\end{aligned}$$
(4.8)

According to Theorem 1.1 and \(f\in B_{r,q}^{s}(M,{\mathcal {P}})\cap L^{\infty }(M,{\mathcal {P}})\), one obtains that

$$\begin{aligned} E\Vert {\widehat{f}}_{n,|I|}-f_{|I|}\Vert _{pd}^{pd}\lesssim \alpha _{n}(pd,|I|)\Big (\frac{\ln n}{n}\Big )^{\beta (pd,|I|)pd}\end{aligned}$$
(4.9)

and

$$\begin{aligned} E\Vert {\widehat{f}}_{n,|I|}-f_{|I|}\Vert _{p}^{p}\lesssim \alpha _{n}(p,|I|)\Big (\frac{\ln n}{n}\Big )^{\beta (p,|I|)p}.\end{aligned}$$
(4.10)

Moreover, it follows from (1.11) that for each \(I\in {\mathcal {P}}\),

$$\begin{aligned} \alpha _{n}(pd,|I|)\le \alpha _{n}(p,|I|).\end{aligned}$$
(4.11)

Hence, in order to obtain the conclusion of Theorem 1.2, it suffices to show that \(\beta (pd,|I|)d\ge \beta (p,|I|)\) for each \(I\in {\mathcal {P}}\), because of (4.8)–(4.11).

Equivalently, we prove that \(\beta (pd,\ell )d\ge \beta (p,\ell )\) holds for each \(\ell \in \{1,\cdots ,d\}\). By (1.12),

$$\begin{aligned} \beta (pd,\ell )d=\left\{ \begin{array}{rcl} &{}\frac{ds(1-\frac{1}{pd})}{s+\ell -\frac{\ell }{r}}, ~~ &{}{pd\le \frac{2sr+\ell r}{sr+\ell };}\\ &{}\frac{ds}{2s+\ell },~~ &{}{\frac{2sr+\ell r}{sr+\ell }< pd<\frac{2sr}{\ell }+r;} \\ &{}\frac{s r}{p\ell }, ~~&{}{pd\ge \frac{2sr}{\ell }+r,~s\le \frac{\ell }{r};}\\ &{}\frac{d(s-\frac{\ell }{r}+\frac{\ell }{pd})}{2(s-\frac{\ell }{r})+\ell }, ~~ &{}{pd\ge \frac{2sr}{\ell }+r,~s>\frac{\ell }{r}}. \end{array} \right. \end{aligned}$$

Therefore, (i). For \(p\ge \frac{2sr}{\ell }+r\) and \(s>\frac{\ell }{r}\),

$$\begin{aligned} \beta (pd,\ell )d=\frac{d(s-\frac{\ell }{r}+\frac{\ell }{pd})}{2(s-\frac{\ell }{r})+\ell } \ge \frac{s-\frac{\ell }{r}+\frac{\ell }{p}}{2(s-\frac{\ell }{r})+\ell }=\beta (p,\ell ); \end{aligned}$$

(ii). For \(p\ge \frac{2sr}{\ell }+r\) and \(s\le \frac{\ell }{r}\),

$$\begin{aligned} \beta (pd,\ell )d=\frac{s r}{p\ell }=\beta (p,\ell ); \end{aligned}$$

(iii). If \(\frac{2sr+\ell r}{sr+\ell }< p<\frac{2sr}{\ell }+r\), then the possible values of \(\beta (pd,\ell )d\) are \(\frac{d(s-\frac{\ell }{r}+\frac{\ell }{pd})}{2(s-\frac{\ell }{r})+\ell }\) (for \(s>\frac{\ell }{r}\)), \(\frac{s r}{p\ell }\) (for \(s\le \frac{\ell }{r})\) and \(\frac{ds}{2s+\ell }\). Clearly,

$$\begin{aligned}&\frac{d(s-\frac{\ell }{r}+\frac{\ell }{pd})}{2(s-\frac{\ell }{r})+\ell } \ge \frac{s-\frac{\ell }{r}+\frac{\ell }{p}}{2(s-\frac{\ell }{r})+\ell } \ge \frac{s}{2s+\ell }\quad \text{ and }\quad \nonumber \\&\quad \min \left\{ \frac{s r}{p\ell },~\frac{d s}{2s+\ell }\right\} \ge \frac{s}{2s+\ell }. \quad \end{aligned}$$
(4.12)

Hence, \(\beta (pd,\ell )d\ge \frac{s}{2s+\ell }=\beta (p,\ell )\) holds in this region.

(iv). If \(p\le \frac{2sr+\ell r}{sr+\ell }\), then the possible values of \(\beta (pd,\ell )d\) are \(\frac{d(s-\frac{\ell }{r}+\frac{\ell }{pd})}{2(s-\frac{\ell }{r})+\ell }\) (for \(s>\frac{\ell }{r}\)), \(\frac{s r}{p\ell }\) (for \(s\le \frac{\ell }{r})\), \(\frac{ds}{2s+\ell }\) and \( \frac{ds(1-\frac{1}{pd})}{s+\ell -\frac{\ell }{r}}\). Due to (4.12) and \(d\ge 1\),

$$\begin{aligned} \min \left\{ \frac{d(s-\frac{\ell }{r}+\frac{\ell }{pd})}{2(s-\frac{\ell }{r})+\ell } ,~\frac{s r}{p\ell },~\frac{ds}{2s+\ell }\right\} \ge \frac{s}{2s+\ell } \ge \frac{s(1-\frac{1}{p})}{s+\ell -\frac{\ell }{r}} \quad \text{ and }\quad \frac{ds(1-\frac{1}{pd})}{s+\ell -\frac{\ell }{r}}\ge \frac{s(1-\frac{1}{p})}{s+\ell -\frac{\ell }{r}}. \end{aligned}$$

Therefore, \(\beta (pd,\ell )d\ge \frac{s(1-\frac{1}{p})}{s+\ell -\frac{\ell }{r}}=\beta (p,\ell )\) follows in this region.

The proof is done. \(\square \)