1 Introduction and Preliminaries

Density estimation for statistical models with additive noise plays an important role in both statistics and econometrics [12]. More precisely, let \(Y_{1},Y_{2},\cdots ,Y_{n}\) be independent and identically distributed (i.i.d.) observations of the form

$$\begin{aligned} Y=X+\varepsilon , \end{aligned}$$
(1.1)

where X denotes a real-valued random variable with the unknown probability density function f, and \(\varepsilon \) stands for an independent random noise (error) with the probability density \(f_{\varepsilon }\). The problem is to estimate f from \(Y_{1},Y_{2},\cdots ,Y_{n}\) in some sense. It is well known that the probability density g of Y equals the convolution of f and \(f_\varepsilon \), which is why this is called a deconvolution problem. In particular, the model (1.1) reduces to the classical error-free density model [5, 8] when \(f_\varepsilon \) degenerates to the Dirac functional \(\delta \).
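
To fix ideas, here is a minimal simulation of model (1.1); the concrete choices of f (a rescaled Beta density) and \(f_\varepsilon \) (Laplace) are illustrative assumptions of this sketch, not part of the model. It checks numerically that the density g of Y is the convolution \(f*f_{\varepsilon }\).

```python
import numpy as np

# Model (1.1) in simulation: Y = X + eps, where only Y is observed.
# f (rescaled Beta) and f_eps (Laplace) are illustrative choices.
rng = np.random.default_rng(0)
n = 100_000
X = 2 * rng.beta(2, 2, n) - 1             # X ~ f, supported on [-1, 1]
eps = rng.laplace(scale=1.0, size=n)      # eps ~ f_eps, independent of X
Y = X + eps

# Check g = f * f_eps at y = 0: numerical convolution vs. a histogram of Y.
x = np.linspace(-1, 1, 2001)
f = 0.75 * (1 - x**2)                     # density of 2*Beta(2,2) - 1
g0 = np.sum(f * 0.5 * np.exp(-np.abs(0.0 - x))) * (x[1] - x[0])
print(g0, np.mean(np.abs(Y) < 0.05) / 0.1)   # both approximately 0.354
```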

In 1996, Delyon and Juditsky [4] investigated noise-free density estimation by using compactly supported wavelets. Pensky and Vidakovic [17] and Walter [19] studied deconvolution density estimation by using Meyer's wavelet over the Sobolev space \(W_{2}^{s}(\mathbb {R})\) in 1999; three years later, Fan and Koo [6] explored the MISE performance of wavelet deconvolution estimators over the Besov space \(B_{r,q}^{s}(\mathbb {R})\) (\(1\le r\le 2\)). Lounici and Nickl [14] investigated optimal \(L^\infty \) wavelet deconvolution estimation over \(B_{\infty ,\infty }^{s}(\mathbb {R})\). In 2014, Li and Liu [11] provided a complete treatment of deconvolution estimation under \(L^{p}~(1\le p<\infty )\) risk over \(B_{r,q}^{s}(\mathbb {R})~(r,q\in [1,\infty ])\) for moderately ill-posed noises by using wavelet bases.

It should be pointed out that the constructions of the above wavelet estimators depend more or less on the unknown density function f. For example, the choice of parameters depends on the unknown smoothness index s of f for linear wavelet estimators, and on an upper bound of s for non-linear ones.

Goldenshluger and Lepski [7] constructed a data-driven kernel density estimator for the noise-free model in 2014, in which the selection of parameters depends only on the observed data. Moreover, they studied adaptive minimax estimation of non-compactly supported densities on \(\mathbb {R}^{d}\) under \(L^{p}\) risk over anisotropic Nikol'skii classes. For deconvolution density estimation, Comte and Lacour [2] considered \(L^{2}\) risk estimation by a data-driven kernel deconvolution estimator over anisotropic Nikol'skii and Sobolev classes. Three years later, Rebelles [18] extended those results to \(L^p\) risk over anisotropic Nikol'skii classes. In 2017 and 2019, Lepski and Willer [9, 10] established adaptive and optimal \(L^{p}\) risk estimation in the convolution structure density model via a data-driven kernel method, which covers both classical density estimation and deconvolution density estimation. Compared with kernel estimators, wavelet ones provide more local information and fast algorithms. Recently, Cao and Zeng [1] constructed a data-driven wavelet estimator and attained the optimal rate (up to a logarithmic factor) for non-compactly supported density functions over Besov spaces. However, few references estimate density functions with additive noise by data-driven wavelet methods.

This paper provides a data-driven wavelet estimator for compactly supported density functions in the deconvolution density model. It is totally adaptive, because the selection of its parameters depends only on the observed data. Using this estimator, we investigate \(L^{p}~(1\le p<\infty )\) risk estimation for moderately ill-posed noises over Besov balls \(B^{s}_{r,q}(M,T)~(r,q\in [1,\infty ])\). In contrast to the traditional wavelet results, our result covers the case \(0<s\le \frac{1}{r}\); moreover, in the region \(1\le p\le \frac{2sr+(2\beta +1)r}{sr+2\beta +1}\), the convergence rate improves upon that for not necessarily compactly supported density estimation [9, 10]; see Remark 4.2.

1.1 Preliminaries

We begin with the concept of Multiresolution Analysis (MRA, [8, 16]), which is a sequence of closed subspaces \(\{V_{j}\}_{j\in \mathbb {Z}}\) of the square integrable function space \(L^{2}(\mathbb {R})\) satisfying the following properties:

  (i) \(V_{j}\subset V_{j+1}\), \(j\in \mathbb {Z}\);

  (ii) \(\overline{\bigcup _{j\in \mathbb {Z}} V_{j}}=L^{2}(\mathbb {R})\) (the space \(\bigcup _{j\in \mathbb {Z}} V_{j}\) is dense in \(L^{2}(\mathbb {R})\));

  (iii) \(f(2\cdot )\in V_{j+1}\) if and only if \(f(\cdot )\in V_{j}\) for each \(j\in \mathbb {Z}\);

  (iv) there exists \(\varphi \in L^{2}(\mathbb {R})\) (a scaling function) such that \(\{\varphi (\cdot -k),~k\in \mathbb {Z}\}\) forms an orthonormal basis of \(V_{0}=\overline{\textrm{span}\{\varphi (\cdot -k),~k\in \mathbb {Z}\}}\).

With the standard notation \(h_{jk}(\cdot ):=2^{\frac{j}{2}}h\left( 2^{j}\cdot -k\right) \) in wavelet analysis, we can derive a wavelet function \(\psi \) from a scaling function \(\varphi \) in a simple way such that, for a fixed \(j\in \mathbb {Z}\), \(\{\psi _{jk}\}_{k\in \mathbb {Z}}\) constitutes an orthonormal basis of the orthogonal complement \(W_j\) of \(V_j\) in \(V_{j+1}\). Then for fixed \(j_0\in \mathbb {N}\), both \(\{\varphi _{j_0k},\psi _{jk}\}_{j\ge j_0,k\in \mathbb {Z}}\) and \(\{\psi _{jk}\}_{j,k\in \mathbb {Z}}\) are orthonormal bases (wavelet bases) of \(L^{2}(\mathbb {R})\). Thus, each \(f\in L^2(\mathbb {R})\) has the following expansion in the \(L^2\) sense,

$$\begin{aligned} f=\sum _{k\in \mathbb {Z}}\alpha _{j_0k}\varphi _{j_0k}+\sum _{j\ge j_0}\sum _{k\in \mathbb {Z}}\beta _{jk}\psi _{jk} \end{aligned}$$

with \(\alpha _{jk}:=\langle f, \varphi _{jk}\rangle \) and \(\beta _{jk}:=\langle f,\psi _{jk}\rangle \).

As usual, let \(P_{j}\) be the orthogonal projection operator from \(L^{2}(\mathbb {R})\) onto the scaling space \(V_{j}\) with the orthonormal basis \(\{\varphi _{jk}\}_{k\in \mathbb {Z}}\). Then for each \(f\in L^{2}(\mathbb {R})\),

$$\begin{aligned} P_{j}f=\sum _{k\in \mathbb {Z}}\alpha _{jk}\varphi _{jk} \end{aligned}$$

with \(\alpha _{jk}:=\langle f,\varphi _{jk}\rangle \).
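
As a quick illustration of the projection \(P_{j}f\), consider the noise-free setting with the Haar scaling function \(\varphi =1_{[0,1)}\) (our simplifying assumption here; Haar is only 0 regular and is used purely for intuition). The empirical coefficients \(\widehat{\alpha }_{jk}=\frac{1}{n}\sum _{i}\varphi _{jk}(X_{i})\) then reduce the estimate of \(P_{j}f\) to a histogram with bins of width \(2^{-j}\):

```python
import numpy as np

# Noise-free sketch of P_j f with the Haar scaling function phi = 1_[0,1):
# alpha_hat_{jk} = (1/n) sum_i phi_{jk}(X_i), and phi_{jk}(x) is nonzero
# only for k = floor(2^j x), so P_j f becomes a 2^{-j}-bin histogram.
rng = np.random.default_rng(1)
X = rng.beta(2, 5, size=10_000)        # i.i.d. sample from an "unknown" f
j = 4                                   # resolution level
k = np.floor(2**j * X).astype(int)
alpha_hat = 2**(j/2) * np.bincount(k, minlength=2**j) / len(X)

def Pj_f_hat(x):
    """Estimated (P_j f)(x) = sum_k alpha_hat_{jk} phi_{jk}(x)."""
    kx = int(np.floor(2**j * x))
    return 2**(j/2) * alpha_hat[kx] if 0 <= kx < 2**j else 0.0

print(Pj_f_hat(0.25))  # compare with the Beta(2,5) density 30*x*(1-x)^4 at 0.25
```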

One of the advantages of wavelet bases is that they characterize Besov spaces, which contain Hölder and \(L^{2}\)-Sobolev spaces as special cases. To introduce Lemma 1.1, we need some notation: a scaling function \(\varphi \) is called m regular [3] if \(\varphi \in \mathcal {C}^{m}(\mathbb {R})\) and \(|\varphi ^{(r)}(x)|\le C(1+|x|^{2})^{-l}\) holds for each \(l\in \mathbb {Z}~(r=0,1,\cdots ,m)\); \(\Vert f\Vert _{r}\) denotes the \(L^{r}(\mathbb {R})\) norm of \(f\in L^{r}(\mathbb {R})\), and \(\Vert \tau \Vert _{l^{r}}\) the \(l^{r}(\mathbb {Z})\) norm, where

$$\begin{aligned} l^{r}(\mathbb {Z}):=\left\{ \begin{array}{ll} \left\{ \tau =\{\tau _{k}\},~\displaystyle \sum _{k\in \mathbb {Z}}|\tau _{k}|^{r}<\infty \right\} , &{} 1\le r<\infty ;\\ \left\{ \tau =\{\tau _{k}\},~\displaystyle \sup _{k\in \mathbb {Z}}|\tau _{k}|<\infty \right\} , &{} r=\infty . \end{array} \right. \end{aligned}$$

Lemma 1.1

([16]). Let \(\varphi \) be m regular with \(m>s>0\), let \(\psi \) be the corresponding wavelet, and let \(f\in L^{r}(\mathbb {R})\). If \(\alpha _{jk}:=\langle f,\varphi _{jk}\rangle \), \(\beta _{jk}:=\langle f,\psi _{jk}\rangle \) and \(r,q\in [1,\infty ]\), then the following assertions are equivalent:

  (i) \(f\in B^{s}_{r,q}(\mathbb {R})\);

  (ii) \(\{2^{js}\Vert P_{j}f-f\Vert _{r}\}\in l^{q}(\mathbb {Z});\)

  (iii) \(\{2^{j\left( s-\frac{1}{r}+\frac{1}{2}\right) }\Vert \{\beta _{j\cdot }\}\Vert _{l^r}\}\in l^{q}(\mathbb {Z}).\)

The Besov norm of f can be defined by

$$\begin{aligned} \Vert f\Vert _{B^{s}_{r,q}}:=\Vert \{\alpha _{j_{0}\cdot }\}\Vert _{l^r}+ \left\| \left\{ 2^{j\left( s-\frac{1}{r}+\frac{1}{2}\right) }\Vert \{\beta _{j\cdot }\}\Vert _{l^r}\right\} _{j\ge j_{0}}\right\| _{l^q}. \end{aligned}$$

Moreover, Lemma 1.1 (i) and (ii) show that \(\Vert P_jf-f\Vert _r\lesssim 2^{-js}\) holds for \(f\in B^{s}_{r,q}(\mathbb {R})\). Hereafter, the notation \(A\lesssim B\) means \(A\le cB\) for some fixed constant \(c>0\) independent of n; \(A\gtrsim B\) means \(B\lesssim A\); and \(A\thicksim B\) stands for both \(A\lesssim B\) and \(A\gtrsim B\).

When \(r\le p\), Lemma 1.1 (i) and (iii) imply that with \(s'-\frac{1}{p}=s-\frac{1}{r}>0\),

$$\begin{aligned} B_{r,q}^s(\mathbb {R})\hookrightarrow B_{p,q}^{s'}(\mathbb {R}), \end{aligned}$$

where \(A\hookrightarrow B\) stands for a Banach space A continuously embedded in another Banach space B. All these claims can be found in Refs. [11, 20].

In this paper, the notation \(B_{r,q}^{s}(M)\) with \(M>0\) stands for a Besov ball, i.e.,

$$\begin{aligned} B_{r,q}^{s}(M):=\{f\in B_{r,q}^{s}(\mathbb {R}),~f~\text{is a density function and}~\Vert f\Vert _{B_{r,q}^{s}}\le M\} \end{aligned}$$

and

$$\begin{aligned} B_{r,q}^{s}(M,T):=\{f\in B_{r,q}^{s}(M),~\textrm{supp}\,f\subseteq [-T,T]~\text{with}~T>0\}. \end{aligned}$$

Moreover, \(L^{\infty }(M)\) is defined in the same manner.

To introduce the assumptions on a noise function \(f_{\varepsilon }\), we need the Fourier transform \(f^{ft}\) of \(f\in L^{1}(\mathbb {R})\),

$$\begin{aligned} f^{ft}(t):=\int _{\mathbb {R}}f(x)e^{-itx}dx. \end{aligned}$$

A standard method extends this definition to \(L^{2}(\mathbb {R})\) functions. Furthermore, the following conditions are imposed on the noise density function \(f_{\varepsilon }\). For \(\beta \ge 0\),

(T1)  \(\left| f_{\varepsilon }^{ft}(t)\right| \gtrsim (1+|t|^{2})^{-\frac{\beta }{2}};\)

(T2)  \(\left| (f_{\varepsilon }^{ft})^{(\ell )}(t)\right| \lesssim (1+|t|^{2})^{-\frac{\beta +\ell }{2}},~\ell =0,1,2.\)

Such a noise \(\varepsilon \) is said to be moderately ill-posed. Clearly, the Gamma distribution \(\Gamma (a,b)\) satisfies Conditions (T1)–(T2) with \(\beta =a\). In particular, the index \(\beta =0\) corresponds to \(f_{\varepsilon }\) degenerating to the Dirac functional \(\delta \), in which case the model (1.1) reduces to the classical noise-free density estimation model.
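
For instance, \(\Gamma (a,1)\) has \(f_{\varepsilon }^{ft}(t)=(1+it)^{-a}\), so \(|f_{\varepsilon }^{ft}(t)|=(1+t^{2})^{-a/2}\) and (T1) holds with \(\beta =a\), even with equality. A short numerical check (the choice \(a=2\) is purely illustrative):

```python
import numpy as np

# Check (T1) for Gamma(a, 1) noise: f_eps^ft(t) = (1 + i t)^{-a}, hence
# |f_eps^ft(t)| = (1 + t^2)^{-a/2}, i.e. beta = a. (a = 2 is illustrative.)
a = 2
t = np.linspace(-100, 100, 20_001)
ratio = np.abs((1 + 1j * t) ** (-a)) * (1 + t**2) ** (a / 2)
print(ratio.min(), ratio.max())   # both equal 1 up to rounding: (T1) is sharp
```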

1.2 Data-Driven Wavelet Estimator

This subsection is devoted to introducing the data-driven wavelet estimator for model (1.1). Under Condition (T1),

$$\begin{aligned} \widehat{\alpha }_{jk}:=\frac{2^{j/2}}{n}\sum _{i=1}^{n}(K_{j}\varphi )\left( 2^{j}Y_{i}-k\right) ~~~~\text { and }~~~~ (K_{j}\varphi )(x):=\frac{1}{2\pi }\int _{\mathbb {R}} e^{itx}\frac{\varphi ^{ft}(t)}{f_{\varepsilon }^{ft}(-2^{j}t)}dt \end{aligned}$$
(1.2)

are well-defined, where \(\varphi \) is m regular with \(m>\beta +1\). Then the classical linear wavelet estimator for the deconvolution density model is given by

$$\begin{aligned} \widehat{f}_{j}(x)=\sum _{k\in \mathbb {Z}}\widehat{\alpha }_{jk}\varphi _{jk}(x). \end{aligned}$$
(1.3)

Clearly, \(E\widehat{\alpha }_{jk}=\alpha _{jk}\) and \(E\widehat{f}_{j}=P_{j}f\); for details, see Refs. [6, 11, 13]. In general, the parameter j in (1.3) depends on the smoothness index s of the unknown density function f, so the estimator in (1.3) is non-adaptive [6, 11, 13].

Next, we give a selection rule for the parameter j that depends only on the observed data \(Y_{1},\cdots ,Y_{n}\), the so-called data-driven version. Let \(\mathcal {H}:=\left\{ 0,1,\cdots ,\left\lfloor \frac{1}{2\beta +1}\log _2{\frac{n}{\ln n}}\right\rfloor \right\} \), where \(\lfloor a\rfloor \) denotes the largest integer less than or equal to a, and let

$$\begin{aligned} \xi _{n}(x,j):=\widehat{f}_{j}(x)-E\widehat{f}_{j}(x) \end{aligned}$$
(1.4)

be the stochastic error of \(\widehat{f}_{j}\). Moreover, for any \(x\in [-T,T]\),

$$\begin{aligned} \widehat{R}_{j}(x):=\sup _{j'\in \mathcal {H}}\left[ \left| \widehat{f}_{j\wedge j'}(x)-\widehat{f}_{j'}(x)\right| -U_{n}(j\wedge j') -U_{n}(j')\right] _{+}, \end{aligned}$$
(1.5)
$$\begin{aligned} U_{n}^{*}(j):=\sup _{j'\in \mathcal {H},~j'\le j}U_{n}(j'). \end{aligned}$$
(1.6)

Hereafter, \(a\wedge b:=\min \{a,b\}\), \(a_{+}:=\max \{a,0\}\) and

$$\begin{aligned} U_{n}(j):=\sqrt{\frac{\lambda 2^{j(2\beta +1)}\ln n}{n}}+\frac{\lambda 2^{j(\beta +1)}\ln n}{n}, \end{aligned}$$
(1.7)

where the constant \(\lambda >0\) will be determined later on.

Thus, the parameter \(j=j_{0}\) in (1.3) is selected by

$$\begin{aligned} j_{0}=j_{0}(x)=\mathop {\text {arginf}}_{j\in \mathcal {H}} \left[ \widehat{R}_{j}(x)+2U_{n}^{*}(j)\right] . \end{aligned}$$
(1.8)

Obviously, this choice depends only on the observed data \(Y_1,\cdots ,Y_n\). The data-driven wavelet estimator is then given by

$$\begin{aligned} \widehat{f}_{n}(x):=\widehat{f}_{j_{0}}(x)=\sum _{k\in \mathbb {Z}}\widehat{\alpha }_{j_{0}k} \varphi _{j_{0}k}(x) \end{aligned}$$
(1.9)

with \(j_0\in \mathcal {H}\) being given in (1.8).
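
To make the construction concrete, the following is a minimal numerical sketch of (1.2)–(1.9). It assumes Laplace noise, for which \(f_{\varepsilon }^{ft}(t)=(1+t^{2})^{-1}\) (so \(\beta =2\)) and (1.2) simplifies to the exact identity \((K_{j}\varphi )(x)=\varphi (x)-4^{j}\varphi ''(x)\), and it uses the Meyer scaling function. The noise model, the value \(\lambda =1\), the truncation in k, and all grid sizes are illustrative assumptions of this sketch, not prescriptions of the theory (in particular, Proposition 3.1 below requires \(\lambda \) larger than an explicit threshold).

```python
import numpy as np

TWO_PI = 2.0 * np.pi

# Meyer scaling function on the Fourier side.
def nu(x):
    x = np.clip(x, 0.0, 1.0)
    return x**4 * (35 - 84*x + 70*x**2 - 20*x**3)

def phi_ft(t):
    a = np.abs(t)
    return np.where(a <= TWO_PI/3, 1.0,
                    np.where(a <= 2*TWO_PI/3,
                             np.cos(np.pi/2 * nu(3*a/TWO_PI - 1)), 0.0))

# Tabulate phi and phi'' by numerically inverting phi^ft (phi is real, even).
t = np.linspace(-2*TWO_PI/3, 2*TWO_PI/3, 801)
dt = t[1] - t[0]
xg = np.linspace(-15.0, 15.0, 3001)
E = np.exp(1j * np.outer(xg, t))
phi_tab = np.real(E @ phi_ft(t)) * dt / TWO_PI
phidd_tab = np.real(E @ (-t**2 * phi_ft(t))) * dt / TWO_PI

def Kj_phi(x, j):
    # Laplace noise: f_eps^ft(t) = 1/(1+t^2), so by (1.2)
    # (K_j phi)(x) = phi(x) - 4^j phi''(x) exactly.
    p = np.interp(x, xg, phi_tab, left=0.0, right=0.0)
    pdd = np.interp(x, xg, phidd_tab, left=0.0, right=0.0)
    return p - 4.0**j * pdd

def f_hat_j(x, Y, j):
    # Linear estimator (1.3) at a point x; k is truncated to a window
    # around 2^j x, since phi decays rapidly away from its center.
    n = len(Y)
    ks = np.arange(np.floor(2**j * x) - 12, np.floor(2**j * x) + 13)
    alpha = np.array([(2**(j/2) / n) * Kj_phi(2**j * Y - k, j).sum() for k in ks])
    phis = 2**(j/2) * np.interp(2**j * x - ks, xg, phi_tab, left=0.0, right=0.0)
    return float(alpha @ phis)

def U_n(j, n, lam, beta=2):
    # Threshold (1.7).
    return (np.sqrt(lam * 2**(j*(2*beta+1)) * np.log(n) / n)
            + lam * 2**(j*(beta+1)) * np.log(n) / n)

def f_hat_data_driven(x, Y, lam=1.0, beta=2):
    # Selection rule (1.5)-(1.8) and the final estimator (1.9).
    n = len(Y)
    H = range(int(np.log2(n / np.log(n)) / (2*beta + 1)) + 1)
    fj = {j: f_hat_j(x, Y, j) for j in H}
    Ustar = {j: max(U_n(jp, n, lam, beta) for jp in H if jp <= j) for j in H}
    R_hat = {j: max(max(0.0, abs(fj[min(j, jp)] - fj[jp])
                        - U_n(min(j, jp), n, lam, beta)
                        - U_n(jp, n, lam, beta)) for jp in H) for j in H}
    j0 = min(H, key=lambda j: R_hat[j] + 2 * Ustar[j])
    return fj[j0]

rng = np.random.default_rng(0)
n = 2000
X = 2 * rng.beta(2, 2, n) - 1              # f supported on [-1, 1]
Y = X + rng.laplace(scale=1.0, size=n)     # observed data from (1.1)
print(f_hat_data_driven(0.0, Y))           # f(0) = 0.75 for this f
```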

2 Oracle Inequality

We shall state a point-wise oracle inequality in this section, which plays a key role in the proofs of Proposition 3.2 and Theorem 4.1.

Let \(B_j(x,f)\) be the bias of the estimator \(\widehat{f}_{j}(x)\), i.e.,

$$\begin{aligned} B_{j}(x,f):=\left| E\widehat{f}_{j}(x)-f(x)\right| =\left| P_jf(x)-f(x)\right| \end{aligned}$$
(2.1)

and define

$$\begin{aligned} B_{j}^{*}(x,f):= \sup _{j'\in \mathcal {H},~j'\ge j}B_{j'}(x,f)\quad \text { and }\quad v(x):= \sup _{j\in \mathcal {H}}\Big [\left| \xi _{n}(x,j)\right| -U_n(j)\Big ]_{+}, \end{aligned}$$
(2.2)

where \(\xi _{n}(x,j)\) and \(U_{n}(j)\) are given by (1.4) and (1.7) respectively.

Theorem 2.1

For any \(x\in [-T,T]\), the estimator \(\widehat{f}_{n}(x)\) in (1.9) satisfies that

$$\begin{aligned} \left| \widehat{f}_{n}(x)-f(x)\right| \le \inf _{j\in \mathcal {H}}\left\{ 5B_{j}^{*}(x,f)+5U_{n}^{*}(j)\right\} +5v(x), \end{aligned}$$

where \(U_{n}^{*}(j)\) is defined in (1.6) and \(B_{j}^{*}(x,f),~v(x)\) are defined in (2.2).

Proof

Obviously, it follows from (1.5) and (1.6) that

$$\begin{aligned} \left| \widehat{f}_{j\wedge j_0}(x)-\widehat{f}_{j_{0}}(x)\right| \le \widehat{R}_{j}(x)+U_{n}(j\wedge j_0) +U_{n}(j_{0}) \le \widehat{R}_{j}(x)+2U_{n}^{*}(j_{0}). \end{aligned}$$
(2.3)

The same arguments as (2.3) show

$$\begin{aligned} \left| \widehat{f}_{j_{0}\wedge j}(x)-\widehat{f}_{j}(x)\right| \le \widehat{R}_{j_{0}}(x)+2U_{n}^{*}(j). \end{aligned}$$
(2.4)

Then combining (2.3) and (2.4), one obtains that

$$\begin{aligned} \left| \widehat{f}_{j_0}(x)-f(x)\right|&\le \left| \widehat{f}_{j_{0}\wedge j}(x)-\widehat{f}_{j_{0}}(x)\right| +\left| \widehat{f}_{j_{0}\wedge j}(x)-\widehat{f}_{j}(x)\right| + \left| \widehat{f}_{j}(x)-f(x)\right| \nonumber \\&\le 2\widehat{R}_{j}(x)+4U_{n}^{*}(j)+\left| \widehat{f}_{j}(x)-f(x)\right| \end{aligned}$$
(2.5)

due to \(\widehat{f}_{j_{0}\wedge j}=\widehat{f}_{j\wedge j_{0}}\) and the selection of \(j_0\) in (1.8).

Clearly, by (1.6) and (2.2),

$$\begin{aligned} |\xi _n(x,j)|\le \Big [|\xi _n(x,j)|-U_n(j)\Big ]_++U_n(j)\le v(x)+U_{n}^{*}(j). \end{aligned}$$

This with (2.1) and (2.2) implies that

$$\begin{aligned} \left| \widehat{f}_{j}(x)-f(x)\right| \le B_{j}(x,f)+\left| \xi _{n}(x,j)\right| \le B_{j}^{*}(x,f)+v(x)+U_{n}^{*}(j). \end{aligned}$$
(2.6)

On the other hand, according to (1.4) and (1.5),

$$\begin{aligned} \widehat{R}_{j}(x)&=\sup _{j'\in \mathcal {H}}\Big [\left| \widehat{f}_{j\wedge j'}(x)-\widehat{f}_{j'}(x)\right| -U_{n}(j\wedge j')-U_{n}(j')\Big ]_{+}\\&\le \sup _{j'\in \mathcal {H}}\Big [\left| E\widehat{f}_{j\wedge j'}(x)-E\widehat{f}_{j'}(x)\right| +\left| \xi _{n}(x,j\wedge j')\right| -U_{n}(j\wedge j')+\left| \xi _{n}(x,j')\right| -U_{n}(j')\Big ]_{+}. \end{aligned}$$

This with \(\displaystyle \sup \limits _{j'\in \mathcal {H}}\left| E\widehat{f}_{j\wedge j'}(x)-E\widehat{f}_{j'}(x)\right| \le \sup \limits _{j'\in \mathcal {H},j'\ge j}\{B_{j\wedge j'}(x,f)+B_{j'}(x,f)\}\) and (2.2) leads to

$$\begin{aligned} \widehat{R}_{j}(x)\le 2B_{j}^{*}(x,f)+2v(x). \end{aligned}$$
(2.7)

Hence, it follows from (2.5)–(2.7) that

$$\begin{aligned} |\widehat{f}_{j_{0}}(x)-f(x)| \le 5B_{j}^{*}(x,f)+5v(x)+5U_n^{*}(j) \end{aligned}$$

holds for any \(j\in \mathcal {H}\). Furthermore,

$$\begin{aligned} |\widehat{f}_{n}(x)-f(x)|=|\widehat{f}_{j_0}(x)-f(x)|\le \inf _{j\in \mathcal {H}}\left\{ 5B_{j}^{*}(x,f)+5U_{n}^{*}(j)\right\} +5v(x) \end{aligned}$$

thanks to \(\widehat{f}_{n}(x)=\widehat{f}_{j_0}(x)\) in (1.9). The proof is done. \(\square \)

3 Two Propositions

This section provides two necessary propositions. Some lemmas and a classical inequality are needed in order to prove Proposition 3.1.

Note that the following condition

(C1)  \(\varphi \in L^{1}(\mathbb {R})\) and \(|(\varphi ^{ft})^{(\ell )}(t)|\lesssim (1+|t|^{2})^{-\frac{m}{2}}\) with \(m>1\) and \(\ell =0,1,2\)  (see Ref. [13])

follows from the m regularity of the scaling function \(\varphi \). Then the following lemma holds, according to the work of Liu and Zeng [13].

Lemma 3.1

([13]). Let \(\varphi \) be m regular and Conditions (T1)–(T2) hold with \(\beta >1\) and \(m>\beta +1\). Then \(K_{j}\varphi \) in (1.2) satisfies that

$$\begin{aligned} \left| \sum _{k\in \mathbb {Z}}(K_{j}\varphi )(x-k)\varphi (y-k)\right| \le M_{0}2^{j\beta }\left( 1+|x-y|^{2}\right) ^{-1}, \end{aligned}$$

where \(M_{0}>0\) is some constant.

To introduce Lemma 3.2, we define

$$\begin{aligned} K^{*}_{j}(t,x):=2^{j}\sum _{k\in \mathbb {Z}}(K_{j}\varphi )\left( 2^{j}t-k\right) \varphi \left( 2^{j}x-k\right) , \end{aligned}$$
(3.1)

where \(K_{j}\varphi \) is given by (1.2). Then the estimator \(\widehat{f}_{j}(x)\) in (1.3) can be rewritten as \(\displaystyle \widehat{f}_{j}(x)=\frac{1}{n}\sum \nolimits _{i=1}^{n}K^{*}_{j}(Y_{i},x)\). Furthermore, the following lemma holds.

Lemma 3.2

Let \(\varphi \) be m regular and Conditions (T1)–(T2) hold with \(\beta >1\) and \(m>\beta +1\). Then \(K_{j}^{*}(t,x)\) in (3.1) satisfies that

$$\begin{aligned} \left| K_{j}^{*}(t,x)\right| \le M_{1}2^{j(\beta +1)} ~~~~\text{ and }~~~~ E\left| K_{j}^{*}(Y_{1},x)\right| ^{2}\le M_{1}2^{j(2\beta +1)}, \end{aligned}$$

where \(M_1>0\) is some constant.

Proof

According to the definition of \(K_{j}^{*}(t,x)\) in (3.1) and Lemma 3.1, one obtains

$$\begin{aligned} \left| K^{*}_{j}(t,x)\right| =\left| 2^{j}\sum _{k\in \mathbb {Z}}(K_{j}\varphi )(2^{j}t-k)\varphi (2^{j}x-k)\right| \le M_02^{j(\beta +1)}. \end{aligned}$$
(3.2)

On the other hand, \(\Vert g\Vert _{\infty }=\Vert f*f_{\varepsilon }\Vert _{\infty } \le \Vert f\Vert _{\infty }\Vert f_{\varepsilon }\Vert _{1}=\Vert f\Vert _{\infty }\). This with (3.1) and Lemma 3.1 leads to

$$\begin{aligned} E|K_{j}^{*}(Y_{1},x)|^{2}&\le \int _{\mathbb {R}}|K^{*}_{j}(t,x)|^{2}g(t)dt \nonumber \\&\le \Vert f\Vert _{\infty }M_{0}^{2}2^{j(2\beta +2)}2^{-j}\int _{\mathbb {R}} (1+|2^{j}t-2^{j}x|^{2})^{-2} d(2^{j}t-2^{j}x)\nonumber \\&\le \Vert f\Vert _{\infty }M_{0}^{2}2^{j(2\beta +1)}\int _{\mathbb {R}}(1+|t|^{2})^{-2}dt. \end{aligned}$$
(3.3)

The desired conclusions follow from (3.2)–(3.3) by choosing \(M_1:=\max \left\{ \Vert f\Vert _{\infty }M_{0}^{2}\int _{\mathbb {R}}(1+|t|^{2})^{-2}dt,~M_0\right\} \). \(\square \)

To show Proposition 3.1, we need a well-known inequality.

Bernstein’s inequality ([15]). Let \(Y_{1},\cdots ,Y_{n}\) be i.i.d. random variables with \(EY_{i}^{2}\le \sigma ^{2}\) and \(|Y_{i}|\le M\) \((i=1,2,\cdots ,n)\). Then for any \(x>0\),

$$\begin{aligned} P\left\{ \left| \frac{1}{n}\sum _{i=1}^n(Y_i-EY_i)\right| \ge \sqrt{\frac{2\sigma ^2x}{n}} +\frac{4Mx}{3n}\right\} \le 2e^{-x}. \end{aligned}$$
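
As a quick sanity check of the inequality (with parameter values that are purely illustrative), one can verify it by simulation:

```python
import numpy as np

# Monte Carlo check of Bernstein's inequality for Y_i ~ Uniform[-1, 1]
# (EY_i = 0, EY_i^2 = 1/3 =: sigma^2, |Y_i| <= 1 =: M).
rng = np.random.default_rng(2)
n, reps, x = 200, 20_000, 3.0
sigma2, M = 1/3, 1.0
thr = np.sqrt(2 * sigma2 * x / n) + 4 * M * x / (3 * n)
means = rng.uniform(-1, 1, size=(reps, n)).mean(axis=1)
print((np.abs(means) >= thr).mean(), "<=", 2 * np.exp(-x))  # ~0.003 <= ~0.0996
```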

Now, we state the first proposition, which is one of the main ingredients in the proof of the second one.

Proposition 3.1

Let \(\varphi \) be m regular and Conditions (T1)–(T2) hold with \(\beta >1\) and \(m>\beta +1\). Then for each \(x\in [-T,T]\), \(\gamma >0\) and \(\lambda >\max \{2M_1,~2M_1(\beta +2)\gamma \ln 2\}\),

$$\begin{aligned} E[v(x)]^{\gamma }\lesssim n^{-\frac{\gamma }{2}}, \end{aligned}$$

where v(x) is defined in (2.2) and \(M_1\) is the positive constant in Lemma 3.2.

Proof

For any \(j\in \mathcal {H}\), one defines

$$\begin{aligned} \overline{U_{n}}(j):=\sqrt{\frac{2M_{1}2^{j(2\beta +1)}}{n}\lambda _j} + \frac{4M_{1}2^{j(\beta +1)}}{3n}\lambda _j, \end{aligned}$$
(3.4)

where \(\lambda _{j}=\max \left\{ (\beta +2)\gamma j\ln 2,\frac{1}{4}\right\} \).

Note that \(\lambda \ln n\ge 2M_{1}\lambda _j\) for large n follows from \(\lambda >\max \{2M_1,2M_1(\beta +2)\gamma \ln 2\}\) and \(j\in \mathcal {H}\). Then \(\overline{U_{n}}(j)\le U_{n}(j)\) due to (1.7) and (3.4). Furthermore,

$$\begin{aligned} \Big [\left| \xi _{n}(x,j)\right| -U_{n}(j)\Big ]_+\le \Big [\left| \xi _{n}(x,j)\right| -\overline{U_{n}}(j)\Big ]_+. \end{aligned}$$
(3.5)

For each \(t\ge 0\),

$$\begin{aligned} P\left\{ \big [\left| \xi _{n}(x,j)\right| -\overline{U_{n}}(j)\big ]_+>t\right\} =P\left\{ \left| \xi _{n}(x,j)\right| -\overline{U_{n}}(j)>t\right\} . \end{aligned}$$

Hence,

$$\begin{aligned} E\Big [|\xi _{n}(x,j)|-\overline{U_{n}}(j)\Big ]_+^{\gamma } =\gamma \int _0^\infty t^{\gamma -1} P\left\{ \left| \xi _{n}(x,j)\right| -\overline{U_{n}}(j)>t\right\} dt. \end{aligned}$$

This with variable substitution \(t=v\omega \) and \(\omega :=\sqrt{\frac{2M_{1}2^{j(2\beta +1)}}{n}}+ \frac{4M_{1}2^{j(\beta +1)}}{3n}\) shows

$$\begin{aligned}&E\Big [|\xi _{n}(x,j)|-\overline{U_{n}}(j)\Big ]_+^{\gamma }\nonumber \\\le&\gamma \int _{0}^{\infty }(v\omega )^{\gamma -1}P\left\{ |\xi _{n}(x,j)|> \sqrt{\frac{2M_{1}2^{j(2\beta +1)}}{n}}(\sqrt{v+\lambda _j})+\frac{4M_{1}2^{j(\beta +1)}}{3n} (v+\lambda _j)\right\} \omega dv \end{aligned}$$
(3.6)

because of \(v+\sqrt{\lambda _j}\ge \sqrt{v+\lambda _j}\) and \(\lambda _j\ge \frac{1}{4}\).

On the other hand, \(\left| K_{j}^{*}(Y_{i},x)\right| \le M_{1}2^{j(\beta +1)}\), \(E\left| K_{j}^{*}(Y_{i},x)\right| ^{2}\le M_{1}2^{j(2\beta +1)} \) and

$$\begin{aligned} \xi _n(x,j) =\frac{1}{n}\sum _{i=1}^{n}\left[ K_{j}^{*}(Y_{i},x)-EK_{j}^{*}(Y_{i},x)\right] \end{aligned}$$

by Lemma 3.2. Then

$$\begin{aligned} P\Bigg \{\left| \xi _{n}(x,j)\right| > \sqrt{\frac{2M_{1}2^{j(2\beta +1)}}{n}}\left( \sqrt{v+\lambda _j}\right) + \frac{4M_{1}2^{j(\beta +1)}}{3n} (v+\lambda _j)\Bigg \}\le 2e^{-(v+\lambda _j)} \end{aligned}$$

thanks to Bernstein’s inequality. This with (3.6) implies that

$$\begin{aligned} E\Big [|\xi _{n}(x,j)|-\overline{U_{n}}(j)\Big ]_+^{\gamma }&\le 2\gamma \omega ^{\gamma }\int _{0}^{\infty }v^{\gamma -1}e^{-(v+\lambda _j)}dv\\&=2\gamma \omega ^{\gamma }e^{-\lambda _j}\int _{0}^{\infty }v^{\gamma -1}e^{-v}dv\\&=2\gamma \Gamma (\gamma )\omega ^{\gamma }e^{-\lambda _j}\\&=2\Gamma (\gamma +1) \left[ \sqrt{\frac{2M_{1}2^{j(2\beta +1)}}{n}}+ \frac{4M_{1}2^{j(\beta +1)}}{3n}\right] ^{\gamma } e^{-\lambda _j} \end{aligned}$$

due to \(\omega :=\sqrt{\frac{2M_{1}2^{j(2\beta +1)}}{n}}+ \frac{4M_{1}2^{j(\beta +1)}}{3n}\). Hence, according to \(e^{-\lambda _{j}}\le 2^{-(\beta +2)\gamma j}\), one obtains

$$\begin{aligned} \sum _{j\in \mathcal {H}}E\Big [\left| \xi _{n}(x,j)\right| -\overline{U_{n}}(j)\Big ]_+^{\gamma } \lesssim \sum _{j\in \mathcal {H}}\left( \frac{2^{j(\beta +1)}}{\sqrt{n}}\right) ^{\gamma } 2^{-(\beta +2)\gamma j} \lesssim n^{-\frac{\gamma }{2}}. \end{aligned}$$
(3.7)

Combining (2.2), (3.5) and (3.7), one obtains that

$$\begin{aligned} E[v(x)]^{\gamma } \lesssim E\sup _{j\in \mathcal {H}}\Big [\left| \xi _{n}(x,j)\right| -\overline{U_{n}}(j)\Big ]_+^{\gamma } \lesssim \sum _{j\in \mathcal {H}}E\Big [|\xi _{n}(x,j)|-\overline{U_{n}}(j)\Big ]_+^{\gamma } \lesssim n^{-\frac{\gamma }{2}}, \end{aligned}$$

since \(\mathcal {H}\) is a discrete set, which completes the proof. \(\square \)

Before giving another proposition, we introduce the following notations:

$$\begin{aligned} U_{f}(x):=\inf _{j\in \mathcal {H}}\left\{ B_{j}^{*}(x,f)+U_{n}^{*}(j)\right\} , \end{aligned}$$
(3.8)
$$\begin{aligned} \Omega _{m}:=\left\{ x\in [-T,T],~2^{m}\delta _n<U_{f}(x)\le 2^{m+1}\delta _n\right\} , \end{aligned}$$
(3.9)
$$\begin{aligned} \Omega _{0}^{-}:=\left\{ x\in [-T,T],~U_{f}(x)\le \delta _n\right\} , \end{aligned}$$
(3.10)

where \(\delta _n=\left( \frac{C\ln n}{n}\right) ^{\frac{s}{2s+2\beta +1}}\) and \(C>1\) is some constant.

Note that \(U_{f}(x)\le c_0:=\sup _xU_{f}(x)\). Then there exists

$$\begin{aligned} m_2:=\min \left\{ m\in \mathbb {Z},~2^{m}\delta _n\ge c_0\right\} \end{aligned}$$
(3.11)

such that \(\Omega _{m}=\emptyset \) for each \(m>m_{2}\). Clearly, \(m_{2}>0\) for large n.

Proposition 3.2

Let \(U_f(x),\Omega _m,\Omega _{0}^-\) be defined by (3.8)–(3.10) respectively, let \(\varphi \) be m regular, and let Conditions (T1)–(T2) hold with \(\beta >1\) and \(m>\beta +1\). Then

$$\begin{aligned} J_{0}^{-}:=E\int _{\Omega _{0}^{-}}|\widehat{f}_{n}(x)-f(x)|^pdx \quad \text{ and }\quad J_m:=E\int _{\Omega _{m}}[U_f(x)]^pdx \end{aligned}$$

satisfy that

(1) For each \(p\in [1,\infty )\),

$$\begin{aligned} J_{0}^{-}\lesssim \delta _n^{p}; \end{aligned}$$

(2) If \(f\in B_{r,q}^{s}(M)\cap L^{\infty }(M)\) and \(m\in \mathbb {Z}\) satisfy \(0\le m\le m_2\), then

$$\begin{aligned} J_{m}\lesssim 2^{m\left( p-r-\frac{2sr}{2\beta +1}\right) }\delta _n^{p}; \end{aligned}$$

Moreover, if \(s>\frac{1}{r}\) and \(r\le p\), then with \(s':=s-\frac{1}{r}+\frac{1}{p}\),

$$\begin{aligned} J_{m}\lesssim 2^{-\frac{2ms'p}{2\beta +1}}\delta _n^{\frac{s'}{s}p}. \end{aligned}$$

Proof

(1) According to Theorem 2.1, one finds that

$$\begin{aligned} |\widehat{f}_{n}(x)-f(x)|\lesssim U_{f}(x)+v(x) \end{aligned}$$

holds for any \(x\in [-T,T]\), where v(x) and \(U_f(x)\) are given by (2.2) and (3.8) respectively. Then for each \(p\in [1,\infty )\), by using (3.10) and Proposition 3.1,

$$\begin{aligned} J_{0}^-=E\int _{\Omega _{0}^-}|\widehat{f}_{n}(x)-f(x)|^pdx\lesssim E\int _{\Omega _{0}^-}[U_{f}(x)+v(x)]^{p}dx \lesssim \delta _n^{p}+n^{-\frac{p}{2}}\lesssim \delta _n^{p} \end{aligned}$$

thanks to \(\delta _n\thicksim \left( \frac{\ln n}{n}\right) ^{\frac{s}{2s+2\beta +1}}\), which is the first desired conclusion.

(2) Take \(j_1\) satisfying \(c_12^{\frac{2m}{2\beta +1}}\delta _n^{-\frac{1}{s}}\le 2^{j_{1}}\le c_22^{\frac{2m}{2\beta +1}}\delta _n^{-\frac{1}{s}}\), where the two positive constants \(c_1,c_2\) satisfy

$$\begin{aligned} (2M)^{\frac{1}{s}}I_{\{r=\infty \}}<c_1<c_2< \min \left\{ \frac{C}{4c_0^{2}},\frac{C}{4\left( \sqrt{\lambda }+\lambda \right) ^{2}} \right\} ^{\frac{1}{2\beta +1}}. \end{aligned}$$
(3.12)

Then \(j_{1}\in \mathcal {H}\) and \(U_{n}^{*}(j_1)\le 2^{m-1}\delta _n\) for \(0<m\le m_2\) and large n. In fact, (3.11) tells us that \(2^{m_{2}}\le 2c_0\delta _n^{-1}\). Due to \(0<m\le m_{2}\), (3.12) and \(\delta _{n}=\left( \frac{C\ln n}{n}\right) ^{\frac{s}{2s+2\beta +1}}\), one concludes that

$$\begin{aligned} 1<c_1\delta _n^{-\frac{1}{s}}\le 2^{j_{1}} \le c_22^{\frac{2m_2}{2\beta +1}}\delta _n^{-\frac{1}{s}}\le c_2(2c_0)^{\frac{2}{2\beta +1}}\delta _n^{-\left( \frac{1}{s}+\frac{2}{2\beta +1}\right) } <\Big (\frac{n}{\ln n}\Big )^{\frac{1}{2\beta +1}}. \end{aligned}$$

Hence, \(j_{1}\in \mathcal {H}\). This with \(2^{j_{1}}\le c_{2}2^{\frac{2m}{2\beta +1}}\delta _{n}^{-\frac{1}{s}}\) and (3.12) shows that

$$\begin{aligned} U_{n}^{*}(j_1)&\le \sup _{j'\in \mathcal {H},~j'\le j_1}\left\{ \sqrt{\frac{\lambda 2^{j'(2\beta +1)}\ln n}{n}}+\frac{\lambda 2^{j'(\beta +1)}\ln n}{n}\right\} \nonumber \\&\le (\sqrt{\lambda }+\lambda )\sqrt{\frac{2^{j_1(2\beta +1)}\ln n}{n}}\nonumber \\&\le (\sqrt{\lambda }+\lambda ) \sqrt{c_2^{2\beta +1}2^{2m}\delta _n^{-\frac{2\beta +1}{s}}\,\frac{\ln n}{n}} \nonumber \\&\le (\sqrt{\lambda }+\lambda ) \sqrt{c_2^{2\beta +1}/C}\,2^{m}\delta _n\le 2^{m-1}\delta _n. \end{aligned}$$
(3.13)

Clearly, by \(\Omega _m=\{x\in [-T,T],~2^{m}\delta _n <U_{f}(x)\le 2^{m+1}\delta _n\}\),

$$\begin{aligned} J_m=\int _{\Omega _m}[U_{f}(x)]^pdx\le (2^{m+1}\delta _n)^p|\Omega _m|, \end{aligned}$$
(3.14)

where \(|\Omega _m|\) stands for the Lebesgue measure of the set \(\Omega _m\). Moreover, (3.8) and (3.13) lead to

$$\begin{aligned} |\Omega _m|&\le \left| \left\{ x\in [-T,T],~U_{f}(x)>2^{m}\delta _n\right\} \right| \nonumber \\&\le \left| \left\{ x\in [-T,T],~B_{j_{1}}^{*}(x,f)+U_{n}^{*}(j_{1})>2^{m}\delta _n\right\} \right| \nonumber \\&\le \left| \left\{ x\in [-T,T],~B_{j_{1}}^{*}(x,f)>2^{m-1}\delta _n\right\} \right| . \end{aligned}$$
(3.15)

When \(1\le r<\infty \), by using Chebyshev’s inequality, (2.2), (3.15) and \(f\in B_{r,q}^s(M)\),

$$\begin{aligned} |\Omega _m|&\le \left| \left\{ x\in [-T,T],~B_{j_{1}}^{*}(x,f)>2^{m-1}\delta _n\right\} \right| \nonumber \\&\le \sum _{j\in \mathcal {H},j\ge j_{1}}\left| \left\{ x\in [-T,T],~B_{j}(x,f)>2^{m-1}\delta _n\right\} \right| \nonumber \\&\le \sum _{j\in \mathcal {H},j\ge j_{1}}\frac{\Vert B_{j}(\cdot ,f)\Vert _r^r}{\left( 2^{m-1}\delta _n\right) ^r} \lesssim 2^{-mr}\delta _n^{-r}2^{-j_{1}sr}. \end{aligned}$$
(3.16)

Substituting (3.16) into (3.14), one obtains that

$$\begin{aligned} J_m\lesssim (2^{m+1}\delta _n)^{p}2^{-mr}\delta _n^{-r}2^{-j_{1}sr}\lesssim 2^{m(p-r)}\delta _n^{p-r}2^{-{j_1}sr}\lesssim 2^{m\left( p-r-\frac{2sr}{2\beta +1}\right) }\delta _n^{p} \end{aligned}$$

due to \(2^{j_{1}}\thicksim 2^{\frac{2m}{2\beta +1}}\delta _{n}^{-\frac{1}{s}}\).

For the case \(r=\infty \), it follows from \(f\in B_{r,q}^{s}(M)\) and \(m>0\) that

$$\begin{aligned} B_{j_{1}}^{*}(x,f)= \sup _{j'\ge j_1}B_{j'}(x,f)\le M 2^{-j_{1}s}\le Mc_1^{-s}2^{-\frac{2ms}{2\beta +1}}\delta _n \le 2^{m-1}\delta _n \end{aligned}$$

thanks to the choice \(2^{j_{1}}\ge c_12^{\frac{2m}{2\beta +1}}\delta _{n}^{-\frac{1}{s}}\) with \(c_1>(2M)^{\frac{1}{s}}\). Thus, \(|\Omega _{m}|=0\) because of (3.15), and hence \(J_m\le \left( 2^{m+1}\delta _n\right) ^p|\Omega _{m}|=0\) by (3.14).

Finally, one considers the case of \(s>\frac{1}{r}\) and \(r\le p\). Note that \(B_{r,q}^{s}\hookrightarrow B_{p,q}^{s'}\) with \(s'=s-\frac{1}{r}+\frac{1}{p}\). Similar to (3.16),

$$\begin{aligned} |\Omega _m|&\le \sum _{j\in \mathcal {H},j\ge j_{1}}\left| \left\{ x\in [-T,T],~B_{j}(x,f)>2^{m-1}\delta _n\right\} \right| \\&\le \sum _{j\in \mathcal {H},j\ge j_{1}}\frac{\Vert B_{j}(\cdot ,f)\Vert _p^p}{(2^{m-1}\delta _n)^p} \lesssim 2^{-mp}\delta _n^{-p}2^{-j_{1}s'p}. \end{aligned}$$

This with (3.14) and \(2^{j_{1}}\thicksim 2^{\frac{2m}{2\beta +1}}\delta _{n}^{-\frac{1}{s}}\) implies that

$$\begin{aligned} J_m \le (2^{m+1}\delta _n)^{p}2^{-mp}\delta _n^{-p}2^{-j_{1}s'p} \lesssim 2^{-j_1s'p} \lesssim 2^{-\frac{2ms'p}{2\beta +1}}\delta _n^{\frac{s'}{s}p}. \end{aligned}$$

The proof is done. \(\square \)

4 Main Result

This section is devoted to state and prove our main theorem.

Theorem 4.1

Let \(\varphi \) be m regular and Conditions (T1)–(T2) hold with \(\beta >1\) and \(m>\beta +1\). Then for \(0<s<m\), \(r,q\in [1,\infty ]\) and \(p\in [1,\infty )\), the estimator \(\widehat{f}_{n}\) in (1.9) satisfies

$$\begin{aligned} \sup _{f\in B_{r,q}^{s}(M,T)\cap L^{\infty }(M)}E\Vert \widehat{f}_{n}I_{[-T,T]}-f\Vert _{p}^{p}\lesssim \Big (\frac{\ln n}{n}\Big )^{\theta p}, \end{aligned}$$

where

$$\begin{aligned} \theta :=\left\{ \begin{array}{ll} \frac{s}{2s+2\beta +1}, &{} 1\le p<\frac{2sr}{2\beta +1}+r;\\ \frac{sr}{(2\beta +1)p}, &{} p\ge \frac{2sr}{2\beta +1}+r,~s\le \frac{1}{r};\\ \frac{s-\frac{1}{r}+\frac{1}{p}}{2(s-\frac{1}{r})+2\beta +1}, &{} p\ge \frac{2sr}{2\beta +1}+r,~s>\frac{1}{r}. \end{array} \right. \end{aligned}$$
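
For the reader's convenience, the exponent \(\theta \) can be evaluated directly; the small helper below is a plain transcription of the three cases (the evaluated point is an arbitrary illustration):

```python
def theta(s, r, p, beta):
    # Rate exponent of Theorem 4.1, transcribing the three cases verbatim.
    if p < 2 * s * r / (2 * beta + 1) + r:
        return s / (2 * s + 2 * beta + 1)
    if s <= 1 / r:
        return s * r / ((2 * beta + 1) * p)
    return (s - 1/r + 1/p) / (2 * (s - 1/r) + 2 * beta + 1)

print(theta(s=1.5, r=2, p=4, beta=2))   # third case: 1.25/7 ~ 0.1786
```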

Remark 4.1

Note that \(\theta =\min \left\{ \frac{s}{2s+2\beta +1}, \frac{s-\frac{1}{r}+\frac{1}{p}}{2(s-\frac{1}{r})+2\beta +1}\right\} \) for \(s>\frac{1}{r}\) and \(\beta >1\), which coincides with Theorem 4 of Li and Liu [11]. On the other hand, the case \(0<s\le \frac{1}{r}\) is covered here, whereas no such result exists for the traditional wavelet estimators.

Remark 4.2

When \(p\in \left[ 1,\frac{2sr+(2\beta +1)r}{sr+2\beta +1}\right] \subset \left[ 1,\frac{2sr}{2\beta +1}+r\right] \), the convergence exponent \(\frac{s}{2s+2\beta +1}\) improves upon the exponent \(\frac{s(1-1/p)}{s-(2\beta +1)/r+(2\beta +1)}\) obtained for not necessarily compactly supported density estimation with \(\alpha =d=1\) in Refs. [9, 10]. This is reasonable, since the compact support condition is stricter.

Proof

It follows from Theorem 2.1 that

$$\begin{aligned} \left| \widehat{f}_{n}(x)-f(x)\right| \lesssim U_{f}(x)+v(x) \end{aligned}$$

holds for any \(x\in [-T,T]\). This with Proposition 3.1 leads to

$$\begin{aligned} E\left\| \widehat{f}_{n}I_{[-T,T]}-f\right\| _{p}^{p}&=E\int _{\Omega _{0}^{-}}\left| \widehat{f}_{n}(x)-f(x)\right| ^{p}dx + \sum _{m=0}^{\infty }E\int _{\Omega _{m}}\left| \widehat{f}_{n}(x)-f(x)\right| ^{p}dx\nonumber \\&\lesssim E\int _{\Omega _{0}^{-}}\left| \widehat{f}_{n}(x)-f(x)\right| ^{p}dx + \sum _{m=0}^{m_{2}}E\int _{\Omega _{m}}[U_{f}(x)]^{p}dx+n^{-\frac{p}{2}}\nonumber \\&=J_{0}^{-}+\sum _{m=0}^{m_{2}}J_{m}+n^{-\frac{p}{2}}. \end{aligned}$$
(4.1)

Here, \(J_{0}^{-}\) and \(J_{m}\) can be found in Proposition 3.2.

To complete the proof, one divides (4.1) into three regions. Recall that \(2^{m_{2}}\thicksim \delta _n^{-1}\) and \(\delta _n\thicksim \left( \frac{\ln n}{n}\right) ^{\frac{s}{2s+2\beta +1}}\) by (3.10)–(3.11). According to Proposition 3.2, the following estimates hold.

(i) For \(1\le p<\frac{2sr}{2\beta +1}+r\),

$$\begin{aligned} J_{0}^{-}+\sum _{m=0}^{m_{2}}J_{m}+n^{-\frac{p}{2}} \lesssim \delta _n^{p}+n^{-\frac{p}{2}} \lesssim \Big (\frac{\ln n}{n}\Big )^{\frac{sp}{2s+2\beta +1}}. \end{aligned}$$
(4.2)

(ii) For \(p\ge \frac{2sr}{2\beta +1}+r\),

$$\begin{aligned} J_{0}^{-}+\sum _{m=0}^{m_{2}}J_{m}+n^{-\frac{p}{2}} \lesssim \delta _n^{p}+2^{m_{2}\left( p-r-\frac{2sr}{2\beta +1}\right) }\delta _n^{p}+n^{-\frac{p}{2}} \lesssim \Big (\frac{\ln n}{n}\Big )^{\frac{sr}{2\beta +1}}. \end{aligned}$$
(4.3)

(iii) For the case \(p\ge \frac{2sr}{2\beta +1}+r\) and \(s>\frac{1}{r}\), take \(m_1\in \mathbb {Z}\) satisfying

$$\begin{aligned} 2^{m_{1}}\thicksim \delta _n^{\frac{s'p\left( \frac{1}{s}-\frac{1}{s'}\right) }{\left( \frac{2s'}{2\beta +1}+1\right) p-\frac{2sr}{2\beta +1}-r}}. \end{aligned}$$
(4.4)

Then it follows from \(r<p,~p\ge \frac{2sr}{2\beta +1}+r\) and \(s>\frac{1}{r}\) that \(0<m_1<m_2\). Hence,

$$\begin{aligned} J_{0}^{-}+\sum _{m=0}^{m_{2}}J_{m}+n^{-\frac{p}{2}}&\le J_{0}^{-}+\sum _{m=0}^{m_1}J_{m}+\sum _{m=m_1}^{m_{2}}J_{m}+n^{-\frac{p}{2}}\\&\lesssim \delta _n^{p}+2^{m_{1}\left( p-r-\frac{2sr}{2\beta +1}\right) }\delta _n^{p}+ 2^{-\frac{2m_{1}s'p}{2\beta +1}}\delta _n^{\frac{s'}{s}p}+n^{-\frac{p}{2}}. \end{aligned}$$

Combining this with (4.4), \(\delta _n\thicksim \left( \frac{\ln n}{n}\right) ^{\frac{s}{2s+2\beta +1}}\) and \(s'=s-\frac{1}{r}+\frac{1}{p}\), the above inequality reduces to

$$\begin{aligned} J_{0}^{-}+\sum _{m=0}^{m_{2}}J_{m}+n^{-\frac{p}{2}}\lesssim \Big (\frac{\ln n}{n}\Big )^{\frac{s'p}{2(s-\frac{1}{r})+2\beta +1}}. \end{aligned}$$

This with (4.1)–(4.4) leads to the desired conclusion, which finishes the proof.

\(\square \)

\(\bullet \) Concluding remark

It is worth noting that we assumed \(\beta >1\) in Theorem 4.1. For the case \(\beta \in (0,1]\), the same conclusion of Theorem 4.1 holds if the following additional Condition (T3) is imposed:

$$\begin{aligned} \mathrm{(T3)}~~\left\{ \begin{aligned}&\int _{\mathbb {R}}\frac{|g_{\varepsilon }(x)|}{1+|y+2^{j}x|^{2}}dx\lesssim 2^{j(\beta -2)}|y|^{1-\beta }~(|y|\ge 1)~~ \text {for}~~\beta \in (0,1);\\&\frac{d}{dt}\frac{(f_{\varepsilon }^{ft})'(t)}{[f_{\varepsilon }^{ft}(t)]^{2}}\in L^{1}(\mathbb {R})~~\text {for}~~\beta =1, \end{aligned} \right. \end{aligned}$$

where \(g_{\varepsilon }(x)=\mathcal {F}^{-1} \left\{ \frac{d}{dt}\frac{\left( f_{\varepsilon }^{ft}\right) '(t)}{\left[ f_{\varepsilon }^{ft}(t)\right] ^{2}}\right\} (x)\) and \(\mathcal {F}^{-1}\) denotes the inverse Fourier transform. Although Condition (T3) looks complicated and unnatural, the Gamma distribution provides an example; see Example 4.1 in Ref. [13].

If Condition (T3) is added, the next lemma follows easily.

Lemma 4.1

([13]). Let \(\varphi \) be m regular and Conditions (T1)–(T3) hold with \(0<\beta \le 1\) and \(m>\beta +1\). Then \(K_{j}\varphi \) in (1.2) satisfies that

$$\begin{aligned} \left| \sum _{k\in \mathbb {Z}}(K_{j}\varphi )(x-k)\varphi (y-k)\right| \le M_{0}2^{j\beta }\left( 1+|x-y|^{2}\right) ^{-\frac{\beta +1}{2}}, \end{aligned}$$

where \(M_{0}>0\) is some constant.

Thus, the conclusions of Lemma 3.2 remain valid for \(0<\beta \le 1\), which implies that the conclusion of Theorem 4.1 also holds for \(0<\beta \le 1\) under Conditions (T1)–(T3).