Abstract
This paper addresses adaptive wavelet estimation of density derivatives by data-driven methods. Based on the classical linear wavelet estimator of density derivatives, we first provide a point-wise estimation under the local Hölder condition. Moreover, we introduce a data-driven wavelet estimator for adaptivity and prove a point-wise oracle inequality, which does not require any assumption on the underlying function. Finally, by using the point-wise oracle inequality, the point-wise estimation under the local Hölder condition and the \(L^p\)-risk (\(1\le p<\infty \)) estimation on Besov spaces are investigated respectively.
1 Introduction
The estimation of density derivatives plays an important role in the exploration of structures in curves, the comparison of regression curves, the analysis of human growth data, mean shift clustering and hypothesis testing [16]. More precisely, let \((\Omega ,\mathscr {F},P)\) be a probability space and \(X_{1},\ldots ,X_{n}\) be independent and identically distributed (i.i.d.) random samples with an unknown density function f. The purpose is to estimate the density derivative \(f^{(d)}\) with \(d\in \mathbb {N}\) from the observed data \(X_{1},\ldots ,X_{n}\).
In particular, the density derivative estimation model reduces to the classical density estimation model when the order \(d=0\). For density estimation, the classical kernel methods give nice results [9, 19, 21]. Compared with kernel estimators, wavelet estimators perform better because they provide more local information and admit fast wavelet algorithms [5, 15]. For instance, Donoho et al. [6] established an adaptive and optimal estimation (up to a logarithmic factor) for a univariate density function over \(L^p\)-risk (\(1\le p<\infty \)) on Besov spaces.
In contrast to traditional adaptive estimation, Goldenshluger and Lepski [7] constructed a kernel estimator for density functions by data-driven methods, and provided \(L^p\)-risk (\(1\le p<\infty \)) estimations over anisotropic Nikol’skii classes in 2014. Five years later, Liu and Wu [13] introduced a data-driven wavelet estimator and considered point-wise density estimations under the local anisotropic Hölder condition. Recently, Cao and Zeng [1] investigated the adaptive \(L^p\)-risk (\(1\le p<\infty \)) estimations under the independence hypothesis on Besov spaces by using the data-driven wavelet estimator.
Along with density estimation, it is often necessary to estimate the derivatives of the density function. Müller and Gasser [18] discussed kernel estimations for density derivatives over \(L^{2}\)-risk on Sobolev spaces. Then in 1996, Rao [20] explored wavelet density derivative estimations over \(L^{2}\)-risk on Sobolev spaces. Moreover, Rao’s estimates were generalized to unmatched Besov spaces \(B_{r,q}^s\) and \(L^{p}\)-risk (\(1\le p<\infty \)) in Ref. [3]. In 2013, Liu and Wang [12] defined new linear and nonlinear wavelet estimators for density derivatives, and provided the corresponding \(L^{p}\)-risk estimations on Besov spaces.
This paper investigates the adaptive wavelet estimation of density derivatives. Based on the classical linear wavelet estimator for density derivatives, we first show the point-wise estimation under the local Hölder condition. Furthermore, motivated by the works of Goldenshluger and Lepski [7] and Cao and Zeng [1], we introduce a data-driven wavelet estimator for adaptivity and prove a point-wise oracle inequality, which does not require any assumption on the underlying function f or \(f^{(d)}\) (except for the restrictions ensuring the existence of the model and of the risk). Finally, by using the point-wise oracle inequality, we give the point-wise estimation under the local Hölder condition and the \(L^p\)-risk \((1\le p<\infty )\) estimations on Besov spaces respectively.
1.1 Wavelets and Function Spaces
We begin with an important concept in wavelet analysis in this subsection. A Multiresolution Analysis (MRA, [8, 17]) is a sequence of closed subspaces \(\{V_{j}\}_{j\in \mathbb {Z}}\) of the square integrable function space \(L^{2}(\mathbb {R})\) satisfying the following properties:
- (i) \(V_{j}\subset V_{j+1}\), \(j\in \mathbb {Z}\);
- (ii) \(\overline{\bigcup _{j\in \mathbb {Z}} V_{j}}=L^{2}(\mathbb {R})\) (the space \(\bigcup _{j\in \mathbb {Z}} V_{j}\) is dense in \(L^{2}(\mathbb {R})\));
- (iii) \(f(2\cdot )\in V_{j+1}\) if and only if \(f(\cdot )\in V_{j}\) for each \(j\in \mathbb {Z}\);
- (iv) There exists \(\varphi \in L^{2}(\mathbb {R})\) (scaling function) such that \(\{\varphi (\cdot -k),~k\in \mathbb {Z}\}\) forms an orthonormal basis of \(V_{0}=\overline{\textrm{span}\{\varphi (\cdot -k),k\in \mathbb {Z}\}}\).
Moreover, a wavelet function \(\psi \) can be derived from the scaling function \(\varphi \) in a simple way such that for fixed \(j_0\in \mathbb {N}\), both \(\{\varphi _{j_0k},\psi _{jk}\}_{j\ge j_0,k\in \mathbb {Z}}\) and \(\{\psi _{jk}\}_{j,k\in \mathbb {Z}}\) are orthonormal bases (wavelet bases) of \(L^{2}(\mathbb {R})\), where \(h_{jk}(\cdot ):=2^{\frac{j}{2}}h(2^{j}\cdot -k)\) for \(h=\varphi \) or \(\psi \). Hence, for each \(f\in L^2(\mathbb {R})\),
\(f=\sum \limits _{k\in \mathbb {Z}}s_{j_{0}k}\varphi _{j_{0}k}+\sum \limits _{j=j_{0}}^{\infty }\sum \limits _{k\in \mathbb {Z}}d_{jk}\psi _{jk}\)
holds in the \(L^2\)-sense, where \(s_{jk}:=\langle f, \varphi _{jk}\rangle \) and \(d_{jk}:=\langle f,\psi _{jk}\rangle \). When \(\varphi \) is \(t\) regular, the above identity holds in the \(L^{p}\)-sense \((p\ge 1)\). Here and after, a scaling function \(\varphi \) is called \(t\) regular [4] (\(t\in \mathbb {N}\)) if \(\varphi \in C^{t}(\mathbb {R})\) and, for each \(l\in \mathbb {N}\), \(|\varphi ^{(r)}(x)|\le C_{l}(1+|x|^{2})^{-l}\) with some constant \(C_{l}>0\) \((r=0,1,\ldots ,t)\). For instance, Daubechies’s scaling function \(D_{2N}\) is \(t\) regular for large N, and Meyer’s function possesses any order of regularity. Furthermore, it is easy to verify that the regularity of \(\varphi \) implies the regularity of \(\psi \).
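As a small numerical illustration of the multiresolution structure (ours, not part of the paper), the projection \(P_{j}f\) is easy to compute for the Haar scaling function \(\varphi =1_{[0,1)}\), where it equals the average of f on each dyadic cell; the sketch below checks that the approximation error shrinks as j grows:

```python
import numpy as np

def haar_projection(f, j, xs):
    """P_j f for the Haar scaling function phi = 1_[0,1): on each dyadic
    cell [k/2^j, (k+1)/2^j) the projection equals the average of f there."""
    out = np.empty_like(xs)
    for i, x in enumerate(xs):
        k = min(int(x * 2**j), 2**j - 1)                 # dyadic cell containing x
        grid = np.linspace(k / 2**j, (k + 1) / 2**j, 201)
        out[i] = f(grid).mean()                          # cell average = 2^{j/2} s_{jk}
    return out

f = lambda x: np.sin(2 * np.pi * x)                      # smooth test function on [0, 1]
xs = np.linspace(0.01, 0.99, 400)
errs = [np.max(np.abs(f(xs) - haar_projection(f, j, xs))) for j in (2, 4, 6)]
assert errs[0] > errs[1] > errs[2]                       # P_j f -> f as j grows
```

For Lipschitz f the Haar error decays like \(2^{-j}\), which is consistent with the \(2^{-js}\) rate of Lemma 1.1 (ii) below.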
As usual, the notation \(P_j\) stands for the orthogonal projection operator from \(L^{2}(\mathbb {R})\) onto the scaling space \(V_{j}\) with the orthonormal basis \(\{\varphi _{jk}\}_{k\in \mathbb {Z}}\). Thus, for each \(f\in L^2(\mathbb {R})\),
\(P_{j}f=\sum \limits _{k\in \mathbb {Z}}s_{jk}\varphi _{jk}\)
with \(s_{jk}:=\langle f,\varphi _{jk}\rangle \). If \(\varphi \) satisfies Condition (\(\theta \)), i.e.,
\(\Theta _{\varphi }(x):=\sum \limits _{k\in \mathbb {Z}}|\varphi (x-k)|\in L^{\infty }(\mathbb {R}),\)
then \(P_{j}f\) is well-defined for \(f\in L^{p}(\mathbb {R})~(1\le p\le \infty )\). Furthermore, Condition (\(\theta \)) follows from the regularity of \(\varphi \).
As in Refs. [13, 14], we shall investigate the point-wise estimations under the local Hölder condition. For a univariate function f, the local Hölder condition of order \(s>0\) at the point \(x_{0}\in \mathbb {R}\) means that for a fixed constant \(L>0\) and each \(x,y\in \Omega _{x_{0}}\) (a neighbourhood of the point \(x_0\)),
\(|f^{([s])}(x)-f^{([s])}(y)|\le L|x-y|^{s-[s]},\)
where \([s]\) stands for the largest integer strictly smaller than s. The set of all such functions is denoted by \(H^{s}(\Omega _{x_{0}})\). Obviously, \(f\in H^{s+d}(\Omega _{x_{0}})\) if and only if \(f^{(d)}\in H^{s}(\Omega _{x_{0}})\) with \(d\in \mathbb {N}\).
The following lemma is necessary for the point-wise estimations.
Lemma 1.1
[14, 22, 23] Let \(\varphi \in L^{2}(\mathbb {R})\) be a \(t\) regular scaling function and \(\psi \) be the corresponding wavelet. If \(f\in H^{s}(\Omega _{x_{0}})\cap L^{2}(\mathbb {R})\) with \(s>0\) and \(t\ge [s]\), then for \(x\in \Omega _{x_{0}}\) and sufficiently large j,
- (i) \(f(x)=\sum \limits _{k\in \mathbb {Z}}s_{j_{0}k}\varphi _{j_{0}k}(x)+ \sum \limits _{j=j_0}^{\infty }\sum \limits _{k\in \mathbb {Z}}d_{jk}\psi _{jk}(x)\) holds pointwise;
- (ii) \(\sup \limits _{x\in \Omega _{x_{0}}}\sup \limits _{f\in H^{s}(\Omega _{x_{0}})\cap L^{2}(\mathbb {R})}|f(x)-P_{j}f(x)|\lesssim 2^{-js}.\)
Here and throughout, \(A\lesssim B\) stands for \(A\le cB\) with some constant \(c>0\); \(A\gtrsim B\) means \(B\lesssim A\); \(A\thicksim B\) denotes both \(A\lesssim B\) and \(A\gtrsim B\).
In this paper, the notation \(H^{s+d}(\Omega _{x_{0}},M)\) with \(d\in \mathbb {N}\) means that
where M is a positive constant and \(a\vee b:=\max \{a,~b\}\).
On the other hand, Besov spaces are needed in order to establish the \(L^p\)-risk estimations. Let \(W_r^n(\mathbb {R})\) be the Sobolev space with a non-negative integer exponent n,
\(W_r^n(\mathbb {R}):=\big \{f\in L^r(\mathbb {R}):~f^{(n)}\in L^r(\mathbb {R})\big \}\)
and \(\Vert f\Vert _{W_r^n}:=\Vert f\Vert _r+\Vert f^{(n)}\Vert _r.\) Then \(L^r(\mathbb {R})\) can be seen as \(W_r^0(\mathbb {R})\). For \(1\le r,q\le \infty \) and \(s=n+\alpha \) with \(\alpha \in (0,1]\), a Besov space \(B_{r,q}^{s}(\mathbb {R})\) is defined by
\(B_{r,q}^{s}(\mathbb {R}):=\big \{f\in W_r^n(\mathbb {R}):~\Vert t^{-\alpha }\omega _r^2(f^{(n)},t)\Vert _q^*<\infty \big \}\)
with the norm \(\Vert f\Vert _{B_{r,q}^{s}}:=\Vert f\Vert _{W_r^n}+\Vert t^{-\alpha }\omega _r^2(f^{(n)},t)\Vert _q^*\). Here, \(\omega _r^2(f,t):=\sup _{|h|\le t}\Vert f(\cdot +2h)-2f(\cdot +h)+f(\cdot )\Vert _r\) denotes the smoothness modulus of f and
\(\Vert h\Vert _q^*:=\Big (\int _0^{\infty }|h(t)|^q\frac{dt}{t}\Big )^{\frac{1}{q}}~(q<\infty ),\qquad \Vert h\Vert _{\infty }^*:=\sup \limits _{t>0}|h(t)|.\)
Then for \(f\in L^{r}(\mathbb {R})\), \(f\in W_{r}^{n+d}(\mathbb {R})\) if and only if \(f^{(d)}\in W_{r}^{n}(\mathbb {R})\), since \(f^{(n+d)}\in L^{r}(\mathbb {R})\) implies \(f^{(j)}\in L^{r}(\mathbb {R})~(j=1,2,\ldots ,n+d)\) (see Ref. [8]). Hence, \(f\in B_{r,q}^{s+d}(\mathbb {R})\) if and only if \(f^{(d)}\in B_{r,q}^{s}(\mathbb {R})\).
One advantage of wavelet bases is that they can characterize Besov spaces.
Lemma 1.2
[17] Let \(\varphi \) be t regular with \(t>s>0\) and \(\psi \) be the corresponding wavelet. Then for \(f\in L^{r}(\mathbb {R})\) and \(r,q\in [1,\infty ]\), the following conditions are equivalent:
- (i) \(f\in B^{s}_{r,q}(\mathbb {R});\)
- (ii) \(\{2^{js}\Vert P_{j}f-f\Vert _{r}\}_{j\in \mathbb {Z}}\in l^{q}(\mathbb {Z});\)
- (iii) \(\{2^{j(s-\frac{1}{r}+\frac{1}{2})}\Vert \{d_{j\cdot }\}\Vert _{l^r}\}_{j\in \mathbb {Z}}\in l^{q}(\mathbb {Z}).\)
The Besov norm of f can be given by the equivalent form
\(\Vert f\Vert _{B_{r,q}^{s}}\thicksim \Vert \{s_{j_{0}\cdot }\}\Vert _{l^r}+\Big \Vert \Big \{2^{j(s-\frac{1}{r}+\frac{1}{2})}\Vert \{d_{j\cdot }\}\Vert _{l^r}\Big \}_{j\ge j_{0}}\Big \Vert _{l^q}.\)
Furthermore, Lemma 1.2 (i) and (ii) show that \(\Vert P_jf-f\Vert _r\lesssim 2^{-js}\) holds for \(f\in B^{s}_{r,q}(\mathbb {R})\). When \(r\le p\), Lemma 1.2 (i) and (iii) imply that with \(s'-\frac{1}{p}=s-\frac{1}{r}>0\),
\(B_{r,q}^{s}(\mathbb {R})\hookrightarrow B_{p,q}^{s'}(\mathbb {R}),\)
where \(A\hookrightarrow B\) stands for a Banach space A continuously embedded in another Banach space B. All these claims can be found in Refs. [11, 24].
In this paper, the notation \(B_{r,q}^{s+d}(M)\) with \(M>0\) stands for
and
Moreover, \(L^\infty (M)\) is defined in the same way. On the other hand, it follows from \(f\in B_{r,q}^{s+d}(M)\) that \(f^{(d)}\in B_{r,q}^{s}(\mathbb {R})\) and \(\Vert f^{(d)}\Vert _{B_{r,q}^{s}}\le M\).
1.2 Our Results
As in Refs. [3, 20], the linear wavelet estimator for density derivatives is introduced by
\(\widehat{f^{(d)}_{j}}(x):=\sum \limits _{k\in \mathbb {Z}}\widehat{\alpha }_{jk}\varphi _{jk}(x),\qquad (1.2)\)
where \(\widehat{\alpha }_{jk}:=\frac{(-1)^{d}}{n}\sum _{i=1}^{n} [\varphi _{jk}]^{(d)}(X_{i})\) and \(\varphi \) is \(t\) regular with \(t\ge d\). Clearly, \(E\widehat{\alpha }_{jk}=\alpha _{jk}:=\langle f^{(d)},\varphi _{jk}\rangle \) and \(E\widehat{f^{(d)}_{j}}=P_{j}f^{(d)}\).
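The estimator \(\widehat{f^{(d)}_{j}}\) in (1.2) can be simulated in the simplest case \(d=0\) with the Haar scaling function, for which \(\widehat{\alpha }_{jk}\) is a rescaled empirical bin frequency (our illustration only; smooth Daubechies wavelets would be needed for \(d\ge 1\)):

```python
import numpy as np

rng = np.random.default_rng(0)

def haar_density_estimate(sample, j, xs):
    """Linear wavelet estimator (1.2) with d = 0 and the Haar scaling function:
    f_hat_j(x) = sum_k alpha_hat_{jk} phi_{jk}(x), i.e. a dyadic histogram."""
    est = np.zeros_like(xs)
    for i, x in enumerate(xs):
        k = np.floor(x * 2**j)                   # only phi_{jk} with this k is nonzero at x
        alpha_hat = 2**(j / 2) * np.mean(np.floor(sample * 2**j) == k)
        est[i] = alpha_hat * 2**(j / 2)          # alpha_hat_{jk} * phi_{jk}(x)
    return est

sample = rng.beta(2, 2, size=5000)               # true density 6x(1-x) on [0, 1]
xs = np.linspace(0.05, 0.95, 50)
est = haar_density_estimate(sample, j=4, xs=xs)
true = 6 * xs * (1 - xs)
assert np.max(np.abs(est - true)) < 0.5          # rough pointwise accuracy at j = 4
```

Larger j reduces the bias term \(|P_{j}f^{(d)}(x)-f^{(d)}(x)|\) but inflates the stochastic error, which is exactly the trade-off behind the choice of \(j^{*}\) in Theorem 1.1.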
Next, we are in a position to introduce our results in this paper. The first theorem gives a linear wavelet point-wise estimation for density derivatives under the local Hölder condition.
Theorem 1.1
Let \(\varphi \) be \(t\) regular with \(t\ge d\ge 0\) and \(\widehat{f^{(d)}_{j^{*}}}\) be the linear wavelet estimator in (1.2). Then for \(0<s<t\), \(p\in [1,\infty )\) and \(2^{j^{*}}\thicksim n^{\frac{1}{2s+2d+1}}\),
\(\sup \limits _{x\in \Omega _{x_{0}}}\sup \limits _{f\in H^{s+d}(\Omega _{x_{0}},M)} E\Big |\widehat{f^{(d)}_{j^{*}}}(x)-f^{(d)}(x)\Big |^{p}\lesssim n^{-\frac{sp}{2s+2d+1}}.\)
Remark 1.1
When the order \(d=0\), the density derivative estimation model can be reduced to the classical density one, and Theorem 1.1 coincides with the conclusion of Theorem 3 in one dimension in Ref. [13].
Remark 1.2
Note that the parameter j of the linear wavelet estimator depends on the smoothness index s of the unknown density function f in Theorem 1.1, so the estimator in (1.2) is non-adaptive [6, 10, 11].
Motivated by the works in Refs. [1, 2, 7, 14], we provide a selection rule for the parameter j in (1.2) that depends only on the observed data \(X_{1},\ldots ,X_{n}\); the resulting estimator is a so-called data-driven and totally adaptive estimator.
Let \(\mathcal {H}:=\left\{ 0,1,\ldots ,\lfloor \frac{1}{2d+1}\log _2{\frac{n}{\ln n}}\rfloor \right\} \) with \(\lfloor a\rfloor \) denoting the integer part of a. Thus, the selection rule of \(j=j_{0}\) in (1.2) is given by
Here and throughout, \(a\wedge b:=\min \{a,~b\}\), \(a_{+}:=\max \{a,~0\}\) and
\(\tau _{n}(j):=\Big (\frac{\lambda 2^{j(2d+1)}\ln n}{n}\Big )^{\frac{1}{2}},\qquad (1.5)\)
where \(\lambda >0\) is a constant determined later on. Clearly, the selection rule depends only on the observed data \(X_1,\ldots ,X_n\). Thus, the data-driven wavelet estimator is obtained by
\(\widehat{f_{n}^{(d)}}(x):=\widehat{f_{j_{0}}^{(d)}}(x)\qquad (1.6)\)
with \(j_0\in \mathcal {H}\) being given in (1.4).
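Selection rules of Goldenshluger–Lepski type, such as (1.4), compare the pairwise estimators \(\widehat{f_{j\wedge j'}^{(d)}}\) and \(\widehat{f_{j'}^{(d)}}\) and penalize by \(\tau _{n}\). The toy sketch below (our illustration with the Haar case \(d=0\) and illustrative constants, not the paper's exact rule) shows the mechanics at a single point x:

```python
import numpy as np

rng = np.random.default_rng(1)

def fj_hat(sample, j, x):
    """Haar linear estimator at a single point (d = 0): 2^j * empirical bin mass."""
    k = np.floor(x * 2**j)
    return 2**j * np.mean(np.floor(sample * 2**j) == k)

def gl_select(sample, x, lam=1.0):
    """Goldenshluger-Lepski-type level selection in the spirit of (1.4): trade the
    comparisons [|f_hat_{j ^ j'} - f_hat_{j'}| - tau(j') - tau(j ^ j')]_+ against
    the stochastic term tau(j).  Constants here are illustrative."""
    n = sample.size
    H = range(int(np.log2(n / np.log(n))) + 1)          # H for d = 0
    tau = {j: np.sqrt(lam * 2**j * np.log(n) / n) for j in H}
    def crit(j):
        comp = max(max(0.0, abs(fj_hat(sample, min(j, jp), x) - fj_hat(sample, jp, x))
                        - tau[jp] - tau[min(j, jp)]) for jp in H)
        return comp + tau[j]
    return min(H, key=crit)

sample = rng.beta(2, 2, size=2000)                      # true density 6x(1-x)
j0 = gl_select(sample, x=0.3)
est = fj_hat(sample, j0, x=0.3)
assert abs(est - 6 * 0.3 * 0.7) < 0.8                   # near the true value 1.26
```

The selected level \(j_0\) balances the empirical bias proxy against \(\tau _n(j)\) without any knowledge of the smoothness s.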
To introduce Theorem 1.2, let
\(B_{j}(x,f):=\Big |E\widehat{f_{j}^{(d)}}(x)-f^{(d)}(x)\Big |\quad \text{and}\quad S_{n}(x,j):=\widehat{f_{j}^{(d)}}(x)-E\widehat{f_{j}^{(d)}}(x)\qquad (1.7)\)
be the bias and the stochastic error of \(\widehat{f_{j}^{(d)}}\), respectively. Furthermore, we define
\(B_{j}^{*}(x,f):=\sup \limits _{\{j'\in \mathcal {H},~j'\ge j\}}B_{j'}(x,f)\quad \text{and}\quad \aleph _{n}(x):=\sup \limits _{j\in \mathcal {H}}\big [|S_{n}(x,j)|-\tau _{n}(j)\big ]_{+},\qquad (1.8)\)
where \(\tau _{n}(j)\) is given by (1.5).
Then the following point-wise oracle inequality is established, which plays a key role in the proofs of Theorems 1.3–1.4.
Theorem 1.2
For any \(x\in \mathbb {R}\), the estimator \(\widehat{f_{n}^{(d)}}(x)\) in (1.6) satisfies
\(\Big |\widehat{f_{n}^{(d)}}(x)-f^{(d)}(x)\Big |\lesssim \inf \limits _{j\in \mathcal {H}}\Big \{B_{j}^{*}(x,f)+\tau _{n}(j)\Big \}+\aleph _{n}(x),\)
where \(\tau _{n}(j)\) is given by (1.5) and \(B_{j}^{*}(x,f),~\aleph _{n}(x)\) are determined by (1.8).
Moreover, by using Theorem 1.2, we obtain the adaptive point-wise estimation and \(L^p\)-risk \((1\le p<\infty )\) estimation based on the data-driven estimator in (1.6).
Theorem 1.3
Let \(\varphi \) be \(t\) regular with \(t\ge d\ge 0\). Then for \(0<s<t\) and \(p\in [1,\infty )\), the data-driven estimator \(\widehat{f^{(d)}_{n}}\) in (1.6) satisfies
\(\sup \limits _{x\in \Omega _{x_{0}}}\sup \limits _{f\in H^{s+d}(\Omega _{x_{0}},M)} E\Big |\widehat{f^{(d)}_{n}}(x)-f^{(d)}(x)\Big |^{p}\lesssim \Big (\frac{\ln n}{n}\Big )^{\frac{sp}{2s+2d+1}}.\)
Remark 1.3
The same as Remark 1.1, when \(d=0\), Theorem 1.3 can be reduced to the conclusion of Theorem 4 in one dimension in Ref. [13].
Theorem 1.4
Let \(\varphi \) be t regular with \(t\ge d\ge 0\). Then for \(0<s<t\), \(r,q\in [1,\infty ]\) and \(p\in [1,\infty )\), the data-driven estimator \(\widehat{f_{n}^{(d)}}\) in (1.6) satisfies
where
Remark 1.4
According to Theorem 3.3 and Theorem 4.3 in Ref. [12], the convergence rates in Theorem 1.4 are optimal (up to a logarithmic factor) in the case \(s>\frac{1}{r}\). However, the situation is unclear for \(s\le \frac{1}{r}\). Therefore, one direction of our future work is to determine the optimality of this statistical model in the case \(s\le \frac{1}{r}\).
Remark 1.5
When \(d=0\) and \(s>\frac{1}{r}\), the convergence rate \(\theta =\min \left\{ \frac{s}{2s+1},~\frac{s-\frac{1}{r}+\frac{1}{p}}{2(s-\frac{1}{r})+1}\right\} \) coincides with the works of Donoho et al. in Ref. [6]. In addition, the estimation for the case \(s\le \frac{1}{r}\) is considered in Theorem 1.4.
2 Some Lemmas and Propositions
In this section, we provide some lemmas and propositions which are necessary in the proofs of main results. Rosenthal’s inequality is introduced first.
Rosenthal’s inequality [8]. Let \(p>0\) and \(X_1,X_2,\ldots ,X_n\) be independent random variables satisfying \(EX_i=0\) and \(E|X_i|^p<\infty \) \((i=1,2,\ldots ,n)\). Then there exists \(C(p)>0\) such that
\(E\Big |\sum \limits _{i=1}^{n}X_{i}\Big |^{p}\le C(p)\Big [\sum \limits _{i=1}^{n}E|X_{i}|^{p}+\Big (\sum \limits _{i=1}^{n}EX_{i}^{2}\Big )^{\frac{p}{2}}\Big ]~(p>2);\qquad E\Big |\sum \limits _{i=1}^{n}X_{i}\Big |^{p}\le C(p)\Big (\sum \limits _{i=1}^{n}EX_{i}^{2}\Big )^{\frac{p}{2}}~(0<p\le 2).\)
Next, the following lemma is established, which is important for the proof of Theorem 1.1.
Lemma 2.1
Let \(\varphi \) be \(t\) regular with \(t\ge d\) and \(\hat{\alpha }_{jk}\) be defined in (1.2). Then for \(f\in L^{\infty }(M)\) with \(M>0\), \(p\in [1,\infty )\) and \(2^{j}\le n\),
\(E|\widehat{\alpha }_{jk}-\alpha _{jk}|^{p}\lesssim \Big (\frac{2^{2jd}}{n}\Big )^{\frac{p}{2}},\)
where the constant in \(``\lesssim "\) only depends on \(\varphi \) and M.
Proof
According to the definition of \(\hat{\alpha }_{jk}\), one has \(E\hat{\alpha }_{jk}=\alpha _{jk}\) and
where \(\eta _i:=[\varphi _{jk}]^{(d)}(X_{i})-E[\varphi _{jk}]^{(d)}(X_{i})\). Clearly, \(\{\eta _i\}_{i=1}^n\) are i.i.d. samples and \(E\eta _i=0,~i=1,\ldots ,n\).
On the other hand, for \(i=1,\ldots ,n,\)
and \(\Vert \eta _i\Vert _{\infty } \lesssim \Vert [\varphi _{jk}]^{(d)}\Vert _{\infty } \lesssim 2^{j(\frac{1}{2}+d)} \) by the regularity of \(\varphi \) and \(\Vert f\Vert _{\infty }\lesssim 1\). These with Rosenthal’s inequality and \(2^{j}\le n\) show that
Finally, the desired conclusion follows from (2.1) and (2.2). The proof is done. \(\square \)
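In the simplest case \(d=0\), \(p=2\), the bound of Lemma 2.1 reads \(E|\widehat{\alpha }_{jk}-\alpha _{jk}|^{2}\lesssim \frac{1}{n}\) uniformly in j. A small simulation with the Haar scaling function (our illustration; the slack factor 1.2 stands in for the \(\lesssim \) constant) confirms this:

```python
import numpy as np

rng = np.random.default_rng(3)

def alpha_hat(sample, j, k):
    """alpha_hat_{jk} = (1/n) sum_i phi_{jk}(X_i) for the Haar phi = 1_[0,1)."""
    return 2**(j / 2) * np.mean(np.floor(sample * 2**j) == k)

# Var(alpha_hat_{jk}) = Var(phi_{jk}(X))/n <= E[phi_{jk}(X)^2]/n
#                     = 2^j P(X in cell)/n <= ||f||_inf / n, uniformly in j.
n, reps, f_sup = 500, 4000, 1.5                 # sup of the density 6x(1-x) is 1.5
variances = []
for j in (2, 4, 6):
    k = int(0.5 * 2**j)                          # the dyadic cell containing x = 1/2
    vals = [alpha_hat(rng.beta(2, 2, size=n), j, k) for _ in range(reps)]
    variances.append(np.var(vals))
assert all(v <= 1.2 * f_sup / n for v in variances)
```

The variance stays of order \(\frac{1}{n}\) at every level j, as the \(d=0\) case of the lemma predicts.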
We give the next lemma in order to prove Proposition 2.1.
Lemma 2.2
Let \(K_{j}(v,x):=(-1)^{d}\sum _{k\in \mathbb {Z}}[\varphi _{jk}]^{(d)}(v)\varphi _{jk}(x)\) and \(\varphi \) be t regular with \(t\ge d\ge 0\). Then for \(f\in L^{\infty }(M)\),
where \(M_1\ge 1\) is some constant.
Proof
By the definition of \(K_j(v,x)\), one finds easily that
because of the regularity of \(\varphi \). On the other hand,
Furthermore,
Choosing \(M_1:=\max \{\Vert \Theta _{\varphi }\Vert _{\infty }\Vert \varphi ^{(d)}\Vert _{\infty }, ~\Vert \Theta _{\varphi }\Vert _{\infty }^{2} \Vert \varphi ^{(d)}\Vert _{\infty }\Vert \varphi ^{(d)}\Vert _{1}M,~1\}\), the final conclusions follow from (2.3)–(2.4). \(\square \)
To show Proposition 2.1, we need another well-known inequality.
Bernstein’s inequality [8]. Let \(\eta _{1},\ldots ,\eta _{n}\) be i.i.d. random variables with \(E\eta _{i}=0\), \(E\eta _{i}^{2}\le \sigma ^{2}\) and \(|\eta _{i}|\le M\) \((i=1,2,\ldots ,n)\). Then for any \(\epsilon >0\),
\(P\Big \{\Big |\frac{1}{n}\sum \limits _{i=1}^{n}\eta _{i}\Big |\ge \epsilon \Big \}\le 2\exp \Big \{-\frac{n\epsilon ^{2}}{2(\sigma ^{2}+M\epsilon /3)}\Big \}.\)
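A quick Monte Carlo sanity check of the stated exponential bound for bounded centred variables (our illustration, with uniform variables):

```python
import numpy as np

rng = np.random.default_rng(2)

# Check P(|(1/n) sum eta_i| >= eps) <= 2 exp(-n eps^2 / (2 (sigma^2 + M eps / 3)))
# for eta_i ~ U(-1, 1): centred, |eta_i| <= M = 1, Var(eta_i) = 1/3.
n, reps, eps = 200, 20000, 0.1
eta = rng.uniform(-1, 1, size=(reps, n))
sigma2, M = 1.0 / 3.0, 1.0
empirical = np.mean(np.abs(eta.mean(axis=1)) >= eps)
bound = 2 * np.exp(-n * eps**2 / (2 * (sigma2 + M * eps / 3)))
assert empirical <= bound                        # empirical tail below Bernstein bound
```

In the proofs below, the bound is applied with \(\eta _{i}=K_{j}(X_{i},x)-EK_{j}(X_{i},x)\), whose variance and sup-norm are controlled by Lemma 2.2.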
Now, we introduce the first proposition which plays important roles in the proofs of Theorems 1.3–1.4.
Proposition 2.1
Let \(f\in L^{\infty }(M)\) and \(\varphi \) be \(t\) regular with \(t\ge d\ge 0\). Then for each \(x\in \mathbb {R}\) and \(p\in [1,\infty )\), there exists \(\lambda >6M_{1}^{2}p^{2}\) such that
\(E[\aleph _{n}(x)]^{p}\lesssim \Big (\frac{\ln n}{n}\Big )^{\frac{p}{2}},\)
where \(\aleph _{n}(x)\) is given by (1.8) and \(M_1\ge 1\) is the constant in Lemma 2.2.
Proof
For each \(j\in \mathcal {H}\), one denotes
where \(\lambda _{j}:=\max \{(2d+1)p j\ln 2,~1\}\). Note that the inequality \(\lambda \ln n \ge 6M_{1}^{2}p\lambda _j\) holds for large n, since \(\lambda >6M_{1}^{2}p^{2}\) and \(j\in \mathcal {H}\). Hence, \(\overline{\tau _{n}(j)}\le \tau _{n}(j)\) thanks to (1.5) and (2.5). Moreover,
For any \(t\ge 0\),
Therefore,
This with variable substitution \(t=\omega \overline{\tau _{n}(j)}\) shows that
On the other hand, it is easy to see that the estimator \(\widehat{f_{j}^{(d)}}(x)\) in (1.6) can be rewritten as
because \(K_{j}(v,x):=(-1)^{d}\sum _{k\in \mathbb {Z}}[\varphi _{jk}]^{(d)}(v)\varphi _{jk}(x)\) in Lemma 2.2. This with (1.7) and Lemma 2.2 implies that \(S_n(x,j) =\frac{1}{n}\sum _{i=1}^{n}[K_{j}(X_{i},x)-EK_{j}(X_{i},x)]\) and
Furthermore,
thanks to Bernstein’s inequality.
For \(j\in \mathcal {H}\), \(\overline{\tau _{n}(j)}=\Big (\frac{6M_{1}^{2}p2^{j(2d+1)}\lambda _{j}}{n}\Big )^{\frac{1}{2}} \le 3M_{1}p\) holds for large n. Thus,
due to \(M_{1},p\ge 1\) and \(\omega >0\). Substituting this above estimate into (2.8), one obtains that
Then it follows from \(\lambda _{j}=\max \{(2d+1)p j\ln 2,~1\}\ge 1\) that
Combining this with (2.7) and \(\overline{\tau _{n}(j)}:=\left( \frac{6M_{1}^{2}p2^{j(2d+1)}\lambda _j}{n}\right) ^{\frac{1}{2}}\), one concludes that
Hence, according to \(\lambda _{j}\lesssim \ln n\) and \(e^{-\lambda _{j}}\le 2^{-(2d+1)p j}\), one knows
This with (1.8) and (2.6) leads to
which completes the proof. \(\square \)
To introduce Proposition 2.2, we also need the following notations:
where \(\delta _n=(\frac{C\ln n}{n})^{\frac{s}{2s+2d+1}}\), \(C>1\) is some constant and \(T>0\) is defined by (1.1).
Note that \(\mathfrak {M}(x,f)\le c_0:=\sup _x\mathfrak {M}(x,f)\), if \(\varphi \) is t regular and \(\Vert f^{(d)}\Vert _{\infty }\lesssim 1\). Then there exists
such that \(\Lambda _{m}=\emptyset \) for each \(m>m_{2}\). Obviously, \(m_{2}>0\) for large n.
Next, another useful proposition is provided which is one of the main ingredients in the proof of Theorem 1.4.
Proposition 2.2
Let \(f\in B_{r,q}^{s+d}(M)\) and \(\varphi \) be t regular with \(t\ge d\ge 0\). Then for \(m\in \mathbb {Z}\) satisfying \(0\le m\le m_2\) and each \(p\in [1,\infty )\),
Moreover, if \(s>\frac{1}{r}\) and \(r\le p\), then with \(s':=s-\frac{1}{r}+\frac{1}{p}\),
where \(\mathfrak {M}(x,f)\) and \(\Lambda _m\) are defined in (2.9)–(2.10) respectively.
Proof
The proof is similar to the second part of Proposition 3.2 in Ref. [2]. Here, we provide only the main steps.
Take \(j_2\) satisfying \(c_12^{\frac{2m}{2d+1}}\delta _n^{-\frac{1}{s}}\le 2^{j_{2}}\le c_22^{\frac{2m}{2d+1}}\delta _n^{-\frac{1}{s}}\), where the two positive constants \(c_1,c_2\) satisfy \((2M)^{\frac{1}{s}}I_{\{r=\infty \}}<c_1<c_2< \min \left\{ \frac{C}{4c_0^{2}},~\frac{C}{4\lambda }\right\} ^{\frac{1}{2d+1}}.\) Then \(j_{2}\in \mathcal {H}\) and \(\tau _{n}(j_2)\le 2^{m-1}\delta _n\) for large n and \(0<m\le m_2\).
Clearly, by \(\Lambda _m=\{x\in [-T,T],~2^{m}\delta _n <\mathfrak {M}(x,f)\le 2^{m+1}\delta _n\}\),
where \(|\Lambda _m|\) stands for the Lebesgue measure of the set \(\Lambda _m\). On the other hand,
When \(1\le r<\infty \), according to Chebyshev’s inequality, (1.8), (2.13) and \(f\in B_{r,q}^{s+d}(M)\), one has
Substituting (2.14) into (2.12), one obtains that
due to \(2^{j_{2}}\thicksim 2^{\frac{2m}{2d+1}}\delta _{n}^{-\frac{1}{s}}\).
For the case \(r=\infty \), it follows from \(f\in B_{r,q}^{s+d}(M)\) and \(m>0\) that \( B_{j_{2}}^{*}(x,f)= \sup _{j'\ge j_2}B_{j'}(x,f)\le M 2^{-j_{2}s}\le Mc_1^{-s} 2^{-\frac{2ms}{2d+1}}\delta _n \le 2^{m-1}\delta _n\) thanks to the choice of \(2^{j_{2}}\ge c_12^{\frac{2m}{2d+1}}\delta _{n}^{-\frac{1}{s}}\) with \(c_1>(2M)^{\frac{1}{s}}\). Thus, \(|\Lambda _{m}|=0\) because of (2.13). Furthermore, it follows that \( Q_m\le (2^{m+1}\delta _n)^p|\Lambda _{m}|=0 \) by (2.12).
Finally, one discusses the case of \(s>\frac{1}{r}\) and \(r\le p\). Note that \(f^{(d)}\in B_{r,q}^{s}\hookrightarrow B_{p,q}^{s'}\) with \(s'=s-\frac{1}{r}+\frac{1}{p}\). Similar to (2.14),
This with (2.12) and \(2^{j_{2}}\thicksim 2^{\frac{2m}{2d+1}}\delta _{n}^{-\frac{1}{s}}\) implies that
The proof is done. \(\square \)
3 Proofs of Theorems 1.1–1.4
This section is devoted to the proofs of Theorems 1.1–1.4.
Proof of Theorem 1.1
By the definition of \(\widehat{f^{(d)}_{j^{*}}}(x)\) and \(E\hat{\alpha }_{j^{*}k}=\alpha _{j^{*}k}\), it is clear that
Moreover, it follows from the Hölder inequality with \(\frac{1}{p}+\frac{1}{p'}=1\) \((p>1)\) that
thanks to Lemma 2.1. When \(p=1\), the above estimate can be concluded directly without using the Hölder inequality.
On the other hand, Lemma 1.1 leads to \( \sup \limits _{x\in \Omega _{x_{0}}}\sup \limits _{f\in H^{s+d}(\Omega _{x_{0}},M)} \Big |E\widehat{f^{(d)}_{j^{*}}}(x)-f^{(d)}(x)\Big |^{p}\lesssim 2^{-j^{*}sp}. \) This with (3.1) and \(2^{j^{*}}\thicksim n^{\frac{1}{2s+2d+1}}\) shows
The proof is completed. \(\square \)
Proof of Theorem 1.2
According to (1.3) and (1.5), one obtains that
The same argument as in (3.2) implies
Moreover, combining (3.2) and (3.3), one concludes
due to \(\widehat{f_{j_{0}\wedge j}^{(d)}}=\widehat{f_{j\wedge j_{0}}^{(d)}}\) and the selection of \(j_0\) in (1.4).
Clearly, by (1.8), \(|S_n(x,j)|\le [|S_n(x,j)|-\tau _{n}(j)]_++\tau _{n}(j)\le \aleph _{n}(x)+\tau _{n}(j).\) This with (1.7) and (1.8) shows that
On the other hand, by using (1.3) and (1.7),
This with \(\sup _{j'\in \mathcal {H}} |E\widehat{f_{j\wedge j'}^{(d)}}(x)-E\widehat{f_{j'}^{(d)}}(x)| \le \sup _{\{j'\in \mathcal {H},~j'\ge j\}}\{B_{j\wedge j'}(x,f)+B_{j'}(x,f)\}\) and (1.8) leads to
Hence, it follows from (3.4)–(3.6) that
holds for each \(j\in \mathcal {H}\). Furthermore,
thanks to \(\widehat{f_{n}^{(d)}}(x)=\widehat{f_{j_0}^{(d)}}(x)\) in (1.6). Hence, Theorem 1.2 is proved. \(\square \)
Proof of Theorem 1.3
Take \(j_{1}\) satisfying \(2^{j_{1}}\thicksim (\frac{n}{\ln n})^{\frac{1}{2s+2d+1}}\). Then \(j_{1}\in \mathcal {H}\) for large n and \(s>0\). Moreover, Theorem 1.2 yields that
holds for any \(x\in \Omega _{x_{0}}\).
By (1.5) and the given choice \(2^{j_{1}}\thicksim (\frac{n}{\ln n})^{\frac{1}{2s+2d+1}}\), one finds easily
due to Proposition 2.1. On the other hand, (1.7)–(1.8) and Lemma 1.1 lead to
holds for any \(x\in \Omega _{x_{0}}\) and \(f\in H^{s+d}(\Omega _{x_{0}},M)\). This with the choice \(2^{j_{1}}\thicksim (\frac{n}{\ln n})^{\frac{1}{2s+2d+1}}\) implies that
Finally, the desired conclusion can be concluded from (3.7)–(3.9). The proof is finished. \(\square \)
Proof of Theorem 1.4
Recall that \(\Lambda _{m}=\{x\in [-T,T],~2^{m}\delta _n<\mathfrak {M}(x,f)\le 2^{m+1}\delta _n\}\) due to (2.10). Define \(\Lambda _{0}^{-}:=\{x\in [-T,T],~\mathfrak {M}(x,f)\le \delta _{n}\}\) with \(\delta _{n}=(\frac{C\ln n}{n})^{\frac{s}{2s+2d+1}}\). Then for each \(p\in [1,\infty )\),
thanks to \(\textrm{supp}\,f\subset [-T,T]\), Theorem 1.2, (2.9) and Proposition 2.1.
To complete the proof, one divides (3.10) into three regions. Recall that \(2^{m_{2}}\thicksim \delta _n^{-1}\) and \(\delta _n\thicksim (\frac{\ln n}{n})^{\frac{s}{2s+2d+1}}\) by (2.10)–(2.11). By Proposition 2.2, the following estimations are established.
(i). For \(1\le p<\frac{2sr}{2d+1}+r\),
(ii). For \(p\ge \frac{2sr}{2d+1}+r\),
(iii). For the case \(p\ge \frac{2sr}{2d+1}+r\) and \(s>\frac{1}{r}\), take \(m_1\in \mathbb {Z}\) satisfying
Clearly, \(0<m_1<m_2\) due to \(r<p,~p\ge \frac{2sr}{2d+1}+r\) and \(s>\frac{1}{r}\). Therefore,
This with (3.13), \(\delta _n\thicksim (\frac{\ln n}{n})^{\frac{s}{2s+2d+1}}\) and \(s'=s-\frac{1}{r}+\frac{1}{p}\) tells that
Finally, the desired conclusion follows from (3.10)–(3.12), which completes the proof. \(\square \)
References
Cao, K.K., Zeng, X.C.: Adaptive wavelet density estimation under independence hypothesis. Results Math. 76(4), 196 (2021)
Cao, K.K., Zeng, X.C.: A data-driven wavelet estimator for deconvolution density estimations. Results Math. 78(4), 156 (2023)
Chaubey, Y.P., Doosti, H., Prakasa Rao, B.L.S.: Wavelet based estimation of the derivatives of a density for a negatively associated process. J. Stat. Theory Pract. 2, 453–463 (2008)
Daubechies, I.: Ten Lectures on Wavelets. SIAM, Philadelphia (1992)
Daubechies, I.: The wavelet transform, time-frequency localization and signal analysis. IEEE Trans. Inform. Theory 36, 961–1005 (1990)
Donoho, D.L., Johnstone, I.M., Kerkyacharian, G., Picard, D.: Density estimation by wavelet thresholding. Ann. Stat. 24(2), 508–539 (1996)
Goldenshluger, A., Lepski, O.: On adaptive minimax density estimation on \(\mathbb{R} ^{d}\). Probab. Theory Relat. Fields 159(3–4), 479–543 (2014)
Härdle, W., Kerkyacharian, G., Picard, D., Tsybakov, A.: Wavelets, Approximation and Statistical Applications. Springer, New York (1998)
Huang, S.Y.: Density estimation by wavelet-based reproducing kernels. Stat. Sinica 9, 137–151 (1999)
Kerkyacharian, G., Picard, D.: Density estimation in Besov spaces. Stat. Probab. Lett. 13, 15–24 (1992)
Li, R., Liu, Y.M.: Wavelet optimal estimations for a density with some additive noises. Appl. Comput. Harmon. Anal. 36(3), 416–433 (2014)
Liu, Y.M., Wang, H.Y.: Wavelet estimations for density derivatives. Sci. China Math. 56(3), 483–495 (2013)
Liu, Y.M., Wu, C.: Point-wise estimation for anisotropic densities. J. Multivar. Anal. 171, 112–125 (2019)
Liu, Y.M., Wu, C.: Point-wise wavelet in the convolution structure density model. J. Fourier Anal. Appl. 26, 81 (2020)
Mallat, S.: A theory for multiresolution signal decomposition: the wavelet representation. IEEE Trans. Pattern Anal. Mach. Intell. 11, 674–693 (1989)
Markovich, L.A.: Gamma kernel estimation of the density derivative on the positive semi-axis by dependent data. REVSTAT-Stat. J. 14(3), 327–348 (2016)
Meyer, Y.: Wavelets and Operators. Cambridge University Press, Cambridge (1992)
Müller, H.G., Gasser, T.: Optimal convergence properties of kernel estimates of derivatives of a density function. In: Lecture Notes in Mathematics 757. Springer, Berlin, pp 144–154 (1979)
Parzen, E.: On estimation of a probability density function and mode. Ann. Math. Stat. 33, 1065–1076 (1962)
Prakasa Rao, B.L.S.: Nonparametric estimation of the derivatives of a density by the method of wavelets. Bull. Inform. Cybernet. 28, 91–100 (1996)
Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. Ann. Math. Stat. 27, 832–835 (1956)
Wu, C., Zeng, X.C., Mi, N.: Adaptive and optimal pointwise deconvolution density estimations by wavelets. Adv. Comput. Math. 47, 14 (2021)
Wu, C., Wang, X.C., Wang, J.R.: Wavelet adaptive pointwise density estimations with super-smooth noises. Acta Math. Sinica (Chin. Ser.) 62(5), 687–702 (2019)
Zeng, X.C.: A note on wavelet deconvolution density estimation. Int. J. Wavelets Multiresolut. Inf. Process. 15(6), 1750055 (2017)
Acknowledgements
This work is supported by the National Natural Science Foundation of China (Nos. 12101459 and 12171016). The authors would like to thank the referees for their valuable suggestions, which greatly improved the readability of the article.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Communicated by Rosihan M. Ali.
Cao, K., Zeng, X. Data-Driven Wavelet Estimations for Density Derivatives. Bull. Malays. Math. Sci. Soc. 47, 169 (2024). https://doi.org/10.1007/s40840-024-01766-5