Abstract
This paper addresses adaptive wavelet estimation of density derivatives by data-driven methods. Based on the classical linear wavelet estimator of density derivatives, we first provide a point-wise estimation under the local Hölder condition. Moreover, we introduce a data-driven wavelet estimator for adaptivity and prove a point-wise oracle inequality, which does not require any assumption on the underlying function. Finally, by using the point-wise oracle inequality, the point-wise estimation under the local Hölder condition and the \(L^p\)-risk (\(1\le p<\infty \)) estimation on Besov spaces are investigated respectively.
1 Introduction
The estimation of density derivatives plays an important role in the exploration of structures in curves, the comparison of regression curves, the analysis of human growth data, mean shift clustering and hypothesis testing [16]. More precisely, let \((\Omega ,\mathscr {F},P)\) be a probability space and \(X_{1},\ldots ,X_{n}\) be independent and identically distributed (i.i.d.) random samples with an unknown density function f. The purpose is to estimate the density derivative \(f^{(d)}\) with \(d\in \mathbb {N}\) from the observed data \(X_{1},\ldots ,X_{n}\).
In particular, the density derivative estimation model reduces to the classical density estimation model when the order \(d=0\). For density estimation, the classical kernel methods give nice results [9, 19, 21]. Compared with kernel estimators, wavelet estimators perform better because they provide more local information and admit fast wavelet algorithms [5, 15]. For instance, Donoho et al. [6] established an adaptive and optimal estimation (up to a logarithmic factor) for a univariate density function over \(L^p\)-risk (\(1\le p<\infty \)) on Besov spaces.
In contrast to traditional adaptive estimation, Goldenshluger and Lepski [7] constructed a kernel estimator for density functions by data-driven methods, and provided \(L^p\)-risk (\(1\le p<\infty \)) estimations over anisotropic Nikol’skii classes in 2014. Five years later, Liu and Wu [13] introduced a data-driven wavelet estimator and considered point-wise density estimations under the local anisotropic Hölder condition. Recently, Cao and Zeng [1] investigated the adaptive \(L^p\)-risk (\(1\le p<\infty \)) estimations under the independence hypothesis on Besov spaces by using the data-driven wavelet estimator.
Along with density estimation, it is often necessary to estimate the derivatives of the density function. Müller and Gasser [18] discussed kernel estimations for density derivatives over \(L^{2}\)-risk on Sobolev spaces. Then in 1996, Rao [20] explored wavelet density derivative estimations over \(L^{2}\)-risk on Sobolev spaces. Moreover, Rao’s estimates were generalized to unmatched Besov spaces \(B_{r,q}^s\) and \(L^{p}\)-risk (\(1\le p<\infty \)) in Ref. [3]. In 2013, Liu and Wang [12] defined new linear and nonlinear wavelet estimators for density derivatives, and provided the corresponding \(L^{p}\)-risk estimations on Besov spaces.
This paper investigates the adaptive wavelet estimation of density derivatives. Based on the classical linear wavelet estimator for density derivatives, we first show the point-wise estimation under the local Hölder condition. Furthermore, motivated by the works of Goldenshluger and Lepski [7] and Cao and Zeng [1], we introduce a data-driven wavelet estimator for adaptivity and prove a point-wise oracle inequality, which does not require any assumption on the underlying function f or \(f^{(d)}\) (except for the restrictions ensuring the existence of the model and of the risk). Finally, by using the point-wise oracle inequality, we give the point-wise estimation under the local Hölder condition and the \(L^p\)-risk \((1\le p<\infty )\) estimations on Besov spaces respectively.
1.1 Wavelets and Function Spaces
We begin with an important concept in wavelet analysis in this subsection. A Multiresolution Analysis (MRA, [8, 17]) is a sequence of closed subspaces \(\{V_{j}\}_{j\in \mathbb {Z}}\) of the square integrable function space \(L^{2}(\mathbb {R})\) satisfying the following properties:
- (i) \(V_{j}\subset V_{j+1}\), \(j\in \mathbb {Z}\);
- (ii) \(\overline{\bigcup _{j\in \mathbb {Z}} V_{j}}=L^{2}(\mathbb {R})\) (the space \(\bigcup _{j\in \mathbb {Z}} V_{j}\) is dense in \(L^{2}(\mathbb {R})\));
- (iii) \(f(2\cdot )\in V_{j+1}\) if and only if \(f(\cdot )\in V_{j}\) for each \(j\in \mathbb {Z}\);
- (iv) There exists \(\varphi \in L^{2}(\mathbb {R})\) (scaling function) such that \(\{\varphi (\cdot -k),~k\in \mathbb {Z}\}\) forms an orthonormal basis of \(V_{0}=\overline{\textrm{span}\{\varphi (\cdot -k),k\in \mathbb {Z}\}}\).
Moreover, a wavelet function \(\psi \) can be derived from the scaling function \(\varphi \) in a simple way such that for fixed \(j_0\in \mathbb {N}\), both \(\{\varphi _{j_0k},\psi _{jk}\}_{j\ge j_0,k\in \mathbb {Z}}\) and \(\{\psi _{jk}\}_{j,k\in \mathbb {Z}}\) are orthonormal bases (wavelet bases) of \(L^{2}(\mathbb {R})\), where \(h_{jk}(\cdot ):=2^{\frac{j}{2}}h(2^{j}\cdot -k)\) for \(h=\varphi \) or \(\psi \). Hence, for each \(f\in L^2(\mathbb {R})\),
\(f=\sum \limits _{k\in \mathbb {Z}}s_{j_{0}k}\varphi _{j_{0}k}+\sum \limits _{j=j_{0}}^{\infty }\sum \limits _{k\in \mathbb {Z}}d_{jk}\psi _{jk}\)
holds in the \(L^2\)-sense, where \(s_{jk}:=\langle f, \varphi _{jk}\rangle \) and \(d_{jk}:=\langle f,\psi _{jk}\rangle \). When \(\varphi \) is \(t\) regular, the above identity holds in the \(L^{p}\)-sense \((p\ge 1)\). Here and after, a scaling function \(\varphi \) is called \(t\) regular [4] (\(t\in \mathbb {N}\)) if \(\varphi \in C^{t}(\mathbb {R})\) and, for each \(l\in \mathbb {N}\), \(|\varphi ^{(r)}(x)|\le C_{l}(1+|x|^{2})^{-l}\) with some constant \(C_{l}>0\) \((r=0,1,\ldots ,t)\). For instance, Daubechies’s scaling function \(D_{2N}\) is \(t\) regular for large N, and Meyer’s function possesses any order of regularity. Furthermore, it is easy to verify that the regularity of \(\varphi \) implies the regularity of \(\psi \).
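As a small numerical illustration of the multiresolution structure (ours, not part of the paper), the projection \(P_{j}f\) is easy to compute for the Haar scaling function \(\varphi =1_{[0,1)}\), where it equals the average of f on each dyadic cell; the sketch below checks that the approximation error shrinks as j grows:

```python
import numpy as np

def haar_projection(f, j, xs):
    """P_j f for the Haar scaling function phi = 1_[0,1): on each dyadic
    cell [k/2^j, (k+1)/2^j) the projection equals the average of f there."""
    out = np.empty_like(xs)
    for i, x in enumerate(xs):
        k = min(int(x * 2**j), 2**j - 1)                 # dyadic cell containing x
        grid = np.linspace(k / 2**j, (k + 1) / 2**j, 201)
        out[i] = f(grid).mean()                          # cell average = 2^{j/2} s_{jk}
    return out

f = lambda x: np.sin(2 * np.pi * x)                      # smooth test function on [0, 1]
xs = np.linspace(0.01, 0.99, 400)
errs = [np.max(np.abs(f(xs) - haar_projection(f, j, xs))) for j in (2, 4, 6)]
assert errs[0] > errs[1] > errs[2]                       # P_j f -> f as j grows
```

For Lipschitz f the Haar error decays like \(2^{-j}\), which is consistent with the \(2^{-js}\) rate of Lemma 1.1 (ii) below.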
As usual, the notation \(P_j\) stands for the orthogonal projection operator from \(L^{2}(\mathbb {R})\) onto the scaling space \(V_{j}\) with the orthonormal basis \(\{\varphi _{jk}\}_{k\in \mathbb {Z}}\). Thus, for each \(f\in L^2(\mathbb {R})\),
\(P_{j}f=\sum \limits _{k\in \mathbb {Z}}s_{jk}\varphi _{jk}\)
with \(s_{jk}:=\langle f,\varphi _{jk}\rangle \). If \(\varphi \) satisfies Condition (\(\theta \)), i.e.,
\(\Theta _{\varphi }(x):=\sum \limits _{k\in \mathbb {Z}}|\varphi (x-k)|\in L^{\infty }(\mathbb {R}),\)
then \(P_{j}f\) is well-defined for \(f\in L^{p}(\mathbb {R})~(1\le p\le \infty )\). Furthermore, Condition (\(\theta \)) follows from the regularity of \(\varphi \).
As in Refs. [13, 14], we shall investigate the point-wise estimations under the local Hölder condition. For a univariate function f, the local Hölder condition of order \(s>0\) at the point \(x_{0}\in \mathbb {R}\) means that for a fixed constant \(L>0\) and each \(x,y\in \Omega _{x_{0}}\) (a neighbourhood of the point \(x_0\)),
\(|f^{([s])}(x)-f^{([s])}(y)|\le L|x-y|^{s-[s]},\)
where \([s]\) stands for the largest integer strictly smaller than s. The set of all such functions is denoted by \(H^{s}(\Omega _{x_{0}})\). Obviously, \(f\in H^{s+d}(\Omega _{x_{0}})\) if and only if \(f^{(d)}\in H^{s}(\Omega _{x_{0}})\) with \(d\in \mathbb {N}\).
The following lemma is necessary for the point-wise estimations.
Lemma 1.1
[14, 22, 23] Let \(\varphi \in L^{2}(\mathbb {R})\) be a \(t\) regular scaling function and \(\psi \) be the corresponding wavelet. If \(f\in H^{s}(\Omega _{x_{0}})\cap L^{2}(\mathbb {R})\) with \(s>0\) and \(t\ge [s]\), then for \(x\in \Omega _{x_{0}}\) and sufficiently large j,
- (i) \(f(x)=\sum \limits _{k\in \mathbb {Z}}s_{j_{0}k}\varphi _{j_{0}k}(x)+ \sum \limits _{j=j_0}^{\infty }\sum \limits _{k\in \mathbb {Z}}d_{jk}\psi _{jk}(x)\) holds pointwise;
- (ii) \(\sup \limits _{x\in \Omega _{x_{0}}}\sup \limits _{f\in H^{s}(\Omega _{x_{0}})\cap L^{2}(\mathbb {R})}|f(x)-P_{j}f(x)|\lesssim 2^{-js}.\)
Here and throughout, \(A\lesssim B\) stands for \(A\le cB\) with some constant \(c>0\); \(A\gtrsim B\) means \(B\lesssim A\); \(A\thicksim B\) denotes both \(A\lesssim B\) and \(A\gtrsim B\).
In this paper, the notation \(H^{s+d}(\Omega _{x_{0}},M)\) with \(d\in \mathbb {N}\) means that
where M is a positive constant and \(a\vee b:=\max \{a,~b\}\).
On the other hand, Besov spaces are needed in order to establish the \(L^p\)-risk estimations. Let \(W_r^n(\mathbb {R})\) be the Sobolev space with a non-negative integer exponent n,
\(W_r^n(\mathbb {R}):=\big \{f\in L^r(\mathbb {R}):~f^{(n)}\in L^r(\mathbb {R})\big \}\)
and \(\Vert f\Vert _{W_r^n}:=\Vert f\Vert _r+\Vert f^{(n)}\Vert _r.\) Then \(L^r(\mathbb {R})\) can be seen as \(W_r^0(\mathbb {R})\). For \(1\le r,q\le \infty \) and \(s=n+\alpha \) with \(\alpha \in (0,1]\), a Besov space \(B_{r,q}^{s}(\mathbb {R})\) is defined by
\(B_{r,q}^{s}(\mathbb {R}):=\big \{f\in W_r^n(\mathbb {R}):~\Vert t^{-\alpha }\omega _r^2(f^{(n)},t)\Vert _q^*<\infty \big \}\)
with the norm \(\Vert f\Vert _{B_{r,q}^{s}}:=\Vert f\Vert _{W_r^n}+\Vert t^{-\alpha }\omega _r^2(f^{(n)},t)\Vert _q^*\). Here, \(\omega _r^2(f,t):=\sup _{|h|\le t}\Vert f(\cdot +2h)-2f(\cdot +h)+f(\cdot )\Vert _r\) denotes the smoothness modulus of f and
\(\Vert h\Vert _q^*:=\Big (\int _0^{\infty }|h(t)|^q\frac{dt}{t}\Big )^{\frac{1}{q}}~(q<\infty ),\qquad \Vert h\Vert _{\infty }^*:=\sup \limits _{t>0}|h(t)|.\)
Then for \(f\in L^{r}(\mathbb {R})\), \(f\in W_{r}^{n+d}(\mathbb {R})\) if and only if \(f^{(d)}\in W_{r}^{n}(\mathbb {R})\), since \(f^{(n+d)}\in L^{r}(\mathbb {R})\) implies \(f^{(j)}\in L^{r}(\mathbb {R})~(j=1,2,\ldots ,n+d)\) (see Ref. [8]). Hence, \(f\in B_{r,q}^{s+d}(\mathbb {R})\) if and only if \(f^{(d)}\in B_{r,q}^{s}(\mathbb {R})\).
One advantage of wavelet bases is that they can characterize Besov spaces.
Lemma 1.2
[17] Let \(\varphi \) be t regular with \(t>s>0\) and \(\psi \) be the corresponding wavelet. Then for \(f\in L^{r}(\mathbb {R})\) and \(r,q\in [1,\infty ]\), the following conditions are equivalent:
- (i) \(f\in B^{s}_{r,q}(\mathbb {R});\)
- (ii) \(\{2^{js}\Vert P_{j}f-f\Vert _{r}\}_{j\in \mathbb {Z}}\in l^{q}(\mathbb {Z});\)
- (iii) \(\{2^{j(s-\frac{1}{r}+\frac{1}{2})}\Vert \{d_{j\cdot }\}\Vert _{l^r}\}_{j\in \mathbb {Z}}\in l^{q}(\mathbb {Z}).\)
The Besov norm of f can be given by the equivalent form
\(\Vert f\Vert _{B_{r,q}^{s}}\thicksim \Vert \{s_{j_{0}\cdot }\}\Vert _{l^r}+\Big \Vert \Big \{2^{j(s-\frac{1}{r}+\frac{1}{2})}\Vert \{d_{j\cdot }\}\Vert _{l^r}\Big \}_{j\ge j_{0}}\Big \Vert _{l^q}.\)
Furthermore, Lemma 1.2 (i) and (ii) show that \(\Vert P_jf-f\Vert _r\lesssim 2^{-js}\) holds for \(f\in B^{s}_{r,q}(\mathbb {R})\). When \(r\le p\), Lemma 1.2 (i) and (iii) imply that with \(s'-\frac{1}{p}=s-\frac{1}{r}>0\),
\(B_{r,q}^{s}(\mathbb {R})\hookrightarrow B_{p,q}^{s'}(\mathbb {R}),\)
where \(A\hookrightarrow B\) stands for a Banach space A continuously embedded in another Banach space B. All these claims can be found in Refs. [11, 24].
In this paper, the notation \(B_{r,q}^{s+d}(M)\) with \(M>0\) stands for
and
Moreover, \(L^\infty (M)\) is defined in the same way. On the other hand, it follows from \(f\in B_{r,q}^{s+d}(M)\) that \(f^{(d)}\in B_{r,q}^{s}(\mathbb {R})\) and \(\Vert f^{(d)}\Vert _{B_{r,q}^{s}}\le M\).
1.2 Our Results
As in Refs. [3, 20], the linear wavelet estimator for density derivatives is introduced by
\(\widehat{f^{(d)}_{j}}(x):=\sum \limits _{k\in \mathbb {Z}}\widehat{\alpha }_{jk}\varphi _{jk}(x),\qquad (1.2)\)
where \(\widehat{\alpha }_{jk}:=\frac{(-1)^{d}}{n}\sum _{i=1}^{n} [\varphi _{jk}]^{(d)}(X_{i})\) and \(\varphi \) is \(t\) regular with \(t\ge d\). Clearly, \(E\widehat{\alpha }_{jk}=\alpha _{jk}:=\langle f^{(d)},\varphi _{jk}\rangle \) and \(E\widehat{f^{(d)}_{j}}=P_{j}f^{(d)}\).
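The estimator \(\widehat{f^{(d)}_{j}}\) in (1.2) can be simulated in the simplest case \(d=0\) with the Haar scaling function, for which \(\widehat{\alpha }_{jk}\) is a rescaled empirical bin frequency (our illustration only; smooth Daubechies wavelets would be needed for \(d\ge 1\)):

```python
import numpy as np

rng = np.random.default_rng(0)

def haar_density_estimate(sample, j, xs):
    """Linear wavelet estimator (1.2) with d = 0 and the Haar scaling function:
    f_hat_j(x) = sum_k alpha_hat_{jk} phi_{jk}(x), i.e. a dyadic histogram."""
    est = np.zeros_like(xs)
    for i, x in enumerate(xs):
        k = np.floor(x * 2**j)                   # only phi_{jk} with this k is nonzero at x
        alpha_hat = 2**(j / 2) * np.mean(np.floor(sample * 2**j) == k)
        est[i] = alpha_hat * 2**(j / 2)          # alpha_hat_{jk} * phi_{jk}(x)
    return est

sample = rng.beta(2, 2, size=5000)               # true density 6x(1-x) on [0, 1]
xs = np.linspace(0.05, 0.95, 50)
est = haar_density_estimate(sample, j=4, xs=xs)
true = 6 * xs * (1 - xs)
assert np.max(np.abs(est - true)) < 0.5          # rough pointwise accuracy at j = 4
```

Larger j reduces the bias term \(|P_{j}f^{(d)}(x)-f^{(d)}(x)|\) but inflates the stochastic error, which is exactly the trade-off behind the choice of \(j^{*}\) in Theorem 1.1.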
Next, we are in a position to introduce our results in this paper. The first theorem gives a linear wavelet point-wise estimation for density derivatives under the local Hölder condition.
Theorem 1.1
Let \(\varphi \) be \(t\) regular with \(t\ge d\ge 0\) and \(\widehat{f^{(d)}_{j^{*}}}\) be the linear wavelet estimator in (1.2). Then for \(0<s<t\), \(p\in [1,\infty )\) and \(2^{j^{*}}\thicksim n^{\frac{1}{2s+2d+1}}\),
\(\sup \limits _{x\in \Omega _{x_{0}}}\sup \limits _{f\in H^{s+d}(\Omega _{x_{0}},M)} E\Big |\widehat{f^{(d)}_{j^{*}}}(x)-f^{(d)}(x)\Big |^{p}\lesssim n^{-\frac{sp}{2s+2d+1}}.\)
Remark 1.1
When the order \(d=0\), the density derivative estimation model can be reduced to the classical density one, and Theorem 1.1 coincides with the conclusion of Theorem 3 in one dimension in Ref. [13].
Remark 1.2
Note that the parameter j of the linear wavelet estimator depends on the smoothness index s of the unknown density function f in Theorem 1.1, so the estimator in (1.2) is non-adaptive [6, 10, 11].
Motivated by the works in Refs. [1, 2, 7, 14], we provide a selection rule for the parameter j in (1.2) that depends only on the observed data \(X_{1},\ldots ,X_{n}\); the resulting estimator is a so-called data-driven and totally adaptive estimator.
Let \(\mathcal {H}:=\left\{ 0,1,\ldots ,\lfloor \frac{1}{2d+1}\log _2{\frac{n}{\ln n}}\rfloor \right\} \) with \(\lfloor a\rfloor \) denoting the integer part of a. Thus, the selection rule of \(j=j_{0}\) in (1.2) is given by
Here and throughout, \(a\wedge b:=\min \{a,~b\}\), \(a_{+}:=\max \{a,~0\}\) and
\(\tau _{n}(j):=\Big (\frac{\lambda 2^{j(2d+1)}\ln n}{n}\Big )^{\frac{1}{2}},\qquad (1.5)\)
where \(\lambda >0\) is a constant determined later on. Clearly, the selection rule depends only on the observed data \(X_1,\ldots ,X_n\). Thus, the data-driven wavelet estimator is obtained by
\(\widehat{f_{n}^{(d)}}(x):=\widehat{f_{j_{0}}^{(d)}}(x)\qquad (1.6)\)
with \(j_0\in \mathcal {H}\) being given in (1.4).
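Selection rules of Goldenshluger–Lepski type, such as (1.4), compare the pairwise estimators \(\widehat{f_{j\wedge j'}^{(d)}}\) and \(\widehat{f_{j'}^{(d)}}\) and penalize by \(\tau _{n}\). The toy sketch below (our illustration with the Haar case \(d=0\) and illustrative constants, not the paper's exact rule) shows the mechanics at a single point x:

```python
import numpy as np

rng = np.random.default_rng(1)

def fj_hat(sample, j, x):
    """Haar linear estimator at a single point (d = 0): 2^j * empirical bin mass."""
    k = np.floor(x * 2**j)
    return 2**j * np.mean(np.floor(sample * 2**j) == k)

def gl_select(sample, x, lam=1.0):
    """Goldenshluger-Lepski-type level selection in the spirit of (1.4): trade the
    comparisons [|f_hat_{j ^ j'} - f_hat_{j'}| - tau(j') - tau(j ^ j')]_+ against
    the stochastic term tau(j).  Constants here are illustrative."""
    n = sample.size
    H = range(int(np.log2(n / np.log(n))) + 1)          # H for d = 0
    tau = {j: np.sqrt(lam * 2**j * np.log(n) / n) for j in H}
    def crit(j):
        comp = max(max(0.0, abs(fj_hat(sample, min(j, jp), x) - fj_hat(sample, jp, x))
                        - tau[jp] - tau[min(j, jp)]) for jp in H)
        return comp + tau[j]
    return min(H, key=crit)

sample = rng.beta(2, 2, size=2000)                      # true density 6x(1-x)
j0 = gl_select(sample, x=0.3)
est = fj_hat(sample, j0, x=0.3)
assert abs(est - 6 * 0.3 * 0.7) < 0.8                   # near the true value 1.26
```

The selected level \(j_0\) balances the empirical bias proxy against \(\tau _n(j)\) without any knowledge of the smoothness s.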
To introduce Theorem 1.2, let
\(B_{j}(x,f):=\Big |E\widehat{f_{j}^{(d)}}(x)-f^{(d)}(x)\Big |\quad \text{and}\quad S_{n}(x,j):=\widehat{f_{j}^{(d)}}(x)-E\widehat{f_{j}^{(d)}}(x)\qquad (1.7)\)
be the bias and the stochastic error of \(\widehat{f_{j}^{(d)}}\), respectively. Furthermore, we define
\(B_{j}^{*}(x,f):=\sup \limits _{\{j'\in \mathcal {H},~j'\ge j\}}B_{j'}(x,f)\quad \text{and}\quad \aleph _{n}(x):=\sup \limits _{j\in \mathcal {H}}\big [|S_{n}(x,j)|-\tau _{n}(j)\big ]_{+},\qquad (1.8)\)
where \(\tau _{n}(j)\) is given by (1.5).
Then the following point-wise oracle inequality is established, which plays a key role in the proofs of Theorems 1.3–1.4.
Theorem 1.2
For any \(x\in \mathbb {R}\), the estimator \(\widehat{f_{n}^{(d)}}(x)\) in (1.6) satisfies
\(\Big |\widehat{f_{n}^{(d)}}(x)-f^{(d)}(x)\Big |\lesssim \inf \limits _{j\in \mathcal {H}}\Big \{B_{j}^{*}(x,f)+\tau _{n}(j)\Big \}+\aleph _{n}(x),\)
where \(\tau _{n}(j)\) is given by (1.5) and \(B_{j}^{*}(x,f),~\aleph _{n}(x)\) are determined by (1.8).
Moreover, by using Theorem 1.2, we obtain the adaptive point-wise estimation and \(L^p\)-risk \((1\le p<\infty )\) estimation based on the data-driven estimator in (1.6).
Theorem 1.3
Let \(\varphi \) be \(t\) regular with \(t\ge d\ge 0\). Then for \(0<s<t\) and \(p\in [1,\infty )\), the data-driven estimator \(\widehat{f^{(d)}_{n}}\) in (1.6) satisfies
\(\sup \limits _{x\in \Omega _{x_{0}}}\sup \limits _{f\in H^{s+d}(\Omega _{x_{0}},M)} E\Big |\widehat{f^{(d)}_{n}}(x)-f^{(d)}(x)\Big |^{p}\lesssim \Big (\frac{\ln n}{n}\Big )^{\frac{sp}{2s+2d+1}}.\)
Remark 1.3
The same as Remark 1.1, when \(d=0\), Theorem 1.3 can be reduced to the conclusion of Theorem 4 in one dimension in Ref. [13].
Theorem 1.4
Let \(\varphi \) be t regular with \(t\ge d\ge 0\). Then for \(0<s<t\), \(r,q\in [1,\infty ]\) and \(p\in [1,\infty )\), the data-driven estimator \(\widehat{f_{n}^{(d)}}\) in (1.6) satisfies
where
Remark 1.4
According to Theorem 3.3 and Theorem 4.3 in Ref. [12], the convergence rates in Theorem 1.4 are optimal (up to a logarithmic factor) in the case \(s>\frac{1}{r}\). However, the situation is unclear for \(s\le \frac{1}{r}\). Therefore, one direction of our future work is to determine the optimality of this statistical model in the case \(s\le \frac{1}{r}\).
Remark 1.5
When \(d=0\) and \(s>\frac{1}{r}\), the convergence rate \(\theta =\min \left\{ \frac{s}{2s+1},~\frac{s-\frac{1}{r}+\frac{1}{p}}{2(s-\frac{1}{r})+1}\right\} \) coincides with the works of Donoho et al. in Ref. [6]. In addition, the estimation for the case \(s\le \frac{1}{r}\) is considered in Theorem 1.4.
2 Some Lemmas and Propositions
In this section, we provide some lemmas and propositions which are necessary in the proofs of main results. Rosenthal’s inequality is introduced first.
Rosenthal’s inequality [8]. Let \(p>0\) and \(X_1,X_2,\ldots ,X_n\) be independent random variables satisfying \(EX_i=0\) and \(E|X_i|^p<\infty \) \((i=1,2,\ldots ,n)\). Then there exists \(C(p)>0\) such that
\(E\Big |\sum \limits _{i=1}^{n}X_{i}\Big |^{p}\le C(p)\Big [\sum \limits _{i=1}^{n}E|X_{i}|^{p}+\Big (\sum \limits _{i=1}^{n}EX_{i}^{2}\Big )^{\frac{p}{2}}\Big ]~(p>2);\qquad E\Big |\sum \limits _{i=1}^{n}X_{i}\Big |^{p}\le C(p)\Big (\sum \limits _{i=1}^{n}EX_{i}^{2}\Big )^{\frac{p}{2}}~(0<p\le 2).\)
Next, the following lemma is established, which is important for the proof of Theorem 1.1.
Lemma 2.1
Let \(\varphi \) be \(t\) regular with \(t\ge d\) and \(\hat{\alpha }_{jk}\) be defined in (1.2). Then for \(f\in L^{\infty }(M)\) with \(M>0\), \(p\in [1,\infty )\) and \(2^{j}\le n\),
\(E|\widehat{\alpha }_{jk}-\alpha _{jk}|^{p}\lesssim \Big (\frac{2^{2jd}}{n}\Big )^{\frac{p}{2}},\)
where the constant in \(``\lesssim "\) only depends on \(\varphi \) and M.
Proof
According to the definition of \(\hat{\alpha }_{jk}\), one has \(E\hat{\alpha }_{jk}=\alpha _{jk}\) and
where \(\eta _i:=[\varphi _{jk}]^{(d)}(X_{i})-E[\varphi _{jk}]^{(d)}(X_{i})\). Clearly, \(\{\eta _i\}_{i=1}^n\) are i.i.d. samples and \(E\eta _i=0,~i=1,\ldots ,n\).
On the other hand, for \(i=1,\ldots ,n,\)
and \(\Vert \eta _i\Vert _{\infty } \lesssim \Vert [\varphi _{jk}]^{(d)}\Vert _{\infty } \lesssim 2^{j(\frac{1}{2}+d)} \) by the regularity of \(\varphi \) and \(\Vert f\Vert _{\infty }\lesssim 1\). These with Rosenthal’s inequality and \(2^{j}\le n\) show that
Finally, the desired conclusion follows from (2.1) and (2.2). The proof is done. \(\square \)
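In the simplest case \(d=0\), \(p=2\), the bound of Lemma 2.1 reads \(E|\widehat{\alpha }_{jk}-\alpha _{jk}|^{2}\lesssim \frac{1}{n}\) uniformly in j. A small simulation with the Haar scaling function (our illustration; the slack factor 1.2 stands in for the \(\lesssim \) constant) confirms this:

```python
import numpy as np

rng = np.random.default_rng(3)

def alpha_hat(sample, j, k):
    """alpha_hat_{jk} = (1/n) sum_i phi_{jk}(X_i) for the Haar phi = 1_[0,1)."""
    return 2**(j / 2) * np.mean(np.floor(sample * 2**j) == k)

# Var(alpha_hat_{jk}) = Var(phi_{jk}(X))/n <= E[phi_{jk}(X)^2]/n
#                     = 2^j P(X in cell)/n <= ||f||_inf / n, uniformly in j.
n, reps, f_sup = 500, 4000, 1.5                 # sup of the density 6x(1-x) is 1.5
variances = []
for j in (2, 4, 6):
    k = int(0.5 * 2**j)                          # the dyadic cell containing x = 1/2
    vals = [alpha_hat(rng.beta(2, 2, size=n), j, k) for _ in range(reps)]
    variances.append(np.var(vals))
assert all(v <= 1.2 * f_sup / n for v in variances)
```

The variance stays of order \(\frac{1}{n}\) at every level j, as the \(d=0\) case of the lemma predicts.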
We give the next lemma in order to prove Proposition 2.1.
Lemma 2.2
Let \(K_{j}(v,x):=(-1)^{d}\sum _{k\in \mathbb {Z}}[\varphi _{jk}]^{(d)}(v)\varphi _{jk}(x)\) and \(\varphi \) be t regular with \(t\ge d\ge 0\). Then for \(f\in L^{\infty }(M)\),
where \(M_1\ge 1\) is some constant.
Proof
By the definition of \(K_j(v,x)\), one finds easily that
because of the regularity of \(\varphi \). On the other hand,
Furthermore,
Choosing \(M_1:=\max \{\Vert \Theta _{\varphi }\Vert _{\infty }\Vert \varphi ^{(d)}\Vert _{\infty }, ~\Vert \Theta _{\varphi }\Vert _{\infty }^{2} \Vert \varphi ^{(d)}\Vert _{\infty }\Vert \varphi ^{(d)}\Vert _{1}M,~1\}\), the final conclusions follow from (2.3)–(2.4). \(\square \)
To show Proposition 2.1, we need another well-known inequality.
Bernstein’s inequality [8]. Let \(\eta _{1},\ldots ,\eta _{n}\) be i.i.d. random variables with \(E\eta _{i}=0\), \(E\eta _{i}^{2}\le \sigma ^{2}\) and \(|\eta _{i}|\le M\) \((i=1,2,\ldots ,n)\). Then for any \(\epsilon >0\),
\(P\Big \{\Big |\frac{1}{n}\sum \limits _{i=1}^{n}\eta _{i}\Big |\ge \epsilon \Big \}\le 2\exp \Big \{-\frac{n\epsilon ^{2}}{2(\sigma ^{2}+M\epsilon /3)}\Big \}.\)
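A quick Monte Carlo sanity check of the stated exponential bound for bounded centred variables (our illustration, with uniform variables):

```python
import numpy as np

rng = np.random.default_rng(2)

# Check P(|(1/n) sum eta_i| >= eps) <= 2 exp(-n eps^2 / (2 (sigma^2 + M eps / 3)))
# for eta_i ~ U(-1, 1): centred, |eta_i| <= M = 1, Var(eta_i) = 1/3.
n, reps, eps = 200, 20000, 0.1
eta = rng.uniform(-1, 1, size=(reps, n))
sigma2, M = 1.0 / 3.0, 1.0
empirical = np.mean(np.abs(eta.mean(axis=1)) >= eps)
bound = 2 * np.exp(-n * eps**2 / (2 * (sigma2 + M * eps / 3)))
assert empirical <= bound                        # empirical tail below Bernstein bound
```

In the proofs below, the bound is applied with \(\eta _{i}=K_{j}(X_{i},x)-EK_{j}(X_{i},x)\), whose variance and sup-norm are controlled by Lemma 2.2.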
Now, we introduce the first proposition which plays important roles in the proofs of Theorems 1.3–1.4.
Proposition 2.1
Let \(f\in L^{\infty }(M)\) and \(\varphi \) be \(t\) regular with \(t\ge d\ge 0\). Then for each \(x\in \mathbb {R}\) and \(p\in [1,\infty )\), there exists \(\lambda >6M_{1}^{2}p^{2}\) such that
\(E[\aleph _{n}(x)]^{p}\lesssim \Big (\frac{\ln n}{n}\Big )^{\frac{p}{2}},\)
where \(\aleph _{n}(x)\) is given by (1.8) and \(M_1\ge 1\) is the constant in Lemma 2.2.
Proof
For each \(j\in \mathcal {H}\), one denotes
where \(\lambda _{j}:=\max \{(2d+1)p j\ln 2,~1\}\). Note that the inequality \(\lambda \ln n \ge 6M_{1}^{2}p\lambda _j\) holds for large n, since \(\lambda >6M_{1}^{2}p^{2}\) and \(j\in \mathcal {H}\). Hence, \(\overline{\tau _{n}(j)}\le \tau _{n}(j)\) thanks to (1.5) and (2.5). Moreover,
For any \(t\ge 0\),
Therefore,
This with variable substitution \(t=\omega \overline{\tau _{n}(j)}\) shows that
On the other hand, it is easy to see that the estimator \(\widehat{f_{j}^{(d)}}(x)\) in (1.6) can be rewritten as
because \(K_{j}(v,x):=(-1)^{d}\sum _{k\in \mathbb {Z}}[\varphi _{jk}]^{(d)}(v)\varphi _{jk}(x)\) in Lemma 2.2. This with (1.7) and Lemma 2.2 implies that \(S_n(x,j) =\frac{1}{n}\sum _{i=1}^{n}[K_{j}(X_{i},x)-EK_{j}(X_{i},x)]\) and
Furthermore,
thanks to Bernstein’s inequality.
For \(j\in \mathcal {H}\), \(\overline{\tau _{n}(j)}=\Big (\frac{6M_{1}^{2}p2^{j(2d+1)}\lambda _{j}}{n}\Big )^{\frac{1}{2}} \le 3M_{1}p\) holds for large n. Thus,
due to \(M_{1},p\ge 1\) and \(\omega >0\). Substituting this above estimate into (2.8), one obtains that
Then it follows from \(\lambda _{j}=\max \{(2d+1)p j\ln 2,~1\}\ge 1\) that
Combining this with (2.7) and \(\overline{\tau _{n}(j)}:=\left( \frac{6M_{1}^{2}p2^{j(2d+1)}\lambda _j}{n}\right) ^{\frac{1}{2}}\), one concludes that
Hence, according to \(\lambda _{j}\lesssim \ln n\) and \(e^{-\lambda _{j}}\le 2^{-(2d+1)p j}\), one knows
This with (1.8) and (2.6) leads to
which completes the proof. \(\square \)
To introduce Proposition 2.2, we also need the following notations:
where \(\delta _n=(\frac{C\ln n}{n})^{\frac{s}{2s+2d+1}}\), \(C>1\) is some constant and \(T>0\) is defined by (1.1).
Note that \(\mathfrak {M}(x,f)\le c_0:=\sup _x\mathfrak {M}(x,f)\), if \(\varphi \) is t regular and \(\Vert f^{(d)}\Vert _{\infty }\lesssim 1\). Then there exists
such that \(\Lambda _{m}=\emptyset \) for each \(m>m_{2}\). Obviously, \(m_{2}>0\) for large n.
Next, another useful proposition is provided which is one of the main ingredients in the proof of Theorem 1.4.
Proposition 2.2
Let \(f\in B_{r,q}^{s+d}(M)\) and \(\varphi \) be t regular with \(t\ge d\ge 0\). Then for \(m\in \mathbb {Z}\) satisfying \(0\le m\le m_2\) and each \(p\in [1,\infty )\),
Moreover, if \(s>\frac{1}{r}\) and \(r\le p\), then with \(s':=s-\frac{1}{r}+\frac{1}{p}\),
where \(\mathfrak {M}(x,f)\) and \(\Lambda _m\) are defined in (2.9)–(2.10) respectively.
Proof
The proof is similar to the second part of Proposition 3.2 in Ref. [2]. Here, we provide only the main steps.
Take \(j_2\) satisfying \(c_12^{\frac{2m}{2d+1}}\delta _n^{-\frac{1}{s}}\le 2^{j_{2}}\le c_22^{\frac{2m}{2d+1}}\delta _n^{-\frac{1}{s}}\), where the two positive constants \(c_1,c_2\) satisfy \((2M)^{\frac{1}{s}}I_{\{r=\infty \}}<c_1<c_2< \min \left\{ \frac{C}{4c_0^{2}},~\frac{C}{4\lambda }\right\} ^{\frac{1}{2d+1}}.\) Then \(j_{2}\in \mathcal {H}\) and \(\tau _{n}(j_2)\le 2^{m-1}\delta _n\) for large n and \(0<m\le m_2\).
Clearly, by \(\Lambda _m=\{x\in [-T,T],~2^{m}\delta _n <\mathfrak {M}(x,f)\le 2^{m+1}\delta _n\}\),
where \(|\Lambda _m|\) stands for the Lebesgue measure of the set \(\Lambda _m\). On the other hand,
When \(1\le r<\infty \), according to Chebyshev’s inequality, (1.8), (2.13) and \(f\in B_{r,q}^{s+d}(M)\), one has
Substituting (2.14) into (2.12), one obtains that
due to \(2^{j_{2}}\thicksim 2^{\frac{2m}{2d+1}}\delta _{n}^{-\frac{1}{s}}\).
For the case \(r=\infty \), it follows from \(f\in B_{r,q}^{s+d}(M)\) and \(m>0\) that \( B_{j_{2}}^{*}(x,f)= \sup _{j'\ge j_2}B_{j'}(x,f)\le M 2^{-j_{2}s}\le Mc_1^{-s} 2^{-\frac{2ms}{2d+1}}\delta _n \le 2^{m-1}\delta _n\) thanks to the choice of \(2^{j_{2}}\ge c_12^{\frac{2m}{2d+1}}\delta _{n}^{-\frac{1}{s}}\) with \(c_1>(2M)^{\frac{1}{s}}\). Thus, \(|\Lambda _{m}|=0\) because of (2.13). Furthermore, it follows that \( Q_m\le (2^{m+1}\delta _n)^p|\Lambda _{m}|=0 \) by (2.12).
Finally, one discusses the case of \(s>\frac{1}{r}\) and \(r\le p\). Note that \(f^{(d)}\in B_{r,q}^{s}\hookrightarrow B_{p,q}^{s'}\) with \(s'=s-\frac{1}{r}+\frac{1}{p}\). Similar to (2.14),
This with (2.12) and \(2^{j_{2}}\thicksim 2^{\frac{2m}{2d+1}}\delta _{n}^{-\frac{1}{s}}\) implies that
The proof is done. \(\square \)
3 Proofs of Theorems 1.1–1.4
This section is devoted to the proofs of Theorems 1.1–1.4.
Proof of Theorem 1.1
By the definition of \(\widehat{f^{(d)}_{j^{*}}}(x)\) and \(E\hat{\alpha }_{j^{*}k}=\alpha _{j^{*}k}\), it is clear that
Moreover, it follows from the Hölder inequality with \(\frac{1}{p}+\frac{1}{p'}=1\) \((p>1)\) that
thanks to Lemma 2.1. When \(p=1\), the above estimate can be concluded directly without using the Hölder inequality.
On the other hand, Lemma 1.1 leads to \( \sup \limits _{x\in \Omega _{x_{0}}}\sup \limits _{f\in H^{s+d}(\Omega _{x_{0}},M)} \Big |E\widehat{f^{(d)}_{j^{*}}}(x)-f^{(d)}(x)\Big |^{p}\lesssim 2^{-j^{*}sp}. \) This with (3.1) and \(2^{j^{*}}\thicksim n^{\frac{1}{2s+2d+1}}\) shows
The proof is completed. \(\square \)
Proof of Theorem 1.2
According to (1.3) and (1.5), one obtains that
The same argument as in (3.2) implies
Moreover, combining (3.2) and (3.3), one concludes
due to \(\widehat{f_{j_{0}\wedge j}^{(d)}}=\widehat{f_{j\wedge j_{0}}^{(d)}}\) and the selection of \(j_0\) in (1.4).
Clearly, by (1.8), \(|S_n(x,j)|\le [|S_n(x,j)|-\tau _{n}(j)]_++\tau _{n}(j)\le \aleph _{n}(x)+\tau _{n}(j).\) This with (1.7) and (1.8) shows that
On the other hand, by using (1.3) and (1.7),
This with \(\sup _{j'\in \mathcal {H}} |E\widehat{f_{j\wedge j'}^{(d)}}(x)-E\widehat{f_{j'}^{(d)}}(x)| \le \sup _{\{j'\in \mathcal {H},~j'\ge j\}}\{B_{j\wedge j'}(x,f)+B_{j'}(x,f)\}\) and (1.8) leads to
Hence, it follows from (3.4)–(3.6) that
holds for each \(j\in \mathcal {H}\). Furthermore,
thanks to \(\widehat{f_{n}^{(d)}}(x)=\widehat{f_{j_0}^{(d)}}(x)\) in (1.6). Hence, Theorem 1.2 is proved. \(\square \)
Proof of Theorem 1.3
Take \(j_{1}\) satisfying \(2^{j_{1}}\thicksim (\frac{n}{\ln n})^{\frac{1}{2s+2d+1}}\). Then \(j_{1}\in \mathcal {H}\) for large n and \(s>0\). Moreover, Theorem 1.2 yields that
holds for any \(x\in \Omega _{x_{0}}\).
By (1.5) and the given choice \(2^{j_{1}}\thicksim (\frac{n}{\ln n})^{\frac{1}{2s+2d+1}}\), one finds easily
due to Proposition 2.1. On the other hand, (1.7)–(1.8) and Lemma 1.1 lead to
holds for any \(x\in \Omega _{x_{0}}\) and \(f\in H^{s+d}(\Omega _{x_{0}},M)\). This with the choice \(2^{j_{1}}\thicksim (\frac{n}{\ln n})^{\frac{1}{2s+2d+1}}\) implies that
Finally, the desired conclusion can be concluded from (3.7)–(3.9). The proof is finished. \(\square \)
Proof of Theorem 1.4
Recall that \(\Lambda _{m}=\{x\in [-T,T],~2^{m}\delta _n<\mathfrak {M}(x,f)\le 2^{m+1}\delta _n\}\) due to (2.10). Define \(\Lambda _{0}^{-}:=\{x\in [-T,T],~\mathfrak {M}(x,f)\le \delta _{n}\}\) with \(\delta _{n}=(\frac{C\ln n}{n})^{\frac{s}{2s+2d+1}}\). Then for each \(p\in [1,\infty )\),
thanks to \(\textrm{supp}\,f\subset [-T,T]\), Theorem 1.2, (2.9) and Proposition 2.1.
To complete the proof, one divides (3.10) into three regions. Recall that \(2^{m_{2}}\thicksim \delta _n^{-1}\) and \(\delta _n\thicksim (\frac{\ln n}{n})^{\frac{s}{2s+2d+1}}\) by (2.10)–(2.11). By Proposition 2.2, the following estimations are established.
(i). For \(1\le p<\frac{2sr}{2d+1}+r\),
(ii). For \(p\ge \frac{2sr}{2d+1}+r\),
(iii). For the case \(p\ge \frac{2sr}{2d+1}+r\) and \(s>\frac{1}{r}\), take \(m_1\in \mathbb {Z}\) satisfying
Clearly, \(0<m_1<m_2\) due to \(r<p,~p\ge \frac{2sr}{2d+1}+r\) and \(s>\frac{1}{r}\). Therefore,
This with (3.13), \(\delta _n\thicksim (\frac{\ln n}{n})^{\frac{s}{2s+2d+1}}\) and \(s'=s-\frac{1}{r}+\frac{1}{p}\) tells that
Finally, the desired conclusion follows from (3.10)–(3.12), which completes the proof. \(\square \)
References
Cao, K.K., Zeng, X.C.: Adaptive wavelet density estimation under independence hypothesis. Results Math. 76(4), 196 (2021)
Cao, K.K., Zeng, X.C.: A data-driven wavelet estimator for deconvolution density estimations. Results Math. 78(4), 156 (2023)
Chaubey, Y.P., Doosti, H., Prakasa Rao, B.L.S.: Wavelet based estimation of the derivatives of a density for a negatively associated process. J. Stat. Theory Pract. 2, 453–463 (2008)
Daubechies, I.: Ten Lectures on Wavelets. SIAM, Philadelphia (1992)
Daubechies, I.: The wavelet transform, time-frequency localization and signal analysis. IEEE Trans. Inform. Theory 36, 961–1005 (1990)
Donoho, D.L., Johnstone, I.M., Kerkyacharian, G., Picard, D.: Density estimation by wavelet thresholding. Ann. Stat. 24(2), 508–539 (1996)
Goldenshluger, A., Lepski, O.: On adaptive minimax density estimation on \(\mathbb{R} ^{d}\). Probab. Theory Relat. Fields 159(3–4), 479–543 (2014)
Härdle, W., Kerkyacharian, G., Picard, D., Tsybakov, A.: Wavelets, Approximation and Statistical Applications. Springer, New York (1998)
Huang, S.Y.: Density estimation by wavelet-based reproducing kernels. Stat. Sinica 9, 137–151 (1999)
Kerkyacharian, G., Picard, D.: Density estimation in Besov spaces. Stat. Probab. Lett. 13, 15–24 (1992)
Li, R., Liu, Y.M.: Wavelet optimal estimations for a density with some additive noises. Appl. Comput. Harmon. Anal. 36(3), 416–433 (2014)
Liu, Y.M., Wang, H.Y.: Wavelet estimations for density derivatives. Sci. China Math. 56(3), 483–495 (2013)
Liu, Y.M., Wu, C.: Point-wise estimation for anisotropic densities. J. Multivar. Anal. 171, 112–125 (2019)
Liu, Y.M., Wu, C.: Point-wise wavelet in the convolution structure density model. J. Fourier Anal. Appl. 26, 81 (2020)
Mallat, S.: A theory for multiresolution signal decomposition: the wavelet representation. IEEE Trans. Pattern Anal. Mach. Intell. 11, 674–693 (1989)
Markovich, L.A.: Gamma kernel estimation of the density derivative on the positive semi-axis by dependent data. REVSTAT-Stat. J. 14(3), 327–348 (2016)
Meyer, Y.: Wavelets and Operators. Cambridge University Press, Cambridge (1992)
Müller, H.G., Gasser, T.: Optimal convergence properties of kernel estimates of derivatives of a density function. In: Lecture Notes in Mathematics 757. Springer, Berlin, pp 144–154 (1979)
Parzen, E.: On estimation of a probability density function and mode. Ann. Math. Stat. 33, 1065–1076 (1962)
Prakasa Rao, B.L.S.: Nonparametric estimation of the derivatives of a density by the method of wavelets. Bull. Inform. Cybernet. 28, 91–100 (1996)
Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. Ann. Math. Stat. 27, 832–835 (1956)
Wu, C., Zeng, X.C., Mi, N.: Adaptive and optimal pointwise deconvolution density estimations by wavelets. Adv. Comput. Math. 47, 14 (2021)
Wu, C., Wang, X.C., Wang, J.R.: Wavelet adaptive pointwise density estimations with super-smooth noises. Acta Math. Sinica (Chin. Ser.) 62(5), 687–702 (2019)
Zeng, X.C.: A note on wavelet deconvolution density estimation. Int. J. Wavelets Multiresolut. Inf. Process. 15(6), 1750055 (2017)
Acknowledgements
This work is supported by the National Natural Science Foundation of China (Nos. 12101459 and 12171016). The authors would like to thank the referees for their valuable suggestions, which greatly improved the readability of the article.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Communicated by Rosihan M. Ali.
Cao, K., Zeng, X. Data-Driven Wavelet Estimations for Density Derivatives. Bull. Malays. Math. Sci. Soc. 47, 169 (2024). https://doi.org/10.1007/s40840-024-01766-5