Abstract
Based on a data-driven selection of an estimator from a fixed family of kernel estimators, Goldenshluger and Lepski (Probab Theory Relat Fields 159:479–543, 2014) considered the problem of adaptive minimax estimation of densities without compact support on \({\mathbb {R}}^{d}\) with \(L^{p}\) risk over Nikol’skii classes. This paper attains the same convergence rates by using a data-driven wavelet estimator over Besov spaces; wavelet estimators provide more local information and admit fast algorithms. Moreover, we obtain better convergence rates under an independence hypothesis, which effectively mitigates the curse of dimensionality.
1 Introduction
Density estimation has a long history [1, 3]. In 1996, Donoho et al. [4] established an adaptive and optimal estimate (up to a logarithmic factor) for compactly supported density functions on \({\mathbb {R}}^{1}\) with \(L^{p}\) risk (\(1\le p<\infty \)) over Besov spaces by using a non-linear wavelet estimator.
It is quite remarkable that if the assumption that the underlying density has compact support is dropped, then the minimax behavior becomes completely different. In particular, Kerkyacharian and Picard [10] defined a linear estimator based on an orthogonal scaling function and discussed the convergence rates of the \(L^{p}\) risk for \(1\le p<\infty \) over one-dimensional Besov spaces in 1992. Although their density functions need not have compact support, that estimation is non-adaptive and needs an additional condition (Condition N) for \(1\le p<2\).
How about adaptive estimation for densities without compact support? Juditsky and Lambert-Lacroix [9] studied the optimal convergence rates of the \(L^{p}\) risk (\(1\le p<\infty \)) by using a biorthogonal wavelet estimator for density functions in one-dimensional Hölder spaces. Seven years later, the \(L^{2}\) risk estimation in one-dimensional Besov spaces was investigated [18]. In 2014, Goldenshluger and Lepski [5] addressed this problem on \({\mathbb {R}}^{d}\) with \(L^{p}\) risk (\(1< p<\infty \)) over anisotropic Nikol’skii classes. They constructed an adaptive estimator based on a data-driven selection rule from a fixed family of kernel estimators and identified four different regimes of convergence rates, for which the minimax behavior is (nearly) optimal.
Compared with kernel estimators, wavelet estimators provide more local information, which is effective for estimating density functions with cusps, because wavelets have the properties of time-frequency localization and multiresolution (see [2, 12,13,14]). Recently, this fact has been verified by numerical experiments in many papers, covering both density [7, 22] and regression estimation [6, 11]. Moreover, the fast wavelet algorithm is important in many practical fields; this algorithmic advantage of wavelets rests on filter banks and the pyramid algorithm (see [15, 20, 21]).
In this paper, we use an orthonormal scaling function to construct a data-driven estimator on isotropic Besov spaces and obtain the same upper bounds as Goldenshluger and Lepski [5]. Compared with their work, our auxiliary estimators are more concise. Furthermore, motivated by the work of Rebelles [19], we provide better convergence rates for density functions satisfying an independence hypothesis. It should be pointed out that this estimation effectively mitigates the curse of dimensionality.
1.1 Wavelets and Besov Spaces
We begin with a classical concept in wavelet analysis. A multiresolution analysis (MRA, [17]) is a sequence of closed subspaces \(\{V_{j}\}_{j\in {\mathbb {Z}}}\) of the square integrable function space \(L^{2}({\mathbb {R}}^{d})\) satisfying the following properties:
(i) \(V_{j}\subset V_{j+1}\), \(j\in {\mathbb {Z}}\);
(ii) \(\overline{\bigcup _{j\in {\mathbb {Z}}} V_{j}}=L^{2}({\mathbb {R}}^{d})\) (the space \(\bigcup _{j\in {\mathbb {Z}}} V_{j}\) is dense in \(L^{2}({\mathbb {R}}^{d})\));
(iii) \(f(2\cdot )\in V_{j+1}\) if and only if \(f(\cdot )\in V_{j}\) for each \(j\in {\mathbb {Z}}\);
(iv) there exists \(\varphi \in L^{2}({\mathbb {R}}^{d})\) (a scaling function) such that \(\{\varphi (\cdot -k),~k\in {\mathbb {Z}}^{d}\}\) forms an orthonormal basis of \(V_{0}=\overline{\mathrm {span}\{\varphi (\cdot -k),~k\in {\mathbb {Z}}^d\}}\).
When \(d=1\), a wavelet function \(\psi \) can be constructed from the scaling function \(\varphi \) in a simple way such that \(\{2^{j/2}\psi (2^{j}\cdot -k),~j,k\in {\mathbb {Z}}\}\) constitutes an orthonormal basis (wavelet basis) of \(L^{2}({\mathbb {R}})\). Examples include the Daubechies wavelets [8], which have compact support in the time domain. For \(d\ge 2\), the tensor product method gives an MRA \(\{V_{j}\}\) of \(L^{2}({\mathbb {R}}^{d})\) from a one-dimensional MRA. In fact, with a scaling function \(\varphi \) of tensor products, one finds \(2^{d}-1\) wavelet functions \(\psi ^{\ell }~(\ell =1,2,\ldots ,2^d-1)\) such that
$$\{2^{jd/2}\psi ^{\ell }(2^{j}\cdot -k),~j\in {\mathbb {Z}},~k\in {\mathbb {Z}}^{d},~\ell =1,\ldots ,2^{d}-1\}$$
constitutes an orthonormal basis (wavelet basis) of \(L^{2}({\mathbb {R}}^{d})\).
Let \(P_{j}\) be the orthogonal projection operator from \(L^{2}({\mathbb {R}}^{d})\) onto the scaling space \(V_{j}\) with the orthonormal basis \(\{\varphi _{jk}(\cdot )=2^{jd/2}\varphi (2^{j}\cdot -k),~k\in {\mathbb {Z}}^{d}\}\). Then for each \(f\in L^{2}({\mathbb {R}}^{d})\),
$$P_{j}f=\sum _{k\in {\mathbb {Z}}^{d}}\alpha _{jk}\varphi _{jk} \qquad (1.1)$$
with \(\alpha _{jk}{:}{=}\langle f,\varphi _{jk}\rangle \). In particular, when a scaling function \(\varphi \) is m-regular, the identity (1.1) holds in \(L^p({\mathbb {R}}^{d})\) for \(p\ge 1\) [8]. Here and throughout, m-regular means that \(\varphi \in C^{m}({\mathbb {R}}^{d})\) and \(|D^{\alpha }\varphi (x)|\le c_l(1+|x|^{2})^{-\frac{l}{2}}\) \((|\alpha |=0,1,\ldots ,m)\) for each \(l\in {\mathbb {Z}}\) and some constant \(c_l>0\). The Daubechies scaling function \(\underbrace{D_{2N}\times \cdots \times D_{2N}}_{d~\text {times}}\) with \(N>m+d\) is an example, and the tensor product of \(D_{2N}\) with large N is used throughout the paper.
One of the advantages of wavelet bases is that they can characterize Besov spaces, which contain Hölder and \(L^{2}\)-Sobolev spaces as special cases. The next lemma provides an equivalent definition.
Lemma 1.1
([17]) Let \(\varphi \) be m-regular, \(\psi ^{\ell }~(\ell =1, 2, \cdots ,2^{d}-1)\) be the corresponding wavelets and \(f\in L^{r}({\mathbb {R}}^{d})\). If \(\alpha _{jk}{:}{=}\langle f,\varphi _{jk}\rangle \), \(\beta _{jk}^{\ell }=\langle f,\psi _{jk}^{\ell }\rangle \), \(r,q\in [1,\infty ]\) and \(0<s<m\), then the following assertions are equivalent:
(i) \(f\in B^{s}_{r,q}({\mathbb {R}}^{d})\);
(ii) \(\{2^{js}\Vert P_{j}f-f\Vert _{r}\}\in l_{q};\)
(iii) \(\{2^{j(s-\frac{d}{r}+\frac{d}{2})}\Vert \beta _{j\cdot }\Vert _{l_r}\}\in l_{q}.\)
The Besov norm of f can be defined by
$$\Vert f\Vert _{B_{r,q}^{s}}{:}{=}\Vert \alpha _{j_{0}\cdot }\Vert _{l_r}+\Big \Vert \big \{2^{j(s-\frac{d}{r}+\frac{d}{2})}\Vert \beta _{j\cdot }\Vert _{l_r}\big \}_{j\ge j_{0}}\Big \Vert _{l_q},$$
where \(\Vert \alpha _{j_{0}\cdot }\Vert ^r_{l_r}{:}{=}\sum \limits _{k\in {\mathbb {Z}}^d}|\alpha _{j_0k}|^r\) and \(\Vert \beta _{j\cdot }\Vert _{l_r}^{r}=\sum \limits _{\ell =1}^{2^d-1}\sum \limits _{k\in {\mathbb {Z}}^{d}}| \beta ^{\ell }_{jk}|^{r}.\)
Moreover, Lemma 1.1 (i) and (ii) shows that \(\Vert P_jf-f\Vert _r\lesssim 2^{-js}\) holds for \(f\in B^{s}_{r,q}({\mathbb {R}}^{d})\). Here and throughout, the notation \(A\lesssim B\) denotes \(A\le cB\) with some fixed constant \(c>0\); \(A\gtrsim B\) means \(B\lesssim A\); and \(A\thicksim B\) stands for both \(A\lesssim B\) and \(A\gtrsim B\).
When \(r\le p\), Lemma 1.1 (i) and (iii) imply that, with \(s'-\frac{d}{p}=s-\frac{d}{r}>0\),
$$B_{r,q}^{s}({\mathbb {R}}^{d})\hookrightarrow B_{p,q}^{s'}({\mathbb {R}}^{d}),$$
where \(A\hookrightarrow B\) stands for a Banach space A continuously embedded in another Banach space B. All these claims can be found in Ref. [23].
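As a quick numerical instance of this embedding (the parameter values are chosen purely for illustration):

```latex
d = 1,\quad s = 1,\quad r = 2,\quad p = 4:\qquad
s' = s - \frac{d}{r} + \frac{d}{p} = 1 - \frac{1}{2} + \frac{1}{4} = \frac{3}{4},
\qquad\text{so}\quad B^{1}_{2,q}(\mathbb{R}) \hookrightarrow B^{3/4}_{4,q}(\mathbb{R}).
```

Here \(s'-\frac{d}{p}=s-\frac{d}{r}=\frac{1}{2}>0\), so the stated condition is satisfied.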
1.2 Wavelet Estimator and Selection Rule
It is well known that the classical linear wavelet estimator is given by
$${\widehat{f}}_{j}(x){:}{=}\sum _{k\in {\mathbb {Z}}^{d}}{\widehat{\alpha }}_{jk}\varphi _{jk}(x) \qquad (1.2)$$
with \({\widehat{\alpha }}_{jk}{:}{=}\frac{1}{n}\sum _{i=1}^{n}\varphi _{jk}(X_{i})\). Moreover, the parameter \(j{:}{=}j(n)\) goes to infinity as the sample size \(n\rightarrow \infty \). In general, it depends on the smoothness index s of the unknown density function f, and the estimator is non-adaptive [8, 10]. In this subsection, we give a selection rule for the parameter j depending only on the observations \(X_{1},\cdots ,X_{n}\), the so-called data-driven version.
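To make the construction concrete, here is a minimal numerical sketch of the linear estimator \({\widehat{f}}_{j}\) in dimension one, using the Haar scaling function \(\varphi =I_{[0,1)}\) (i.e., \(D_{2}\)) as the simplest stand-in for the tensor-product Daubechies \(D_{2N}\) used in the paper; the function names and the uniform test sample are illustrative only.

```python
import numpy as np

def haar_phi_jk(x, j, k):
    """Scaled/shifted Haar scaling function phi_{jk}(x) = 2^{j/2} phi(2^j x - k)."""
    t = 2.0**j * x - k
    return 2.0**(j / 2) * ((0.0 <= t) & (t < 1.0))

def linear_wavelet_estimator(sample, j, x):
    """Linear estimator f_j(x) = sum_k alpha_jk * phi_jk(x) with empirical alpha_jk."""
    # Only finitely many k give a nonzero Haar coefficient for a bounded sample.
    ks = np.arange(np.floor(2.0**j * sample.min()), np.floor(2.0**j * sample.max()) + 1)
    est = np.zeros_like(x, dtype=float)
    for k in ks:
        alpha_jk = haar_phi_jk(sample, j, k).mean()   # (1/n) sum_i phi_jk(X_i)
        est += alpha_jk * haar_phi_jk(x, j, k)
    return est

rng = np.random.default_rng(0)
sample = rng.uniform(0.0, 1.0, size=5000)             # true density: 1 on [0, 1]
x = np.array([0.25, 0.5, 0.75])
fhat = linear_wavelet_estimator(sample, j=3, x=x)
print(fhat)  # each value should be close to the true density value 1
```

With the Haar choice, \({\widehat{f}}_{j}\) reduces to a histogram on dyadic bins of width \(2^{-j}\), which is why the output hovers around 1 for uniform data.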
Let \({\mathcal {H}}{:}{=}\left\{ 0,1,\cdots ,\lfloor \frac{1}{d}\log _2{\frac{n}{\ln n}}\rfloor \right\} \), with \(\lfloor a\rfloor \) denoting the largest integer smaller than or equal to a, and let
$$\xi _{n}(x,j){:}{=}{\widehat{f}}_{j}(x)-E{\widehat{f}}_{j}(x)$$
be the stochastic error of \({\widehat{f}}_{j}\). The most important step of the selection rule is to find a function \(U_n(x,j)\) such that the moments of the random variable
$$v(x){:}{=}\sup _{j\in {\mathcal {H}}}\big [|\xi _n(x,j)|-U_n(x,j)\big ]_{+} \qquad (1.3)$$
are “small” for each \(x\in {\mathbb {R}}^{d}\), where \(a_{+}{:}{=}\max \{a,0\}\). According to Bernstein’s inequality in Sect. 3, the function \(U_n(x,j)\) can be defined by
with some constant \(\lambda >(5p+6)\Vert \Phi \Vert _{\infty }\). Moreover, this special choice of \(\lambda \) is used in Proposition 3.1, (3.15) and (4.1). Here and throughout,
with \(\Phi \in C_0({\mathbb {R}}^{d})\) satisfying \(\Phi \ge 0\) and
where \(C_0({\mathbb {R}}^{d})\) stands for the set of all compactly supported and continuous functions. Clearly, \(\sigma _j\in L^{1}({\mathbb {R}}^{d})\cap L^{\infty }({\mathbb {R}}^{d})\) holds for each \(j\in {\mathcal {H}}\), if \(f\in L^{\infty }({\mathbb {R}}^{d})\).
Note that \(U_n(x,j)\) depends on the unknown density function f. Hence, we use an empirical counterpart \({\widehat{U}}_n(x,j)\) instead, i.e.,
where \({\widehat{\sigma }}_{j}(x){:}{=}\frac{1}{n}\sum _{i=1}^n\Phi _{j}(x-X_{i})\). Then it is easy to see that \(E{\widehat{\sigma }}_{j}(x)=\sigma _{j}(x).\)
Now, the selection rule for j is as follows. For any \(x\in {\mathbb {R}}^{d}\), let
Here and after, \(a\wedge b{:}{=}\min \{a,b\}\) and \(a\vee b{:}{=}\max \{a,b\}\). Compared with the work of Goldenshluger and Lepski [5], our auxiliary estimator \({\widehat{f}}_{j\wedge j'}\) is more concise. Then the selection of \(j_{0}\) is given by
Obviously, it depends only on the observations \(X_1,\cdots ,X_n\) for any \(x\in {\mathbb {R}}^{d}\).
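The flavor of this data-driven choice can be illustrated by the following simplified pointwise Lepski-type selection in dimension one. Everything here is a surrogate: the threshold `U` below is a generic majorant of order \(\sqrt{2^{j}\ln n/n}\) with \(\lambda =1\), not the paper's exact \({\widehat{U}}_n(x,j)\) from (1.7), and the Haar histogram stands in for the Daubechies scaling estimator.

```python
import numpy as np

def fhat(sample, j, x):
    """Haar linear estimate at resolution j (a histogram on dyadic bins of width 2^-j)."""
    return 2.0**j * np.mean(np.floor(2.0**j * sample) == np.floor(2.0**j * x))

def select_j(sample, x, lam=1.0):
    """Lepski/Goldenshluger-Lepski-style choice of j at the point x (illustrative only).

    Compares f_{j ∧ j'} with f_{j'} over the dyadic grid H and picks the j
    minimizing an excess-discrepancy-plus-threshold criterion.
    """
    n = len(sample)
    H = range(int(np.log2(n / np.log(n))) + 1)
    U = lambda j: lam * np.sqrt(2.0**j * np.log(n) / n)   # surrogate stochastic majorant
    def Rhat(j):
        # only j' > j contributes, since f_{j ∧ j'} = f_{j'} for j' <= j
        return max(max(abs(fhat(sample, min(j, jp), x) - fhat(sample, jp, x)) - U(jp), 0.0)
                   for jp in H)
    return min(H, key=lambda j: Rhat(j) + U(j))

rng = np.random.default_rng(1)
sample = rng.normal(0.0, 1.0, size=4000)
j0 = select_j(sample, x=0.0)
print(j0, fhat(sample, j0, 0.0))  # an estimate near the N(0,1) value at 0
```

For smooth data like this Gaussian sample, the rule favors a coarse resolution; near a cusp the discrepancies \(|{\widehat{f}}_{j\wedge j'}-{\widehat{f}}_{j'}|\) exceed the thresholds earlier and a finer j is selected.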
With \({\widehat{\alpha }}_{jk}=\frac{1}{n}\sum _{i=1}^{n}\varphi _{jk}(X_i)\) and \(j_0\) given in (1.9), the data-driven wavelet estimator is defined by
Moreover, the estimator \({\widehat{f}}_{n,d}(x)\) is a Borel function, thanks to the discreteness of \({\mathcal {H}}\) and the continuity of \(\sum _k\varphi (x-k)\varphi (y-k)\) with \(\varphi =\underbrace{D_{2N}\times \cdots \times D_{2N}}_{d~\text {times}}\) for large N.
1.3 Main Results
We shall state the main theorems of this paper and discuss their relations to other work in this subsection. For \(M>0\), the notation \(B_{r,q}^{s}(M)\) stands for a Besov ball, i.e.,
Moreover, \(L^\infty (M)\) is defined analogously. Then the following theorem holds.
Theorem 1.1
Let \(0<s<m\) and \(r,q\in [1,\infty ]\). Then for \(p\in (1,\infty )\), the estimator \({\widehat{f}}_{n,d}\) in (1.10) satisfies
where
and
Remark 1.1
When \(q=\infty \), the Besov space \(B_{r,\infty }^s({\mathbb {R}}^d)\) reduces to the Nikol’skii class \({\mathcal {N}}_r(s,{\mathbb {R}}^d)\). Then according to Theorem 3 of Goldenshluger and Lepski [5], the above estimation is optimal up to a logarithmic factor, since their lower bound holds for all possible estimators, including both kernel and wavelet ones.
Remark 1.2
For the case \(s>\frac{d}{r}\), the condition \(L^{\infty }(M)\) is not necessary because \(B_{r,q}^{s}({\mathbb {R}}^d)\subset L^{\infty }({\mathbb {R}}^d)\) in this case [8]. On the other hand, the convergence rates in (1.11)–(1.12) with \(d=1\) and \(p=2\) coincide with Theorem 3 of Reynaud-Bouret et al. [18]; if \(d=1\) and \(r=q=\infty \), then \(B_{\infty ,\infty }^s({\mathbb {R}})=H^s({\mathbb {R}})\) and Theorem 4 of Juditsky and Lambert-Lacroix [9] follows from the above theorem directly.
A closer look shows that the convergence exponents \(\beta (p,d)\) in Theorem 1.1 tend to zero as the dimension \(d\rightarrow \infty \). Motivated by the work of Rebelles [19], we reduce the influence of the dimension and improve the convergence rates of Theorem 1.1 under an independence hypothesis on the density functions.
As in Ref. [19], denote \({\mathcal {I}}_d{:}{=}\{1,\cdots ,d\}\). For a partition \({\mathcal {P}}\) of \({\mathcal {I}}_d\), a density function f has the independence structure \({\mathcal {P}}\) if
$$f(x)=\prod _{I\in {\mathcal {P}}}f_{|I|}(x_{I})$$
with \(I=\{l_1,\cdots ,l_{|I|}\}\in {\mathcal {P}}\) and \(1\le l_1<\cdots <l_{|I|}\le d\). Here, \(x_I{:}{=}(x_{l_1},\cdots ,x_{l_{|I|}})\in {\mathbb {R}}^{|I|}\) and |I| denotes the cardinality of I. On the other hand, \( f\in B_{r,q}^{s}({\mathbb {R}}^{d},{\mathcal {P}}) \) if and only if \(f_{|I|}\in B_{r,q}^{s}({\mathbb {R}}^{|I|})\) for each \(I\in {\mathcal {P}}\); \(f\in L^{\infty }({\mathbb {R}}^{d},{\mathcal {P}})\) means \(f_{|I|}\in L^{\infty }({\mathbb {R}}^{|I|})\) for each \(I\in {\mathcal {P}}\). Furthermore, the following notations are needed:
For \(f_{|I|}\in B_{r,q}^{s}({\mathbb {R}}^{|I|})\), the corresponding wavelet estimator \({\widehat{f}}_{n,|I|}(x_{I})\) is given by (1.10). Then the estimator \({\widehat{f}}_{n,{\mathcal {P}}}\) for \(f\in B_{r,q}^{s}({\mathbb {R}}^{d},{\mathcal {P}})\) is defined by
Next, we are in a position to state the most important result of this paper.
Theorem 1.2
Let \(0<s<m\) and \(r,q\in [1,\infty ]\). For any \(p\in (1,\infty )\),
where \(\alpha _n(p,|I|)\) and \(\beta (p,|I|)\) can be found in (1.11) and (1.12) respectively.
Remark 1.3
When \({\mathcal {P}}=\{\{1,\ldots ,d\}\}\), \(|I|=d\) and Theorem 1.1 follows directly from Theorem 1.2. In the other extreme case \({\mathcal {P}}=\{\{1\},\ldots ,\{d\}\}\), we have \(|I|=1\), so the convergence order does not depend on the dimension d and the influence of the dimension on the accuracy of estimation disappears.
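The dimension-reduction effect behind this remark can be seen in a small simulation. For \({\mathcal {P}}=\{\{1\},\{2\}\}\) on the unit square, a product of one-dimensional histogram (Haar) estimates uses roughly \(n2^{-j}\) observations per bin, versus \(n2^{-2j}\) for the full two-dimensional estimate at the same resolution. The uniform sample and function names are illustrative, not the paper's estimator (1.14).

```python
import numpy as np

def fhat_1d(sample, j, x):
    """One-dimensional Haar (histogram) density estimate at resolution j."""
    return 2.0**j * np.mean(np.floor(2.0**j * sample) == np.floor(2.0**j * x))

rng = np.random.default_rng(7)
n, j = 20000, 3
X = rng.uniform(0.0, 1.0, size=(n, 2))     # f(x1, x2) = f1(x1) f2(x2) = 1 on [0, 1]^2
x1, x2 = 0.3, 0.6

# Full 2-D histogram estimate: about n * 2^(-2j) observations fall in the target bin.
same_bin = ((np.floor(2.0**j * X[:, 0]) == np.floor(2.0**j * x1))
            & (np.floor(2.0**j * X[:, 1]) == np.floor(2.0**j * x2)))
f_full = 4.0**j * np.mean(same_bin)

# Under P = {{1}, {2}}: product of marginal estimates, each bin holding about n * 2^(-j) points.
f_prod = fhat_1d(X[:, 0], j, x1) * fhat_1d(X[:, 1], j, x2)
print(f_full, f_prod)  # both close to the true value 1
```

The product estimate is visibly more stable across seeds, reflecting the one-dimensional effective rate in Theorem 1.2 for this partition.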
2 Oracle Inequality
In this section, we introduce a point-wise oracle inequality, which is one of the main ingredients in later proofs. Let us begin with the following lemma.
Lemma 2.1
Let \({\mathcal {X}}_{j}(x)=\Big [|{\widehat{\sigma }}_{j}(x)-\sigma _{j}(x)|-U_{n}(x,j)\Big ]_+\) with \(j\in {\mathcal {H}}\). Then
where \(U_{n}(x,j)\) and \({\widehat{U}}_{n}(x,j)\) are given by (1.4) and (1.7) respectively.
Proof
Define \({\mathcal {H}}_{0}{:}{=}\{j\in {\mathcal {H}},~\sigma _{j}(x)\ge 4\lambda 2^{jd}\frac{\ln n}{n}\}\). According to the definition of \({\mathcal {X}}_{j}(x)\),
This with (1.4) and (1.7) leads to
Then for any \(j\in {\mathcal {H}}_{0}\), the above inequality reduces to
Hence,
and
Furthermore, by a simple calculation, one obtains that
and
The desired conclusion is established for the case of \(j\in {\mathcal {H}}_{0}\).
It remains to show the case of \(j\in {\mathcal {H}}_{1}{:}{=}{\mathcal {H}}\backslash {\mathcal {H}}_{0}\). Clearly,
due to (1.4) and \(j\in {\mathcal {H}}_{1}\). This with \({\widehat{U}}_{n}(x,j)\ge \frac{3\lambda 2^{jd}\ln n}{n}\) in (1.7) implies
On the other hand, according to the definition of \({\mathcal {X}}_{j}(x)\),
thanks to \(j\in {\mathcal {H}}_{1}\) and (2.1). This with \(\sqrt{a+b}\le \sqrt{a}+\sqrt{b}\) shows that
Combining it with \(\sqrt{ab}\le \frac{a+b}{2}\) and \(U_{n}(x,j)\ge \frac{\lambda 2^{jd}\ln n}{n}\) in (1.4), one knows
Then it follows that
Hence, the lemma also holds for the case of \(j\in {\mathcal {H}}_{1}\) thanks to (2.2) and (2.3). The proof is done. \(\square \)
To state the point-wise oracle inequality, let \(B_j(x,f)\) be the bias of the estimator \(\widehat{f}_{j}(x)\), i.e.,
and define
where \(P_j\) and \(U_{n}(x,j)\) are given by (1.1) and (1.4) respectively.
The following oracle inequality is the main result of this section.
Theorem 2.1
For any \(x\in {\mathbb {R}}^{d}\), the estimator \({\widehat{f}}_{n,d}(x)\) in (1.10) satisfies that
where v(x) is defined in (1.3) and
Proof
It follows from the definition of \({\widehat{R}}_{j}(x)\) in (1.8) that
thanks to (1.8). The same arguments as (2.7) show
Then combining (1.9) with (2.7)–(2.8), one obtains that
Clearly, by (1.3),
Moreover, it follows from (2.5) that
On the other hand, according to (1.2) and (2.4),
This with \(\sup _{j'\in {\mathcal {H}}}|E{\widehat{f}}_{j\wedge j'}(x)-E{\widehat{f}}_{j'}(x)| \le \sup _{\{j'\in {\mathcal {H}},~j'\ge j\}}\{B_{j\wedge j'}(x,f)+B_{j'}(x,f)\}\) leads to
because of (2.5)–(2.6) and the second inequality of Lemma 2.1. Hence, it follows from (2.9)–(2.11) that
Note that \(\big [\sup _\alpha F_\alpha -\sup _\alpha G_\alpha \big ]_+\le \sup _\alpha \big [F_\alpha -G_\alpha \big ]_+\). Then
thanks to the first inequality of Lemma 2.1 and (2.6). Therefore, \({\widehat{U}}^{*}_n(x,j)\le 13U_n^{*}(x,j)+2\omega (x)\). This with (2.12) shows that
holds for any \(j\in {\mathcal {H}}\). Furthermore,
due to \({\widehat{f}}_{n,d}(x)={\widehat{f}}_{j_0}(x)\) in (1.10), which finishes the proof. \(\square \)
3 Two Propositions
This section is devoted to proving two necessary propositions. The following classical inequality is needed for the proof of Proposition 3.1.
Bernstein’s inequality ([16]). Let \(Y_{1},\cdots ,Y_{n}\) be i.i.d. random variables with \(EY_{i}^{2}\le \sigma ^{2}\) and \(|Y_{i}|\le M\) \((i=1,2,\cdots ,n)\). Then for any \(x>0\),
$$P\Big (\Big |\frac{1}{n}\sum _{i=1}^{n}\big (Y_{i}-EY_{i}\big )\Big |\ge x\Big )\le 2\exp \Big \{-\frac{nx^{2}}{2(\sigma ^{2}+2Mx/3)}\Big \}.$$
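A Bernstein bound of this type can be checked numerically. The sketch below uses the form \(P\big (|\frac{1}{n}\sum _{i}(Y_{i}-EY_{i})|\ge x\big )\le 2\exp \{-nx^{2}/(2(\sigma ^{2}+2Mx/3))\}\) for \(|Y_{i}|\le M\); the Uniform\((-1,1)\) sample and the constants are illustrative only.

```python
import numpy as np

# Monte Carlo sanity check of a Bernstein-type bound for bounded i.i.d. variables:
# P(|(1/n) sum_i (Y_i - E Y_i)| >= x) <= 2 exp{-n x^2 / (2 (sigma^2 + 2 M x / 3))}.
rng = np.random.default_rng(42)
n, trials = 200, 20000
M, sigma2 = 1.0, 1.0 / 3.0                 # |Y_i| <= 1 and E Y_i^2 = 1/3 for Uniform(-1, 1)
Y = rng.uniform(-1.0, 1.0, size=(trials, n))
means = Y.mean(axis=1)                     # E Y_i = 0, so these are the centered averages
for x in (0.05, 0.1, 0.2):
    empirical = np.mean(np.abs(means) >= x)
    bound = 2.0 * np.exp(-n * x**2 / (2.0 * (sigma2 + 2.0 * M * x / 3.0)))
    print(f"x={x}: empirical tail {empirical:.4f} <= Bernstein bound {bound:.4f}")
    assert empirical <= bound
```

The bound is loose for small x (it can exceed 1) but decays exponentially in \(nx^{2}\), which is exactly what the threshold \(U_{n}(x,j)\) exploits.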
Now, we state the first proposition, which plays an important role in the proof of the second one.
Proposition 3.1
Let v(x) and \(\omega (x)\) be given by (1.3) and (2.6) respectively. Then for each \(\gamma >0\), there exists \(\lambda >(5\gamma +6)\Phi _{\infty }\) such that
where \(\Phi _{\infty }=\Vert \Phi \Vert _{\infty }\) and \(\Phi \) is defined in (1.5).
Proof
According to the definitions of v(x) and \(\omega (x)\), one only needs to prove the first inequality; the second is similar. We will show \(\int _{{\mathbb {R}}^d} E[v(x)]^{\gamma }dx\lesssim n^{-\frac{\gamma }{2}}\) in two steps.
Step 1. Define \(F(x){:}{=}f*I_{[-1,~1]^{d}}(x)\) and
where \(\lambda _{j}=\max \big \{\frac{1}{4},~(\gamma +1)jd\ln 2+\ln (F^{-1}(x)\wedge n^{l})\big \}\) with \(l=\frac{3\gamma }{2}+2\).
Note that \(\lambda \ln n\ge 2\Phi _{\infty }\lambda _j\) follows from \(\lambda >(5\gamma +6)\Phi _{\infty }\) and \((\gamma +1)jd\ln 2+\ln (F^{-1}(x)\wedge n^{l}) \le [(\gamma +1)+l]\ln n\) with \(j\in {\mathcal {H}}\). Then \(\overline{U_{n}}(x,j)\le U_{n}(x,j)\) due to (1.4) and (3.1). Furthermore,
For each \(t\ge 0\),
Hence,
This with variable substitution \(t=v\omega \) and \(\omega {:}{=}\sqrt{\frac{\Phi _{\infty }2^{jd+1}\sigma _{j}(x)}{n}}+ \frac{\Phi _{\infty }2^{jd+2}}{3n}\) shows
thanks to \(v+\sqrt{\lambda _j}\ge \sqrt{v+\lambda _j}\) and \(\lambda _j\ge \frac{1}{4}\).
On the other hand,
with \(K(x,y)=\sum _k\varphi (x-k)\varphi (y-k)\). Then by (1.6),
Combining these with Bernstein’s inequality, one concludes that
This with (3.3) implies that
due to \(\omega {:}{=}\sqrt{\frac{\Phi _{\infty }2^{jd+1}\sigma _{j}(x)}{n}}+ \frac{\Phi _{\infty }2^{jd+2}}{3n}\). Note that \(\sigma _{j}(x)=\int _{{\mathbb {R}}^d}\Phi _{j}(t-x)f(t)dt\lesssim 2^{jd}\) and \(e^{-\lambda _{j}}\le 2^{-jd(\gamma +1)}[F(x)\vee n^{-l}]\). Then
It follows from (1.3) and (3.2) that
Step 2. The second step is devoted to proving \(\int _{{\mathbb {R}}^d} E[v(x)]^{\gamma }dx\lesssim n^{-\frac{\gamma }{2}}\) using Step 1. Denote
Then with (3.4), one obtains
thanks to \(F(x){:}{=}f*I_{[-1,~1]^{d}}(x)\in L^1({\mathbb {R}}^d)\).
Next, the main work is to prove \(\int _{T_2} E[v(x)]^{\gamma }dx\lesssim n^{-\frac{\gamma }{2}}\). Define
and \(\overline{{\widehat{D}}(x)}=[{\widehat{D}}(x)]^c\), where \(A^{c}\) means the complement of the set A.
Without loss of generality, \(\mathrm {supp}~\Phi \subseteq [-1,~1]^{d}\) is assumed in this paper. Then
because of (1.6) and \(F(x)=\int _{{\mathbb {R}}^d} I_{U(x)}(t)f(t)dt\). Moreover,
By \(l\ge 1\) and \(\lambda>(5\gamma +6)\Phi _{\infty }>2\Phi _\infty \), for each \(x\in T_2\),
which implies that \( \sup _{j\in {\mathcal {H}}}\big [|\xi _n(x,j)|-U_n(x,j)\big ]_+\cdot I_{\{ {\widehat{D}}(x)\}}=0 \) holds for \(x\in T_{2}\). Hence,
For the case of \(\int _{T_2}E[v(x)]^{\gamma }I_{\big \{\overline{{\widehat{D}}(x)}\big \}}dx\), note that \(|\xi _n(x,j)|\lesssim \Vert K_j\Vert _{\infty }\lesssim 2^{jd}\le n\) follows from \(j\in {\mathcal {H}}\). Then with \(v(x){:}{=}\sup _{j\in {\mathcal {H}}}\big [|\xi _n(x,j)|-U_n(x,j)\big ]_+\),
According to Markov’s inequality, for each \(z>0\),
On the other hand,
These with \((t+1)^n\le e^{nt}\) imply that
Put \(z=\ln 2-\ln (nF(x))\). Then \(z>0\) by \(l\ge 1\) and \(F(x)\le n^{-l}\) in \(T_2\). Furthermore, (3.8) reduces to
thanks to \(0\le nF(x)\le n^{-l+1}\) with \(x\in T_2\). This with (3.7) leads to
because of \(F\in L^1({\mathbb {R}}^d)\) and \(l=\frac{3\gamma }{2}+2\).
Finally, the desired conclusion follows from (3.5), (3.6) and (3.9). The proof is completed.\(\square \)
Before giving another proposition, we need three more notations. Define
where \(\delta _n=(\frac{C\ln n}{n})^{\frac{s}{2s+d}}\) and \(m_{0}\in {\mathbb {Z}}\) satisfies \(c' \delta _n^{\frac{sr+d}{sr+dr-d}}\le 2^{m_{0}}\le c'' \delta _n^{\frac{sr+d}{sr+dr-d}}\) with some constants \(1<c'<c''<\infty \) and \(C>0\).
Note that \(U_{f}(x)\le c_0{:}{=}\sup _xU_{f}(x)\). Then there exists
such that \(\Omega _{m}=\emptyset \) for each \(m>m_{2}\). Clearly, \(m_{0}<0<m_{2}\) for large n.
Proposition 3.2
Denote
Then the following statements hold:
(1). For each \(p>1\),
(2). Let \(f\in B_{r,q}^{s}(M)\cap L^{\infty }(M)\) and \(m\in {\mathbb {Z}}\) satisfy \(m_{0}\le m\le 0\). Then
(3). Let \(f\in B_{r,q}^{s}(M)\cap L^{\infty }(M)\) and \(m\in {\mathbb {Z}}\) satisfy \(0\le m\le m_2\). Then
Moreover, if \(s>\frac{d}{r}\) and \(r\le p\), then with \(s'{:}{=}s-\frac{d}{r}+\frac{d}{p}\),
Proof
(1). According to Theorem 2.1,
where \(\Delta (x)=v(x)+\omega (x)\) and \(U_f(x)\) is given by (3.10). Then for each \(p>1\),
Moreover, \(U_{f}(x)\le 2^{m_{0}}\delta _n\) follows from (3.12). Hence,
On the other hand, \( |{\widehat{f}}_{n,d}(x)|\le \frac{1}{n}\sum _{i=1}^n\Phi _{j_{0}}(x-X_i) \) due to \({\widehat{f}}_{n,d}(x)=\sum _k{\widehat{\alpha }}_{j_{0}k}\varphi _{j_{0}k}(x)\) and \(|\sum _k\varphi (x-k)\varphi (y-k)|\le \Phi (x-y)\). Then
because \({\mathcal {H}}\) is a discrete set whose cardinality is no more than \(\ln n\). Therefore,
This with (3.14) and Proposition 3.1 leads to
It follows from \(2^{m_0}\thicksim \delta _n^{\frac{sr+d}{sr+dr-d}}\) that \(2^{m_0}\delta _n n^{-\frac{p-1}{2}}\lesssim n^{-\frac{p}{2}}\) holds for \(sr-dr+d>0\) and \(2^{m_0}\delta _n n^{-\frac{p-1}{2}}\lesssim (2^{m_{0}}\delta _n)^{p-1}\) holds for \(sr-dr+d\le 0\) and \(p>1\). Combining these with (3.15), one concludes that
which is the first desired conclusion.
(2). Clearly, by \(\Omega _m=\{x\in {\mathbb {R}}^{d},~2^{m}\delta _n <U_{f}(x)\le 2^{m+1}\delta _n\}\),
where \(|\Omega _m|\) stands for the Lebesgue measure of the set \(\Omega _m\). On the other hand, (3.10) tells that \(U_{f}(x)=\inf _{j\in {\mathcal {H}}}\{B_{j}^{*}(x,f)+U_{n}^{*}(x,j)\}\). Then for each \(j\in {\mathcal {H}}\),
since \(B_{j}^{*}(x,f)=\sup _{j'\in {\mathcal {H}},~j'\ge j}B_{j'}(x,f)\). Moreover, (3.16) reduces to
If \(1\le r<\infty \), by using Chebyshev’s inequality and \(f\in B_{r,q}^s(M)\),
To estimate \(J_m^1(j)\), one chooses \(j_{1}\in {\mathbb {Z}}\) satisfying
with two constants \(c_2>c_1>1\). Thus, \(j_{1}\in {\mathcal {H}}\) for \(m_0\le m\le 0\) and large n. In fact, if \(r>2\), then
thanks to the choice of \(2^{m_0}\) and \(\frac{s}{2s+d}(\frac{d}{s}+\frac{d(r-2)}{sr+dr-d})<1\). If \(1\le r\le 2\), then
due to the choice of \(2^{m_0}\), \(\frac{d}{s}+\frac{d(r-2)}{sr+dr-d}>0\) and \(c_1,c'>1\). Hence, \(j_{1}\in {\mathcal {H}}\) follows from (3.20) and (3.21).
Recall that \( c' \delta _n^{\frac{sr+d}{sr+dr-d}}\le 2^{m_{0}}\le c'' \delta _n^{\frac{sr+d}{sr+dr-d}}\) and \(\delta _n=(\frac{C\ln n}{n})^{\frac{s}{2s+d}}\). Then by choosing C such that \(\max \{1,~(2M)^{\frac{d}{s}}\}<c_1<c_2<\frac{C}{4\lambda }\),
because of \(m\ge m_0\) and \(c'>1\).
Furthermore, according to the definition of \(U_{n}^{*}(x,j)\) and \(j_1\in {\mathcal {H}}\), one obtains that
where (3.22) is used in the second inequality. Moreover, it follows from \(\Vert \sigma _j\Vert _1\lesssim 1\) and (3.22) that
For the case \(1\le r<\infty \), combining (3.18) with (3.23) and (3.19), one knows that
This with the choice of \({j_1}\) yields
If \(r=\infty \), then \(c_1(2^{m}\delta _n)^{-\frac{d}{s}}\le 2^{j_{1}d}\le c_2(2^{m}\delta _n)^{-\frac{d}{s}}\) also due to the choice of \({j_1}\). Moreover, \(f\in B_{\infty ,q}^{s}\subseteq B_{\infty ,\infty }^{s}\) follows from \(l^q\hookrightarrow l^\infty \). Then
by choosing \(c_1\ge \max \{1,~(2M)^{\frac{d}{s}}\}\). Therefore, in view of (3.17),
This with (3.18) and (3.23) shows
The proof of the second estimation is completed.
(3). Take \(j_2\) satisfying \(c_32^{2m}\delta _n^{-\frac{d}{s}}\le 2^{j_{2}d}\le c_42^{2m}\delta _n^{-\frac{d}{s}}\). Then by \(\sigma _{j}(x)=\int _{{\mathbb {R}}^d}\Phi _{j}(t-x)f(t)dt<L\), there exist two positive constants
such that \(j_{2}\in {\mathcal {H}}\) and \(U_{n}^{*}(x,j_2)\le 2^{m-1}\delta _n\) for \(0<m\le m_2\). In fact, (3.13) tells that \(2^{m_{2}}\le 2c_0\delta _n^{-1}\). Then due to \(c_4<\frac{C}{4c_0^{2}}\),
Hence, \(j_{2}\in {\mathcal {H}}\). On the other hand, according to \(j_2\in {\mathcal {H}}\) and \(c_4<C(2\sqrt{\lambda L}+2\lambda )^{-2}\),
This with (3.17) implies
When \(1\le r<\infty \), substituting (3.19) and (3.25) into (3.18), one obtains that
For the case \(r=\infty \), it follows from (3.24) and \(0<m\le m_2\) that
due to the choice of \(c_3\). Thus, \(J_m^2(j_2)=0\) because of (3.17). This with (3.18) and (3.25) leads to
To finish the proof of proposition, the case of \(s>\frac{d}{r}\) and \(r\le p\) is considered. Note that \(B_{r,q}^{s}\subseteq B_{p,q}^{s'}\) with \(s'=s-\frac{d}{r}+\frac{d}{p}\). Similar to (3.19),
Substituting this above estimate and (3.25) into (3.18), one concludes that
thanks to \(2^{j_2d}\thicksim 2^{2m}\delta _n^{-\frac{d}{s}}\). The proof is done. \(\square \)
Remark 3.1
By a careful check of the above proofs, the constant C in \(\delta _{n}=(\frac{C\ln n}{n})^{\frac{s}{2s+d}}\) should be chosen large enough to ensure the existence of the constants \(c_{1},c_{2},c_{3},c_{4}\). In particular, when \(1<r<\infty \), we can choose \(C=1\) (i.e., \(\delta _{n}=(\frac{\ln n}{n})^{\frac{s}{2s+d}}\)), because the lower bound \(\max \{1,~(2M)^{\frac{d}{s}}\}\) on the constants \(c_1,c_3\) is unnecessary in this case.
4 Proofs
Now, we are ready to prove Theorem 1.1 and Theorem 1.2 respectively.
4.1 Proof of Theorem 1.1
Proof
According to Theorem 2.1, one obtains that
where \(U_{f}(x)\) is given by (3.10). This with Proposition 3.1 implies
Here, \(J_{m_{0}}^{-}\) and \(J_{m}\) are defined in Proposition 3.2.
To complete the proof, one divides (4.1) into four regions. Recall that \(2^{m_{0}}\thicksim \delta _n^{\frac{sr+d}{sr+dr-d}},~2^{m_{2}}\thicksim \delta _n^{-1}\) and \(\delta _n\thicksim (\frac{\ln n}{n})^{\frac{s}{2s+d}}\) by (3.12)–(3.13). Then with Proposition 3.2, the following estimations are established.
(i). For \(p\le \frac{2sr+dr}{sr+d}\),
Next, one treats the remaining regions based on the fact that \((2^{m_0}\delta _n)^{p-1}<\delta _n^{p}\) for \(p>\frac{2sr+dr}{sr+d}\).
(ii). For \(\frac{2sr+dr}{sr+d}< p<\frac{2sr}{d}+r\),
(iii). For \(p\ge \frac{2sr}{d}+r\),
(iv). For the case \(p\ge \frac{2sr}{d}+r\) and \(s>\frac{d}{r}\). Take \(m_1\in {\mathbb {Z}}\) satisfying
by balancing \(2^{m_{1}(p-r-\frac{2sr}{d})}\delta _n^{p}\) and \(2^{-\frac{2m_{1}s'p}{d}}\delta _n^{\frac{s'}{s}p}\). Then it follows from \(r<p\) and \(s>\frac{d}{r}\) that \(0<m_1<m_2\). Hence,
Then due to the choice of \(2^{m_1}\), \(\delta _n\thicksim (\frac{\ln n}{n})^{\frac{s}{2s+d}}\) and \(s'=s-\frac{d}{r}+\frac{d}{p}\), the above inequality reduces to
The proof of Theorem 1.1 is finished thanks to (4.1)–(4.5). \(\square \)
4.2 Proof of Theorem 1.2
Proof
It is easy to show that
This with (1.13) and (1.14) leads to
Obviously, \(|{\widehat{f}}_{n,|I|}(x_{I})|\le |{\widehat{f}}_{n,|I|}(x_{I})-f_{|I|}(x_I)|+|f_{|I|}(x_{I})|\) and
On the other hand, \(|f_{|I|}(x_{I})|\lesssim 1\) follows from \(f_{|I|}\in L^{\infty }(M)\). Combining this with (4.6) and (4.7), one concludes that
Note that \(|{\widehat{f}}_{n,|I|}(x_{I})-f_{|I|}(x_{I})|^{(d-1)p}\) and \(|{\widehat{f}}_{n,|I|}(x_{I})-f_{|I|}(x_{I})|^{p}\) attain their maximum values for the same I. Therefore,
which implies that
According to Theorem 1.1 and \(f\in B_{r,q}^{s}(M,{\mathcal {P}})\cap L^{\infty }(M,{\mathcal {P}})\), one obtains that
and
Moreover, it follows from (1.11) that for each \(I\in {\mathcal {P}}\),
Hence, in order to obtain the conclusion of Theorem 1.2, it is sufficient to show \(\beta (pd,|I|)d\ge \beta (p,|I|)\) for each \(I\in {\mathcal {P}}\), because of (4.8)–(4.11).
It is equivalent to prove that \(\beta (pd,\ell )d\ge \beta (p,\ell )\) holds for each \(\ell \in \{1,\cdots ,d\}\). By (1.12),
Therefore, (i). For \(p\ge \frac{2sr}{\ell }+r\) and \(s>\frac{\ell }{r}\),
(ii). For \(p\ge \frac{2sr}{\ell }+r\) and \(s\le \frac{\ell }{r}\),
(iii). If \(\frac{2sr+\ell r}{sr+\ell }< p<\frac{2sr}{\ell }+r\), then the possible values of \(\beta (pd,\ell )d\) are \(\frac{d(s-\frac{\ell }{r}+\frac{\ell }{pd})}{2(s-\frac{\ell }{r})+\ell }\) (for \(s>\frac{\ell }{r}\)), \(\frac{s r}{p\ell }\) (for \(s\le \frac{\ell }{r})\) and \(\frac{ds}{2s+\ell }\). Clearly,
Hence, \(\beta (pd,\ell )d\ge \frac{s}{2s+\ell }=\beta (p,\ell )\) holds in this region.
(iv). If \(p\le \frac{2sr+\ell r}{sr+\ell }\), then the possible values of \(\beta (pd,\ell )d\) are \(\frac{d(s-\frac{\ell }{r}+\frac{\ell }{pd})}{2(s-\frac{\ell }{r})+\ell }\) (for \(s>\frac{\ell }{r}\)), \(\frac{s r}{p\ell }\) (for \(s\le \frac{\ell }{r})\), \(\frac{ds}{2s+\ell }\) and \( \frac{ds(1-\frac{1}{pd})}{s+\ell -\frac{\ell }{r}}\). Due to (4.12) and \(d\ge 1\),
\( \min \left\{ \frac{d(s-\frac{\ell }{r}+\frac{\ell }{pd})}{2(s-\frac{\ell }{r})+\ell } ,~\frac{s r}{p\ell },~\frac{ds}{2s+\ell }\right\} \ge \frac{s}{2s+\ell } \ge \frac{s(1-\frac{1}{p})}{s+\ell -\frac{\ell }{r}} \quad \text{ and }\quad \frac{ds(1-\frac{1}{pd})}{s+\ell -\frac{\ell }{r}}\ge \frac{s(1-\frac{1}{p})}{s+\ell -\frac{\ell }{r}}. \)
Therefore, \(\beta (pd,\ell )d\ge \frac{s(1-\frac{1}{p})}{s+\ell -\frac{\ell }{r}}=\beta (p,\ell )\) follows in this region.
The proof is done. \(\square \)
References
Bretagnolle, J., Huber, C.: Estimation des densités: risque minimax. Z. Wahrsch. Verw. Gebiete 47, 119–137 (1979)
Daubechies, I.: The wavelet transform, time-frequency localization and signal analysis. IEEE Trans. Inform. Theory 36, 961–1005 (1990)
Devroye, L., Györfi, L.: Nonparametric Density Estimation: The \(L^1\) view. Wiley, New York (1985)
Donoho, D.L., Johnstone, M.I., Kerkyacharian, G., Picard, D.: Density estimation by wavelet thresholding. Ann. Stat. 24, 508–539 (1996)
Goldenshluger, A., Lepski, O.: On adaptive minimax density estimation on \({\mathbb{R}}^{d}\). Probab. Theory Relat. Fields. 159, 479–543 (2014)
Guo, H.J., Kou, J.K.: Non linear wavelet estimation of regression derivatives based on biased data. Commun. Statist. Theory Methods 48, 3219–3235 (2019)
Guo, H.J., Kou, J.K.: Pointwise density estimation for biased sample. J. Comput. Appl. Math. 361, 444–458 (2019)
Härdle, W., Kerkyacharian, G., Picard, D., Tsybakov, A.: Wavelets. Approximation and Statistical Applications. Springer, New York (1998)
Juditsky, A., Lambert-Lacroix, S.: On minimax density estimation on \({\mathbb{R}}\). Bernoulli 10, 187–220 (2004)
Kerkyacharian, G., Picard, D.: Density estimation in Besov spaces. Stat. Probab. Lett. 13, 15–24 (1992)
Kou, J.K., Liu, Y.M.: Non parametric regression estimations over \(L^p\) risk based on biased data. Commun. Stat. Theory Methods 46, 2375–2395 (2017)
Liu, Y.M., Zeng, X.C.: Asymptotic normality for wavelet deconvolution density estimators. Appl. Comput. Harmon. Anal. 48, 321–342 (2020)
Madych, W.R.: Some elementary properties of multiresolution analyses of \(L^2({\mathbb{R}}^n)\). In: Chui, C.K. (ed.) Wavelets: A Tutorial in Theory and Applications. Academic Press, Boston (1992)
Mallat, S.: Multiresolution approximations and wavelet orthonormal bases for \(L^2({\mathbb{R}})\). Trans. Am. Math. Soc. 315, 69–87 (1989)
Mallat, S.: A theory for multiresolution signal decomposition: the wavelet representation. IEEE Trans. Pattern Anal. Mach. Intell. 11, 674–693 (1989)
Massart, P.: Concentration inequalities and model selection. In: Lectures from the 33rd Summer School on Probability Theory held in Saint-Flour. Springer, Berlin (2007)
Meyer, Y.: Wavelets and Operators. Cambridge University Press, Cambridge (1992)
Reynaud-Bouret, P., Rivoirard, V., Tuleau-Malot, C.: Adaptive density estimation: a curse of support? J. Stat. Plan. Inference 141, 115–139 (2011)
Rebelles, G.: Pointwise adaptive estimation of a multivariate density under independence hypothesis. Bernoulli 21, 1984–2023 (2015)
Strang, G., Nguyen, T.: Wavelets and Filter Banks. Wellesley-Cambridge Press, Wellesley (1996)
Vetterli, M., Herley, C.: Wavelets and filter banks: theory and design. IEEE. Trans. Signal Proc. 40, 2207–2232 (1992)
Wu, C., Zeng, X.C., Mi, N.: Adaptive and optimal pointwise deconvolution density estimations by wavelets. Adv. Comput. Math. 47, Article 14 (2021)
Zeng, X.C.: A note on wavelet deconvolution density estimation. Int. J. Wavelets Multiresolut. Inf. Process. 15, Article 1750055 (2017)
Acknowledgements
The authors would like to thank Prof. Youming Liu (Beijing University of Technology, China) for his important comments and suggestions. This work is supported by the National Natural Science Foundation of China (Nos. 11901019 and 12101459), and the Science and Technology Program of Beijing Municipal Commission of Education (No. KM202010005025).
Cao, K., Zeng, X. Adaptive Wavelet Density Estimation Under Independence Hypothesis. Results Math 76, 196 (2021). https://doi.org/10.1007/s00025-021-01506-2