Abstract
In 2017 and 2019, using a kernel method, Lepski and Willer established adaptive and optimal \(L^p\) risk estimates in the convolution structure density model, assuming their density functions to lie in a Nikol’skii space. Motivated by their work, we first use a linear wavelet estimator to obtain a point-wise optimal estimate in the same model, allowing our densities to belong to a local and anisotropic Hölder space. Then a data-driven method is used to obtain an adaptive and near-optimal estimate. Finally, we show that the logarithmic factor is necessary for adaptivity.
1 Introduction
Deconvolution estimation is an important topic in statistics. In this paper, we consider the generalized deconvolution model introduced in [21, 22].
Let \((\Omega ,\mathcal {F},\mathbf {P})\) be a probability space and \(Z_1,Z_2,\ldots ,Z_n\) be independent and identically distributed (i.i.d.) random variables of the form
\(Z=X+\varepsilon Y, \qquad (1.1)\)
where X stands for an \(\mathbb {R}^d\)-valued random variable with unknown probability density f, Y denotes an independent random noise (error) with probability density g, and \(\varepsilon \in \{0,1\}\) is a Bernoulli random variable with \(P\{\varepsilon =1\}=\alpha \), \(\alpha \in [0,1]\). The purpose is to estimate f from the observed data \(Z_1,Z_2,\ldots ,Z_n\) in some sense.
When \(\alpha =1\), (1.1) reduces to the classical deconvolution model, while \(\alpha =0\) corresponds to the traditional density estimation.
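To fix ideas, here is a minimal simulation of model (1.1); the standard normal target X and Laplace noise Y below are illustrative choices of ours, not assumptions of the paper.

```python
import numpy as np

# Simulate Z = X + eps * Y from model (1.1); f, g and alpha below are
# hypothetical choices for illustration only.
rng = np.random.default_rng(0)
n, alpha = 10_000, 0.3

X = rng.standard_normal(n)            # unobserved sample with density f
Y = rng.laplace(size=n)               # independent noise with density g
eps = rng.binomial(1, alpha, size=n)  # Bernoulli, P(eps = 1) = alpha
Z = X + eps * Y                       # the observed data

# alpha = 1 recovers the classical deconvolution model,
# alpha = 0 the traditional density estimation model.
```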
Clearly, the density h of Z in (1.1) satisfies
\(h=(1-\alpha )f+\alpha (f*g), \qquad (1.2)\)
because \(P\{Z<t\}=P\{\varepsilon =0\}P\{X+\varepsilon Y|_{\varepsilon =0}<t\} +P\{\varepsilon =1\}P\{X+\varepsilon Y|_{\varepsilon =1}<t\}=(1-\alpha )P\{X<t\}+\alpha P\{X+Y<t\}\). Furthermore,
\(f^{ft}=h^{ft}/G_{\alpha }\)
when the function
\(G_{\alpha }(t):=1-\alpha +\alpha g^{ft}(t)\)
has no zeros on \(\mathbb {R}^d\), where \(f^{ft}\) is the Fourier transform of \(f\in L^1(\mathbb {R}^d)\) defined by \(f^{ft}(t):=\int _{\mathbb {R}^d}f(x)e^{-it\cdot x}dx\).
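For example, if Y is standard Gaussian noise, then \(g^{ft}(t)=e^{-|t|^2/2}\) and \(G_{\alpha }(t)=1-\alpha +\alpha e^{-|t|^2/2}\ge 1-\alpha \), so for \(\alpha \in [0,1)\) the function \(G_{\alpha }\) never vanishes and \(f^{ft}=h^{ft}/G_{\alpha }\) is well defined; for \(\alpha =1\), \(G_1(t)=e^{-|t|^2/2}\) is still positive but decays exponentially, a severely ill-posed case.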
Under some mild assumptions on \(G_{\alpha }(t)\), Lepski and Willer [21] established an asymptotic lower bound on the \(L^p\) risk for model (1.1). Recently, they provided an adaptive and optimal \(L^p\) risk estimate over an anisotropic Nikol’skii space by using the kernel method in [22].
When \(\alpha =0\), References [9] and [17] deal with linear wavelet estimation, while nonlinear wavelet estimation is studied for adaptivity [6, 8, 25]. Kernel estimation with a selection rule can be found in [12, 13] and [20].
For \(\alpha =1\), the consistency of deconvolution estimators is investigated in [7, 24, 29]. Pensky and Vidakovic and Fan and Koo obtain the optimal convergence rates for the \(L^2\) risk in Sobolev and Besov spaces, respectively [11, 31]; Lounici and Nickl [28] provide the optimal \(L^\infty \) risk estimate, while Li and Liu discuss optimal \(L^p\) risk estimation for \(p\in [1,\infty )\), see [23].
In contrast to the above \(L^p\) risk estimation, we consider point-wise estimation for model (1.1) in this paper, because it is of greater concern in some applications. For a density function set \(\Sigma \) and \(1\le p<\infty \), the maximal point-wise risk at \(x\in \mathbb {R}^d\) over \(\Sigma \) means
\(\sup \limits _{f\in \Sigma }\,[E|\hat{f}_n(x)-f(x)|^p]^{\frac{1}{p}},\)
where \(\hat{f}_n(x):=\hat{f}_n(x;Z_1,Z_2,\ldots ,Z_n)\) and EX denotes the expectation of X. An estimator \(\hat{f}_n^*\) is said to be optimal over \(\Sigma \), if
\(\sup \limits _{f\in \Sigma }\,[E|\hat{f}_n^*(x)-f(x)|^p]^{\frac{1}{p}}\lesssim \inf \limits _{\hat{f}_n}\sup \limits _{f\in \Sigma }\,[E|\hat{f}_n(x)-f(x)|^p]^{\frac{1}{p}},\)
where the infimum is taken over all possible estimators. Here and throughout, \(A\lesssim B\) denotes \(A\le CB\) for some absolute constant \(C>0\); \(A\gtrsim B\) means \(B\lesssim A\); we use \(A\sim B\) to stand for both \(A\gtrsim B\) and \(A\lesssim B\). Clearly,
\(\sup \limits _{f\in \Sigma }\,[E|\hat{f}_n^*(x)-f(x)|^p]^{\frac{1}{p}}\gtrsim \inf \limits _{\hat{f}_n}\sup \limits _{f\in \Sigma }\,[E|\hat{f}_n(x)-f(x)|^p]^{\frac{1}{p}}\)
holds automatically.
In the case \(\alpha =0\), there are existing references dealing with point-wise estimation on the \(L^2\) Sobolev ball \(W_2^s(\mathbb {R}, M)\) [1, 2, 38]. Rebelles [32] discusses point-wise adaptive estimation under an independence hypothesis in Nikol’skii spaces.
In the case \(\alpha =1\) and dimension \(d=1\), Fan shows that a kernel estimator \(\hat{f}_n\) attains a point-wise convergence rate when the characteristic function \(g^{ft}\) of Y satisfies a polynomial decay condition; moreover, that rate is optimal [10]. This result extends the work of Carroll and Hall and of Stefanski and Carroll [4, 34], in which the lower and upper bounds are investigated, respectively, for Gaussian noise. In 2013, Comte and Lacour [5] studied point-wise optimal estimation in the anisotropic Hölder ball \(H^{\mathbf {s}}(\mathbb {R}^d, M)\) with both moderately and severely ill-posed noise by the kernel method.
It should be pointed out that for \(\alpha \in (0,1)\), Hesse [14] provides an upper bound estimate for twice differentiable density functions, when the noise density g satisfies \(\inf \limits _{t}g(t)\ge \alpha \) and \(g^{ft}(t)\) is twice continuously differentiable. Yuan and Chen generalize that work to the high dimensional case [40]. However, they do not consider the optimality of their estimates. More related work can be found in [3, 27, 30, 36, 39]; for parametric estimation, see [16].
Most of the works above assume global smoothness of the estimated functions. However, in order to estimate \(f(x_0)\), it is more natural to require some smoothness of f only at \(x_0\). In this work, we study the same problem for \(\alpha \in [0,1]\) over a local Hölder space. As in [26], we introduce for \(l=1,\ldots ,d\) and \(k=1,2,\ldots \) the derivative notation
\(D_l^kf:=\frac{\partial ^kf}{\partial x_l^k},\qquad D_l^0f:=f.\)
Definition 1.1
For \(\mathbf {s}=(s_1,\ldots ,s_d)\) with \(s_l>0\), a function \(f:\mathbb {R}^d\rightarrow \mathbb {R}\) is said to satisfy a local and anisotropic Hölder condition with exponent vector \(\mathbf {s}\) at \(x_0\in \mathbb {R}^d\), if there exists a neighborhood \(\Omega _{x_0}\) of \(x_0\) such that for \(x,x+te_l\in \Omega _{x_0}\) and \(t\in \mathbb {R}\) (\(e_l\) being the l-th canonical basis vector),
\(|D_l^{[s_l]}f(x+te_l)-D_l^{[s_l]}f(x)|\le L|t|^{s_l-[s_l]},\)
where \(L>0\) is a fixed constant and \([s_l]\) stands for the largest integer strictly smaller than \(s_l\). The set of all such functions is denoted by \(H^{\mathbf {s}}(\Omega _{x_0},L)\).
When \(\mathbf {s}=(s,\ldots ,s)\), a function f in global Hölder ball \(H^s(\mathbb {R}^d,L)\) must be in \(H^{\mathbf {s}}(\Omega _{x},L)\) for each \(x\in \mathbb {R}^d\). However, the converse is not necessarily true, see Example 1 in [26].
In this paper, we use \(H^{\mathbf {s}}(\Omega _{x_{0}},L,~M)\) to stand for the density set
\(H^{\mathbf {s}}(\Omega _{x_{0}},L,~M):=\{f\in H^{\mathbf {s}}(\Omega _{x_0},L):~f\text { is a probability density and }\Vert f\Vert _{\infty }\le M\}.\)
Before posing some assumptions on the noise Y in model (1.1), we recall that g is the density of Y and \(G_{\alpha }(t)=1-\alpha +\alpha g^{ft}(t)\). For \(\beta _l\ge 0~(l=1,\ldots ,d)\), denote
\(\beta _l(\alpha ):=0\text { for }\alpha \in [0,1)\quad \text {and}\quad \beta _l(\alpha ):=\beta _l\text { for }\alpha =1.\)
The following two conditions on noise Y are used in our discussions.
-
(C1)
\(|G_{\alpha }(t)|\gtrsim \prod _{l=1}^d (1+|t_l|^2)^{-\frac{\beta _l(\alpha )}{2}}\text { with }t=(t_1,\ldots ,t_d)\);
-
(C2)
\(|D_k^mG_{\alpha }(t)|\lesssim \prod _{l=1}^d(1+|t_l|^2)^{-\frac{\beta _l(\alpha )}{2}}\text { for }k\in \{1,\ldots ,d\}\text { and }0\le m\le d.\)
Condition (C1) is the same as Assumption 4 in [22], and, as there, it will be needed only for the upper bound estimate. For \(0<\alpha <\frac{1}{2}\),
\(|G_{\alpha }(t)|\ge 1-\alpha -\alpha |g^{ft}(t)|\ge 1-2\alpha >0\)
thanks to \(|g^{ft}(t)|\le 1\), and therefore (C1) holds automatically. If \(0\le \alpha <1\) and \(g^{ft}(t)\ge 0\), then (C1) holds as well. Examples with \(g^{ft}(t)\ge 0\) include the 2N-fold self-convolutions
\(g=\underbrace{p*\cdots *p}_{2N}\)
with a density \(p(x)=p(-x)\). In fact, \(p(x)=p(-x)\) implies that \(p^{ft}\) is real and \(g^{ft}(t)=[p^{ft}(t)]^{2N}\ge 0\) in that case.
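The nonnegativity \(g^{ft}=[p^{ft}]^{2N}\ge 0\) is easy to check numerically; the following sketch (ours, with p uniform on \([-1/2,1/2]\), a hypothetical choice) illustrates it.

```python
import numpy as np

# Characteristic function of the 2N-fold self-convolution g = p * ... * p
# of a symmetric density p; here p is uniform on [-1/2, 1/2], so that
# p^{ft}(t) = sin(t/2)/(t/2) and g^{ft} = (p^{ft})^{2N} >= 0 pointwise.
t = np.linspace(-50.0, 50.0, 2001)
p_ft = np.sinc(t / (2 * np.pi))   # np.sinc(x) = sin(pi x)/(pi x)
N = 2
g_ft = p_ft ** (2 * N)
print(g_ft.min() >= 0)            # True: an even power is nonnegative
```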
The lower bound estimate needs condition (C2), which is a little weaker than Assumptions 1–2 in [21]. Clearly, (C2) can be rewritten as
-
(C2-1)
\(|G_{\alpha }(t)|\lesssim \prod _{l=1}^d(1+|t_l|^2)^{-\frac{\beta _l(\alpha )}{2}}\) with \(t=(t_1,\ldots ,t_d)\);
-
(C2-2)
\(|D_k^mG_{\alpha }(t)|\lesssim \prod _{l=1}^d(1+|t_l|^2)^{-\frac{\beta _l(\alpha )}{2}}\) for \(1\le k\le d\) and \(1\le m\le d\).
When \(\alpha <1\), (C2-1) holds automatically.
In Sect. 2, we show a lower bound estimate. Let \(\hat{f}_n(x;Z_1,\ldots ,Z_n)\) be an estimator of density functions in \(H^{\mathbf {s}}(\Omega _{x_{0}},L,~M)\) with \(\mathbf {s}=(s_1,\ldots ,s_d)\) and \(s_l>0\). Then with \(\frac{1}{s}=\frac{1}{d}\sum \limits _{l=1}^d\frac{1+2\beta _l(\alpha )}{s_l}\) and \(1\le p<+\infty \),
\(\sup \limits _{f\in H^{\mathbf {s}}(\Omega _{x_{0}},L,~M)}[E|\hat{f}_n(x_0)-f(x_0)|^p]^{\frac{1}{p}}\gtrsim n^{-\frac{s}{2s+d}}.\)
The anisotropic linear wavelet estimator \(\hat{f}_n\) is constructed to attain the optimal convergence rate \(n^{-\frac{s}{2s+d}}\) on \(H^{\mathbf {s}}(\Omega _{x_{0}},L,~M)\) in the next section. To get adaptivity, we use a data-driven estimator to obtain the convergence rate
\(\Big (\frac{n}{\ln n}\Big )^{-\frac{s}{2s+d}}\)
in Sect. 4. Furthermore, it will be proved in Sect. 5 that the loss of the logarithmic factor \((\ln n)^{-\frac{s}{2s+d}}\) is necessary for adaptivity. Some concluding remarks are provided in the last section.
2 Lower Bound
We use Fano’s Lemma to give our lower bound estimate in this part. Let P, Q be two probability measures with density functions p, q, respectively. If P is absolutely continuous with respect to Q (denoted by \(P\ll Q\)), then the Kullback–Leibler divergence between P and Q means
\(K(P,Q):=\int p(x)\ln \frac{p(x)}{q(x)}\,dx.\)
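For discretized densities on a common grid, this divergence can be approximated as in the following sketch (our helper, assuming \(P\ll Q\) on the grid); the Gaussian/Cauchy pair echoes the Cauchy density \(f_0\) used below.

```python
import numpy as np

# Approximate K(P, Q) = \int p ln(p/q) dx for densities sampled on a grid.
def kl_divergence(p, q, dx):
    mask = p > 0                       # integrand vanishes where p = 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask])) * dx

x = np.linspace(-10.0, 10.0, 4001)
dx = x[1] - x[0]
p = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # N(0, 1) density
q = 1.0 / (np.pi * (1 + x**2))               # Cauchy density
print(kl_divergence(p, q, dx))               # finite: Cauchy has heavier tails
```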
Lemma 2.1
(Fano’s Lemma) Let \((\Omega ,\mathcal {F},P_k)\) be probability spaces and \(A_k\in \mathcal {F}~(k=0,1,\ldots ,m)\). If \(A_k\bigcap A_v=\emptyset \) for \(k\ne v\), then with \(A^c\) standing for the complement of A and \(\mathcal {K}_m:=\inf \limits _{0\le v\le m}\frac{1}{m}\sum \limits _{k\ne v}K(P_k,P_v)\),
Theorem 2.1
Let g satisfy (C2) and \(1/s:=1/d\sum \nolimits _{l=1}^d(1+2\beta _l(\alpha ))/s_l\) with \(\mathbf {s}=(s_1,\ldots ,s_d)\) and \(s_l>0\). Then for \(M>(2\pi )^{-d/2}\) and \(1\le p\le \infty \),
\(\inf \limits _{\hat{f}_n}\sup \limits _{f\in H^{\mathbf {s}}(\Omega _{x_{0}},L,~M)}[E|\hat{f}_n(x_0)-f(x_0)|^p]^{\frac{1}{p}}\gtrsim n^{-\frac{s}{2s+d}},\)
where \(\hat{f}_n\) runs over all possible estimators of \(f\in H^{\mathbf {s}}(\Omega _{x_{0}},L,~M)\).
Proof
Let \(\tilde{f}_0(t):=\frac{1}{\pi (1+t^2)}\) be the one dimensional Cauchy density and
\(f_0(x):=\prod \limits _{l=1}^d\tilde{f}_0(x_l-x_{0l})\)
with \(x=(x_1,x_2,\ldots ,x_d),~x_0= (x_{01},x_{02},\ldots ,x_{0d})\in \mathbb {R}^d\). With the Meyer wavelet function \(\psi \), one introduces a sequence of functions \(\Psi _n~(n=1,2,\ldots )\) by
\(\Psi _n(x):=a_n\prod \limits _{l=1}^d\psi \big (\delta _{nl}^{-1}(x_l-x_{0l})\big ), \qquad (2.1)\)
where \(a_n>0\) and \(\delta _{nl}\in (0,1]\) are specified below.
Because \(|\psi (t)|\lesssim (1+|t|^2)^{-1}\) and \(\delta _{nl}\le 1\),
\(|\Psi _n(x)|\lesssim a_n\prod \limits _{l=1}^d(1+|x_l-x_{0l}|^2)^{-1}. \qquad (2.2)\)
Define \(f_n(x):=f_0(x)+\Psi _n(x)~(n=1,2,\ldots )\) and
\(a_n:=n^{-\frac{s}{2s+d}},\qquad \delta _{nl}:=a_n^{\frac{1}{s_l}}.\)
Then for each \(x\in \mathbb {R}^d\), \(f_n(x)>0\) for sufficiently large n, because of \(f_0(x)>0\), (2.2) and \(\lim \limits _{n\rightarrow \infty }a_n=0\). In addition,
\(\int _{\mathbb {R}^d}f_n(x)dx=\int _{\mathbb {R}^d}f_0(x)dx=1\)
thanks to \(\int _{\mathbb {R}}\psi (t)dt=0\) and \(\int _{\mathbb {R}^d}\Psi _n(x)dx=0\). Hence, \(f_n\) is a density function for large n.
Since every derivative of \(\tilde{f}_{0}\) is bounded, one finds \(\tilde{f}_{0}\in H^{\tilde{s}}(\mathbb {R})\) for each \(\tilde{s}>0\), and therefore \(f_0\in H^{\mathbf {s}}(\Omega _{x_0},L)\). Note that
thanks to (2.1) and \(\psi \in H^{s_l}(\mathbb {R})\). Then \(\Psi _n\in H^{\mathbf {s}}(\Omega _{x_0},L)\) and \(f_n\in H^{\mathbf {s}}(\Omega _{x_0},L)\). On the other hand, \(\Vert f_0\Vert _{\infty }\le \frac{1}{\pi ^d}\) and \(\Vert f_n\Vert _{\infty }\le \Vert f_0\Vert _{\infty }+ \Vert \Psi _n\Vert _{\infty }\lesssim 1\) because of (2.2) and \(a_n\lesssim 1\). Hence, \(\{f_0,~f_n\}\subseteq H^{\mathbf {s}}(\Omega _{x_{0}},L,~M)\) with some \(M>0\).
By \(\psi (0)\ne 0\) [37], \(|f_n(x_0)-f_0(x_0)|=a_n|\psi (0)|^d\gtrsim a_n\). For \(\tilde{x}=(x^1,~\ldots ,~x^n)\) with \(x^k\in \mathbb {R}^d\) and \(1\le k\le n\), define two density functions
Then with \(A_\chi :=\left\{ |\hat{f}_n(x_0)-f_\chi (x_0)|<\frac{1}{2}a_n|\psi (0)|^d\right\} ~(\chi =0,n)\), \(A_0\bigcap A_n=\emptyset \) and
due to Lemma 2.1. On the other hand, the Markov inequality gives
This with Jensen’s inequality and (2.3) shows
To finish the proof of Theorem 2.1, it remains to prove \(\kappa _1\lesssim 1\).
By (2.2) and the definition of \(h_n\), one finds \(P_{h_n}\ll P_{h_0}\) and
Since the \(Z_i\) are i.i.d., \(K(P_{H_n},P_{H_0})=nK(P_{h_n},P_{h_0})\) and
thanks to \(\ln u\le u-1~(u>0)\). This with \(\kappa _1:=\inf \limits _{v=0,n}\sum \limits _{\chi \ne v}K(P_{H_\chi },P_{H_v})\le K(P_{H_n},P_{H_0})\) and the definition of \(h_0\) shows
Clearly,
Next, one shows
By (2.6) and Fatou’s Lemma,
Hence, there exists \(A>0\) such that
Since g is a density, one finds \(\int _{|y-x_0|\le B-A}g(y)dy\ge \frac{1}{2}\) for some \(B>A\). When \(|x-x_0|\le A\) and \(|y-x_0|\le B-A\), \(|x-y-x_0|\le |x-x_0|+|y-x_0| +|x_0|\le B+|x_0|\) and
This with (2.8) reaches (2.7).
According to (2.6)–(2.7), \(|(1-\alpha )f_0(x)+\alpha (f_0*g)(x)|\gtrsim (1+|x|^2)^{-d}\) for each \(\alpha \in [0,1]\). Combining this with (2.5) and the definitions of \(h_n,~h_0\), one concludes
where
It is easy to see that
where the first inequality comes from the Parseval identity and the definition of \(G_{\alpha }(t)\); the second one is true due to (C2-1); the third inequality holds because of the definition of \(\Psi _n\) in (2.1); the last one follows from \(\mathrm {supp}\,\psi ^{ft}\subset \{t:2\pi /3\le |t|\le 8\pi /3\}\). Furthermore,
thanks to the definitions of \(\delta _{nl}\), s and \(a_n\).
To estimate \(I_{2n}\), denote \(q:={\Psi }_n^{ft}\cdot G_{\alpha }\). Note that \([(1-\alpha )\Psi _n(\cdot )+\alpha (\Psi _n*g)(\cdot )]^{ft}(t)= \Psi _n^{ft}(t)G_{\alpha }(t)\). Then \(q^{ft}(t)=(1-\alpha )\Psi _n(-t)+\alpha (\Psi _n*g)(-t)\) and
where one uses Parseval identity. By \(D_l^dq=\sum _{m=1}^dC_d^mD_l^m\Psi _n^{ft}\cdot D_l^{d-m}G_{\alpha }\) and (C2-2),
Because \(\delta _{nl}\in (0,1]\) and \(\Psi _n^{ft}(t)=a_n\prod \limits _{l=1}^d\delta _{nl}\psi ^{ft}(\delta _{nl}t_l)e^{-ix_{0l}t_l}\), similar arguments to (2.10)–(2.11) show
This with (2.9) and (2.11) leads to \(\kappa _1\lesssim 1\). The proof is completed. \(\square \)
Remark 2.1
When \(\alpha =0\), \(\beta _l(\alpha )=0\) and Theorem 2.1 reduces to Theorem 2 in [26]. Since (2.7) plays a key role in our proof, we choose \(f_0\) to be a tensor product of Cauchy densities instead of the Gauss function used for the case \(\alpha =0\). Moreover, \(\Psi _n(x)\) is defined by the Meyer wavelet function \(\psi \) (see (2.1)), whose Fourier transform has compact support; in contrast, a compactly supported wavelet \(\psi \) is used in [26].
Remark 2.2
From the proof of Theorem 2.1, we find that \(f_n\in H^{\mathbf {s}}(\mathbb {R}^d,L,~M)\) for \(n=1,2,\ldots \). Therefore, the conclusion of Theorem 2.1 can be replaced by
\(\inf \limits _{\hat{f}_n}\sup \limits _{f\in H^{\mathbf {s}}(\mathbb {R}^d,L,~M)}[E|\hat{f}_n(x_0)-f(x_0)|^p]^{\frac{1}{p}}\gtrsim n^{-\frac{s}{2s+d}}.\)
When \(\alpha =1\), the above estimate reduces to Theorem 1 (Case A) of Comte and Lacour [5]; when \(\alpha =0\), it coincides with Theorem 2 (\(r>0\)) of Rebelles [32].
3 Upper Bound
This section shows an upper bound of the point-wise risk for an anisotropic wavelet estimator, which matches the lower bound given in Theorem 2.1. To do that, we first introduce an anisotropic wavelet basis, by which our wavelet estimator is defined. After giving two lemmas, we finally prove the main result (Theorem 3.1) of this section.
Let \(\mathbb {N}\) be the nonnegative integer set and \(\{V_j:j\in \mathbb {N}\}\) a classical orthonormal multiresolution analysis of \(L^2(\mathbb {R})\) with scaling and wavelet functions \(\phi ,~\psi \). For \(\mathbf {r}=(r_1,\ldots ,r_d)\) with \(r_1, \ldots , r_d>0\) and \(\sum \nolimits _{i=1}^d r_i=d\), we define \(V_{j}^{r_i}=V_{\lfloor jr_i\rfloor }\) for \(j\ge 0\) and \(i\in \{1,\ldots ,d\}\), where \(\lfloor a\rfloor \) denotes the integer part of a. With \(x=(x_1,\ldots ,x_d)\in \mathbb {R}^d\) and \(\mathbf {k}=(k_1,\ldots ,k_d)\in \mathbb {Z}^d\) (the multiple integer set),
constitutes an orthonormal basis for \(\mathbf {V}_j^\mathbf {r}=\bigotimes \limits _{i=1}^dV_{j}^{r_i}\). Moreover, \(\mathbf {V}_{j+1}^\mathbf {r}=\mathbf {V}_j^\mathbf {r}\oplus \mathbf {W}_j^\mathbf {r}\) with \(\mathbf {W}_j^\mathbf {r}=\bigoplus _{\gamma \in \Gamma }\left( \bigotimes _{i=1}^d W_{\lfloor jr_i\rfloor }^{\gamma _i}\right) \) and \(\Gamma =\{0,1\}^d\backslash \{0\}^d \), see [35].
Denote for \(\gamma _i=1\),
with some normalizing constant c, and \(\psi _{{\lfloor jr_i\rfloor },~k_i}^{\gamma _i} (x_i)=\phi _{{\lfloor jr_i\rfloor },~k_i}(x_i)\) for \(\gamma _i=0\). Then \(\Psi _{j\mathbf {r};~\mathbf {k}}^{\gamma }(x)=\prod _{i=1}^d \psi _{{\lfloor jr_i\rfloor },~k_i}^{\gamma _i}(x_i)\) forms an orthonormal basis of \(\bigotimes \limits _{i=1}^d W_{\lfloor jr_i\rfloor }^{\gamma _i}\). Hence, for each \(f\in L^2(\mathbb {R}^d)\),
holds in the \(L^2(\mathbb {R}^d)\) sense, where
In fact, (3.1) holds point-wisely when \(\phi \) and \(\psi \) are chosen to be continuous and compactly supported, see Remark 2 in [26].
When \(\mathbf {r}=(1,\ldots ,1)\), the above anisotropic wavelet basis reduces to the traditional tensor product wavelet basis. The flexibility of \(\mathbf {r}\) plays a key role in the estimation of anisotropic density functions.
To introduce our estimator, we choose a compactly supported and continuous \(\phi \) with \(|\phi ^{ft}(t)|\lesssim (1+|t|)^{-m}~(m\ge \max \limits _l\beta _l(\alpha )+2)\). Then
is well-defined under the assumption (C1). Define our linear wavelet estimator by
with i.i.d samples \(Z_i\). The vector \(j\mathbf {r}\) will be specified in Theorem 3.1. By \(Ee^{-itZ_i}=h^{ft}(t)=G_{\alpha }(t)f^{ft}(t)\) and Plancherel formula,
and \(E\hat{f}_n=P_jf:=\sum _{\mathbf {k}\in \mathbb {Z}^d}\alpha _{j\mathbf {r},\mathbf {k}} \Phi _{j\mathbf {r};~\mathbf {k}}\).
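Since the display defining \((K\phi )_{jk}\) in (3.3) is not reproduced above, the following one-dimensional sketch (ours) reconstructs the standard deconvolution form consistent with the identity \(Ee^{-itZ}=G_{\alpha }(t)f^{ft}(t)\); the Shannon scaling function \(\phi ^{ft}=\mathbf {1}_{[-\pi ,\pi ]}\) and standard Gaussian noise are hypothetical stand-ins for the paper's choices.

```python
import numpy as np

# \hat{alpha}_{j,k} = n^{-1} sum_i (K phi)_{jk}(Z_i), where
# (K phi)_{jk}(z) = (2 pi)^{-1} \int phi_{jk}^{ft}(t) / G_alpha(-t) e^{itz} dt
# is unbiased for alpha_{j,k} = \int f phi_{jk} whenever h^{ft} = G_alpha f^{ft}.

def G_alpha(t, alpha):
    # G_alpha(t) = 1 - alpha + alpha g^{ft}(t), standard Gaussian noise assumed
    return 1.0 - alpha + alpha * np.exp(-t**2 / 2.0)

def alpha_hat(Z, j, k, alpha, n_grid=4096):
    # phi_{jk}^{ft}(t) = 2^{-j/2} e^{-i k t/2^j} phi^{ft}(t/2^j); the Shannon phi
    # gives phi^{ft} = 1 on [-pi, pi], so the integral runs over |t| <= pi 2^j.
    t = np.linspace(-np.pi * 2**j, np.pi * 2**j, n_grid)
    dt = t[1] - t[0]
    phi_ft = 2 ** (-j / 2.0) * np.exp(-1j * k * t / 2**j)
    integrand = phi_ft / G_alpha(-t, alpha) * np.exp(1j * np.outer(Z, t))
    K_phi = integrand.sum(axis=1).real * dt / (2 * np.pi)  # (K phi)_{jk}(Z_i)
    return K_phi.mean()

# Example: rng = np.random.default_rng(1); Z = rng.standard_normal(500)
# print(alpha_hat(Z, j=3, k=0, alpha=0.5))
```

Condition (C1) guarantees that \(1/G_{\alpha }\) grows at most polynomially, which is what keeps the integral above well defined for band-limited \(\phi \).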
For \(t=(t_1,\ldots ,t_d)\in \mathbb {R}^d\), denote \(\lfloor t\rfloor :=(\lfloor t_1\rfloor ,\ldots ,\lfloor t_d\rfloor )\) and \(|\lfloor t\rfloor |:=\lfloor t_1\rfloor +\ldots +\lfloor t_d\rfloor \). Then the following lemma holds.
Lemma 3.1
Let \(\hat{\alpha }_{j\mathbf {r},\mathbf {k}}\) be given by (3.4), \(2^{|\lfloor j\mathbf {r}\rfloor |}\le n\) and \(\Vert f\Vert _{\infty }\lesssim 1\). Then for \(1\le p<\infty \),
\(E|\hat{\alpha }_{j\mathbf {r},\mathbf {k}}-\alpha _{j\mathbf {r},\mathbf {k}}|^p\lesssim \big (n^{-\frac{1}{2}}\,2^{\lfloor j\mathbf {r}\rfloor \cdot \beta (\alpha )}\big )^p,\)
where \(\lfloor j\mathbf {r}\rfloor \cdot \beta (\alpha )\) stands for the inner product of \(\lfloor j\mathbf {r}\rfloor \) and \(\beta (\alpha )\).
Proof
By the definitions of \(\hat{\alpha }_{j\mathbf {r},\mathbf {k}}\) and \(\alpha _{j\mathbf {r},\mathbf {k}}\),
with \(\zeta _i=(K\phi )_{jk}(Z_i)-E(K\phi )_{jk}(Z_i)\). According to (3.3) and the definition of \(\Phi _{j\mathbf {r};~\mathbf {k}}\), one knows
This with (C1) and \(|\phi ^{ft}(t)|\lesssim (1+|t|)^{-m}~(m\ge \max \limits _l{\beta _l(\alpha )}+2)\) leads to
Hence,
By \(\Vert f\Vert _{\infty }\lesssim 1\), \(\Vert h\Vert _{\infty }=\Vert (1-\alpha )f+\alpha f*g\Vert _{\infty }\lesssim 1\) and
which reduces to
because of the definition of \((K\phi )_{jk}\) and the Parseval identity. Similar to the arguments for (3.6), one obtains
According to Rosenthal’s inequality,
Since \(2^{|\lfloor j\mathbf {r}\rfloor |}\le n\), we have \((n^{-1}2^{|\lfloor j\mathbf {r}\rfloor |})^{\frac{p}{2}-1}\lesssim 1\) for \(p>2\). Combining these with (3.5), one obtains the desired estimate
\(\square \)
We also need another lemma from [26] for the proof of Theorem 3.1.
Lemma 3.2
Let \(\beta _{j\mathbf {r},\mathbf {k}}^{\gamma }\) be defined in (3.2). Then for \(j\in \mathbb {N}\),
Theorem 3.1
Let g satisfy (C1) and \(1/s:=1/d\sum \limits _{l=1}^d(1+2\beta _l(\alpha ))/s_l\) with \(s_l>0~(l=1,\ldots ,d)\). Then with \(2^{\lfloor jr_l\rfloor }\sim n^{\frac{s}{2s+d}\frac{1}{s_l}}\) and \(1\le p<\infty \), the estimator \(\hat{f}_n\) in (3.4) satisfies
\(\sup \limits _{f\in H^{\mathbf {s}}(\Omega _{x_{0}},L,~M)}[E|\hat{f}_n(x_0)-f(x_0)|^p]^{\frac{1}{p}}\lesssim n^{-\frac{s}{2s+d}}.\)
Proof
One begins with an inequality,
By (3.1),
This with Lemma 3.2 and \(2^{\lfloor jr_l\rfloor }\sim n^{\frac{s}{2s+d}\frac{1}{s_l}}\) leads to
On the other hand, \(|\hat{f}_n(x)-P_{j}f(x)|=|\sum _{\mathbf {k}}(\hat{\alpha }_{j\mathbf {r},\mathbf {k}}- \alpha _{j\mathbf {r},\mathbf {k}})\Phi _{j\mathbf {r};~\mathbf {k}}(x)|\).
According to the Hölder inequality with \(\frac{1}{p}+\frac{1}{p'}=1\),
Furthermore,
thanks to Lemma 3.1. Since \(\phi \) is assumed to be compactly supported and continuous, \(\sum \nolimits _{k}|\phi (x-k)|\lesssim 1\) and \(\sum _{\mathbf {k}} |\Phi _{j\mathbf {r};~\mathbf {k}}(x)|\lesssim 2^{|\lfloor j\mathbf {r}\rfloor |/2}\). Combining this with (3.11), one finds
Then it follows from the assumptions \(1/s:=1/d\sum \limits _{l=1}^d(1+2\beta _l(\alpha ))/s_l\) and \(2^{\lfloor jr_l\rfloor }\sim n^{\frac{s}{2s+d}\frac{1}{s_l}}\) that
This with (3.9)–(3.10) shows the desired conclusion. \(\square \)
Remark 3.1
When the choice \(2^{\lfloor jr_l\rfloor }\sim n^{\frac{s}{2s+d}\frac{1}{s_l}}\) is replaced by \(2^{\lfloor jr_l\rfloor }\sim (\frac{n}{\ln n})^{\frac{s}{2s+d}\frac{1}{s_l}}\), we find
\(\sup \limits _{f\in H^{\mathbf {s}}(\Omega _{x_{0}},L,~M)}[E|\hat{f}_n(x_0)-f(x_0)|^p]^{\frac{1}{p}}\lesssim \Big (\frac{n}{\ln n}\Big )^{-\frac{s}{2s+d}}\)
from the proof of Theorem 3.1.
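The level choices in Theorem 3.1 and Remark 3.1 are straightforward to compute; the following sketch (ours, with hypothetical smoothness values) turns \(\mathbf {s}\), \(\beta (\alpha )\) and n into the floors \(\lfloor jr_l\rfloor \).

```python
import numpy as np

# Pick floor(j r_l) so that 2^{floor(j r_l)} ~ m^{(s/(2s+d))/s_l}, with
# m = n (Theorem 3.1) or m = n/ln(n) (Remark 3.1).
def levels(s, beta, n, adaptive=False):
    d = len(s)
    s_eff = d / sum((1 + 2*b) / sl for sl, b in zip(s, beta))  # 1/s = (1/d) sum (1+2 beta_l)/s_l
    m = n / np.log(n) if adaptive else n
    return [int(np.floor(np.log2(m) * s_eff / (2*s_eff + d) / sl)) for sl in s]

print(levels([1.0, 3.0], [0.0, 0.0], n=10**5))                 # alpha < 1
print(levels([1.0, 3.0], [1.0, 1.0], n=10**5, adaptive=True))  # alpha = 1
```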
Remark 3.2
Compared with [32] and [40], we use a wavelet method instead of a kernel one. Taking \(\alpha =0\) in Theorem 3.1, we obtain Theorem 3 of [32]; in the case \(\alpha \in (0,1)\) and \(\mathbf {s}=(2,2,\ldots ,2)\), our theorem implies Theorem 2.1 in [40].
When \(\alpha =1\), Theorem 3.1 coincides with Theorem 4.5 in [38]; when \(\alpha =0\), it reduces to Theorem 3 in [26]. The linear wavelet estimator \(\hat{f}_n\) in Theorem 3.1 is not adaptive, since it depends on the unknown vector \(\mathbf {s}=(s_1,\ldots ,s_d)\). One may expect to obtain adaptivity with a nonlinear wavelet estimator, as in the classical case; however, this seems hard even for \(\alpha =0\), see Remark 3 in [26]. In the next section, we give adaptive and near-optimal estimates by using a data-driven strategy.
4 Adaptive Estimation
Since the linear wavelet estimator \(\hat{f}_n\) in Theorem 3.1 is not adaptive, this section provides an adaptive and near-optimal estimate on \(H^{\mathbf {s}}(\Omega _{x_0},~L,~M)\), see Theorem 4.1. Motivated by the work of [32] and [12], we use the linear wavelet estimator \(\hat{f}_n\) to define an auxiliary estimator \(\hat{f}_{j\mathbf {r},j^*\mathbf {r}^*}(x)\). After introducing a subset \(\mathcal {H}^d\) of \(\mathbb {R}^d\), we give a selection rule to determine \(j_0\mathbf {r}_0\) and the desired estimator \(\hat{f}_{j_0\mathbf {r}_0}(x)\).
Rewrite the linear wavelet estimator \(\hat{f}_n\) in Theorem 3.1 as \(\hat{f}_{j\mathbf {r}}:=\hat{f}_n\), since it depends on \(\lfloor jr_l\rfloor \). Then define an auxiliary estimator
where \(j\mathbf {r}\wedge j^*\mathbf {r}^*:=j\mathbf {r}\) for \(\min \limits _{1\le l\le d}jr_l\le \min \limits _{1\le l\le d}j^*r^*_l\), and \(j\mathbf {r}\wedge j^*\mathbf {r}^*:=j^*\mathbf {r}^*\) otherwise. With the constant \(\lambda \) specified after (4.12),
satisfies
Again, \({\mu }_{j\mathbf {r}}\) depends on \(\lfloor jr_l\rfloor ~(1\le l\le d)\). When \(\alpha =0\), (4.1)–(4.2) are the same as (21)–(22) in [26].
Next, we introduce
\(\mathcal {H}^d:=\big \{j\mathbf {r}:~(|\lfloor j\mathbf {r}\rfloor |+2\lfloor j\mathbf {r}\rfloor \cdot \beta (\alpha ))\,2^{|\lfloor j\mathbf {r}\rfloor |+2\lfloor j\mathbf {r}\rfloor \cdot \beta (\alpha )}\le n\big \}.\)
When \(\alpha <1\), \(\beta (\alpha )=(0,\ldots ,0)\) and
\(\mathcal {H}^d=\big \{j\mathbf {r}:~|\lfloor j\mathbf {r}\rfloor |\,2^{|\lfloor j\mathbf {r}\rfloor |}\le n\big \}.\)
Then \(j_0\mathbf {r}_0\in \mathcal {H}^d\) is determined by the following rule:
-
(i)
\(\hat{\xi }_{j\mathbf {r}}(x):=\max \limits _{j^*\mathbf {r}^*\in \mathcal {H}^d}[|\hat{f}_{j\mathbf {r},j^*\mathbf {r}^*}(x)-\hat{f}_{j^*\mathbf {r}^*}(x)|- {\mu }_{j^*\mathbf {r}^*}-{\mu }_{j\mathbf {r}}]_{+}\) with \(a_{+}:=\max \{0,a\}\);
-
(ii)
\(\hat{\xi }_{j_0\mathbf {r}_0}(x)+2{\mu }_{j_0\mathbf {r}_0}:=\min \limits _{j\mathbf {r}\in \mathcal {H}^d}[\hat{\xi }_{j\mathbf {r}}(x)+2{\mu }_{j\mathbf {r}}]\).
Although \(\mathcal {H}^d\) is an infinite set, the sets after “max” and “min” in (i) and (ii) are finite, because \(\hat{\xi }_{j\mathbf {r}}\) and \({\mu }_{j\mathbf {r}}\) depend only on \(\lfloor jr_l\rfloor ~(l=1,\ldots ,d)\). Therefore \(\hat{\xi }_{j\mathbf {r}}(x)\) and \(j_0\mathbf {r}_0\) are well-defined. Clearly, \(j_0\mathbf {r}_0\) is completely determined by the observed samples \(\{Z_k\}\); it does not depend on any unknown information about f. A schematic sketch of this rule is given below.
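The following sketch (ours, not from the paper) implements the selection rule (i)–(ii) over a finite index collection; the containers fhat, fhat_aux and mu are hypothetical and assumed precomputed at the point x.

```python
# H: list of (hashable) indices jr with distinct floors [j r_l];
# fhat[jr] = \hat{f}_{jr}(x); fhat_aux[(jr, jr2)] = \hat{f}_{jr, jr2}(x);
# mu[jr] = mu_{jr} from (4.2).
def select_index(H, fhat, fhat_aux, mu):
    def xi(jr):
        # rule (i): maximize [ |f_{jr,jr2} - f_{jr2}| - mu_{jr2} - mu_{jr} ]_+
        return max(max(abs(fhat_aux[(jr, jr2)] - fhat[jr2]) - mu[jr2] - mu[jr], 0.0)
                   for jr2 in H)
    # rule (ii): minimize xi_{jr}(x) + 2 mu_{jr} over jr in H
    return min(H, key=lambda jr: xi(jr) + 2.0 * mu[jr])
```

To prove Theorem 4.1, we need two well-known lemmas.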
Lemma 4.1
Let \((X,\mathcal {F},\mu )\) be a measure space and \(f\in L^p(X,\mathcal {F},\mu )\) with \(0<p<\infty \). Then with \(\lambda (t):=\mu \{y\in X, ~|f(y)|>t\}\),
\(\int _X|f|^pd\mu =p\int _0^{\infty }t^{p-1}\lambda (t)dt.\)
Lemma 4.2
(Bernstein’s inequality) Let \(X_1,\ldots ,X_n\) be i.i.d. random variables with \(EX_i=0\) and \(|X_i|\le \Vert X\Vert _{\infty }<\infty \). Then for each \(\gamma > 0\),
\(P\Big \{\Big |\frac{1}{n}\sum \limits _{i=1}^nX_i\Big |\ge \gamma \Big \}\le 2\exp \Big \{-\frac{n\gamma ^2}{2(EX_1^2+\Vert X\Vert _{\infty }\gamma /3)}\Big \}.\)
Theorem 4.1
Let g satisfy (C1) and \(1/s:=1/d\sum \limits _{l=1}^d(1+2\beta _l(\alpha ))/s_l\) with \(s_l>0~(l=1,\ldots ,d)\). Then for \(j_0\mathbf {r}_0\) given by the above selection rule,
\(\sup \limits _{f\in H^{\mathbf {s}}(\Omega _{x_{0}},L,~M)}[E|\hat{f}_{j_0\mathbf {r}_0}(x_0)-f(x_0)|^p]^{\frac{1}{p}}\lesssim \Big (\frac{n}{\ln n}\Big )^{-\frac{s}{2s+d}}. \qquad (4.3)\)
Proof
With the choice \(2^{\lfloor j_1r_{1l}\rfloor }\sim (\frac{n}{\ln n})^{[s/(2s+d)]\cdot (1/s_l)} ~(1\le l\le d)\), Remark 3.1 gives
It is easy to see \(j_1\mathbf {r}_1\in \mathcal {H}^d\) because \(\sum _{l=1}^d [s/(2s+d)]\cdot (1/s_l)+\sum _{l=1}^d[s/(2s+d)]\cdot (2\beta _l(\alpha )/s_l) =[s/(2s+d)]\cdot \sum \nolimits _{l=1}^d(1+2\beta _l(\alpha ))/s_l= d/(2s+d)< 1\). By selection rule (i) and (ii),
Therefore,
According to \(2^{\lfloor j_1r_{1l}\rfloor }\sim (\frac{n}{\ln n})^{[s/(2s+d)]\cdot (1/s_l)}\) and \(1/s:=1/d\sum \limits _{l=1}^d(1+2\beta _l(\alpha ))/s_l\),
and \({\mu }_{j_1\mathbf {r}_1}^2\lesssim n^{-1}\ln \frac{n}{\ln n} (\frac{n}{\ln n})^{\frac{d}{2s+d}}\le (\frac{n}{\ln n})^{-2s/(2s+d)}\) due to (4.2). Furthermore, \({\mu }_{j_1\mathbf {r}_1}^p\lesssim (\frac{n}{\ln n})^{-sp/(2s+d)}\). These with (4.4) show that
where one uses the facts that \((|a|+|b|+|c|)^{\theta }\lesssim |a|^{\theta }+|b|^{\theta }+ |c|^{\theta }\) for \(\theta >0\), and \(\sup (|x|+|y|+|z|)\le \sup |x|+\sup |y|+\sup |z|\). Hence, to obtain the desired conclusion (4.3), it suffices to show
By the same arguments as in [26],
For \(y> 0\), \(P\{(|\hat{f}_{j\mathbf {r}}-\mathrm{E}\hat{f}_{j\mathbf {r}}|- {\mu }_{j\mathbf {r}})_{+}\ge y\}=P\{|\hat{f}_{j\mathbf {r}}-\mathrm{E}\hat{f}_{j\mathbf {r}}|- {\mu }_{j\mathbf {r}}\ge y\}\). Then Lemma 4.1 gives
which turns out to be (by variable change)
According to the definition of \(\hat{f}_{j\mathbf {r}}\),
and \(\hat{f}_{j\mathbf {r}}(x)-\mathrm{E}\hat{f}_{j\mathbf {r}}(x)=1/n\sum _{i=1}^{n}\eta _i\) with
Clearly, \(\{\eta _i\}\) are i.i.d and \(E\eta _i=0\). By (3.6), \(|(K\phi )_{jk}(Z_i)|\le 2^{\frac{|\lfloor j\mathbf {r}\rfloor |}{2}+\lfloor j\mathbf {r}\rfloor \cdot \beta (\alpha )}\) and
On the other hand,
thanks to (4.8) and the definition of \((K\phi )_{jk}\) (see (3.3)). Since \(\Vert f\Vert _{\infty }\le M\), \(\Vert h\Vert _{\infty }\le M\) and \(E\eta _i^2\) can be bounded by
Using Parseval identity and (C1), one knows
for some \(c_2>0\). Then it follows from Lemma 4.2 and (4.9)–(4.10) that
For \(j\mathbf {r}\in \mathcal {H}^d\), \((|\lfloor j\mathbf {r}\rfloor |+2\lfloor j\mathbf {r}\rfloor \cdot \beta (\alpha ))\, 2^{|\lfloor j\mathbf {r}\rfloor |+2\lfloor j\mathbf {r}\rfloor \cdot \beta (\alpha )}\le n\) and \({\mu }_{j\mathbf {r}}\lesssim 1\) because of (4.2). Furthermore,
This with (4.1) shows that the right hand side of (4.11) is bounded by
With the choice \(\lambda \ge \sqrt{c_3}\), (4.11) reduces to
Substituting the above estimate into (4.7), one obtains
due to (4.2) and \(|\lfloor j\mathbf {r}\rfloor |+2\lfloor j\mathbf {r}\rfloor \cdot \beta (\alpha )\lesssim \ln n\). Finally, it follows from (4.6) and the choice \(2^{\lfloor j_1r_{1l}\rfloor }\sim (\frac{n}{\ln n})^{\frac{s}{2s+d}\frac{1}{s_l}}\) that
Because all constants in “\(\lesssim \)” throughout the proof are independent of \(x\in \Omega _{x_0}\) and \(f\in H^{\mathbf {s}}(\Omega _{x_0},L,~M)\), the desired conclusion (4.5) follows. This completes the proof. \(\square \)
Remark 4.1
Theorem 4.1 with \(\alpha =0\) and \(\alpha =1\) reduces to Theorem 4 of [32] and Corollary 1 of [5], respectively. From the proof of Theorem 4.1, we find that the parameter \(\lambda \) does not depend on L, but it does depend on M, as in classical wavelet density estimation.
Remark 4.2
Although Theorem 4.1 could be proved by the method of [22], our proof looks simpler and more elementary. In fact, it is not easy to prove a point-wise oracle inequality for the losses in [22]; we instead use the linear estimate already obtained to prove Theorem 4.1 directly, which seems natural as well.
The selection rule used in this section is the same as that of Lepski and Willer [22]. However, we use wavelet estimators instead of kernel-type ones. For some density functions, a wavelet estimator picks up local information more effectively. In addition, wavelet estimation has a fast algorithm [15, 19].
Remark 4.3
In contrast to the linear wavelet estimator \(\hat{f}_n\) in Theorem 3.1, the estimator \(\hat{f}_{j_0\mathbf {r}_0}\) in Theorem 4.1 is adaptive thanks to the selection rule (i)–(ii). Of course, we pay a price: the convergence rate loses the factor \((\ln n)^{-\frac{s}{2s+d}}\). The next section shows that this loss is necessary for adaptivity.
5 Optimality
In this part, we show that the convergence rate of Theorem 4.1 is the best possible, for which the following lemma is needed.
Lemma 5.1
Let \(\Phi _{s_i}~(i=0,1)\) be density sets, \(f_{i,n}\in \Phi _{s_i}~(n=1,2,\ldots )\) and \(h_{i,n}:=(1-\alpha )f_{i,n}+\alpha (f_{i,n}*g)\). If
hold with \(a_n,~b_n>0\) and \(H_{i,n}(x):=\prod _{j=1}^nh_{i,n}(x_j)\), then for \(\hat{f}_n:=\hat{f}_n(Z_1,\ldots ,Z_n,x)\),
Proof
According to Jensen’s inequality, \(I_n\ge a_nE_{h_{0,n}}d(\hat{f}_n,f_{0,n})+b_nE_{h_{1,n}}d(\hat{f}_n,f_{1,n})\ge E_{h_{0,n}}a_n|d(\hat{f}_n,f_{1,n})-d(f_{1,n},f_{0,n})|+a_n^{-1}b_nE_{h_{1,n}}a_nd(\hat{f}_n,f_{1,n})\). Denoting \(T_n:=d(\hat{f}_n,f_{1,n})/d(f_{1,n},f_{0,n})\) and using \(a_nd(f_{1,n},f_{0,n})\gtrsim 1\), one obtains that
Note that \(\min \{a,b\}=\frac{1}{2}(a+b-|a-b|)\) and \(|a-1|+|a|\ge 1\). Then
By Jensen’s inequality, (5.2) reduces to
This with the given assumption \(\int H_{1,n}^{-1}H_{0,n}^2\le a_n^{-1}b_n\) concludes
which is the desired conclusion (5.1). \(\square \)
To state the main theorem in this section, recall that
where \(\Sigma _s\) is a density set for \(s\in S\subset \mathbb {R}^m\). A positive sequence \(\{\varphi _{n,s}\}~(s\in S)\) is called adaptively admissible if there exists an estimator \(\hat{f}_n\) such that
For two positive sequences \(\{\varphi _{n,s}\}\) and \(\{\psi _{n,s}\}\), introduce
\(A^{(0)}[\psi _{n,s}/\varphi _{n,s}]:=\Big \{s\in S:~\lim \limits _{n\rightarrow \infty }\frac{\psi _{n,s}}{\varphi _{n,s}}=0\Big \}\)
and
\(A^{(\infty )}[\psi _{n,s}/\varphi _{n,s}]:=\Big \{s\in S:~\lim \limits _{n\rightarrow \infty }\frac{\psi _{n,s}}{\varphi _{n,s}}=\infty \Big \}.\)
The sequence \(\{\psi _{n,s}\}\) outperforms \(\{\varphi _{n,s}\}\) on \(A^{(0)}[\psi _{n,s}/\varphi _{n,s}]\), while \(\{\varphi _{n,s}\}\) does much better than \(\{\psi _{n,s}\}\) on \(A^{(\infty )}[\psi _{n,s}/\varphi _{n,s}]\). The following definition is a special case of the corresponding ones in [32] and [18].
Definition 5.1
A positive sequence \(\{\varphi _{n,s}\}\) is called an optimal rate of adaptive convergence on \(\Sigma _s~(s\in S)\), if
-
1
\(\{\varphi _{n,s}\}\) is adaptively admissible;
-
2
for adaptively admissible sequence \(\{\psi _{n,s}\}\) satisfying \(A^{(0)}[\psi _{n,s}/\varphi _{n,s}]\ne \emptyset \),
-
(i)
\(A^{(0)}[\psi _{n,s}/\varphi _{n,s}]\) is contained in an \((m-1)\)-dimensional manifold,
-
(ii)
\(A^{(\infty )}[\psi _{n,s}/\varphi _{n,s}]\) contains an open set of S.
Theorem 5.1
Let g satisfy (C2), \(\Sigma _s=H^{\mathbf {s}}(\Omega _{x_0},L,M)\) with \(\mathbf {s}=(s_1,\ldots ,s_d)~(s_l>0)\) and \(\frac{1}{s}=\frac{1}{d}\sum \limits _{l=1}^d\frac{1+2\beta _l(\alpha )}{s_l}\). Then an optimal rate of adaptive convergence on \(\Sigma _s\) is
\(\varphi _{n,s}=\Big (\frac{\ln n}{n}\Big )^{\frac{s}{2s+d}}.\)
Proof
By Theorem 4.1, \(\{\varphi _{n,s}\}~(s\in \mathbb {R}^{+}:=(0,+\infty ))\) is an adaptively admissible sequence. Let \(\{\psi _{n,s}\}\) be any adaptively admissible sequence satisfying \(s_0\in A^{(0)}[\psi _{n,s}/\varphi _{n,s}]\). Then it suffices to show
according to Definition 5.1.
By \(s_0\in A^{(0)}[\psi _{n,s}/\varphi _{n,s}]\), one knows
Since \(\{\psi _{n,s}\}\) is an adaptively admissible sequence, there exist an estimator \(\hat{f}_n^*\) and a positive constant C such that
hold with \(\tilde{s}_0\in \mathbb {R}\setminus \{s_0\}\).
The main work for (5.3) is to show that for any \(\tilde{s}_0\in (s_0,\infty )\),
Choose the Daubechies wavelet function \(\psi _{2N}\) (for large N) as \(\psi :=\psi _{2N}\), and define
where \(f_1(x)=\prod \limits _{l=1}^d\frac{1}{\pi (1+|x_l-x_{0l}|^2)}\) is the d dimensional Cauchy density, \(\gamma _n=(\frac{\ln n}{n})^{\frac{s_0}{2s_0+d}}\), \(\delta _{nl}=\gamma _n^{\frac{1}{s_{l}}}\) and \(\frac{1}{s_0}=\frac{1}{d}\sum \limits _{l=1}^d\frac{1+2\beta _l(\alpha )}{s_l}\). The constant c will be specified later on. It is easy to see that \(f_1\in \Phi _{\tilde{s}_0}\), \(f_{0,n}\in \Phi _{s_0}\) for large n, and \(h_{1}=(1-\alpha )f_1+\alpha (f_1*g)>0\). To use Lemma 5.1, one takes
Then \(a_nd(f_1,f_{0,n})=a_n|f_1(x_0)-f_{0,n}(x_0)|=a_nc\gamma _n|\psi (0)|^d=c|\psi (0)|^d\gtrsim 1\) thanks to \(\psi (0)\ne 0\) [37].
Clearly,
Similar to the estimates of \(I_{1n}\) and \(I_{2n}\) in the proof of Theorem 2.1, there exists a constant \(\tilde{c}\) depending on \(x_0\) such that
Here, \(h_1\) and \(h_{0,n}\) play the same roles as \(h_0\) and \(h_n\) there; the constant c appears in the definition of \(f_{0,n}\), see (5.8), while \(\tilde{c}\) comes from the estimates of \(I_{1n}\) and \(I_{2n}\). In contrast to \(a_n=n^{-\frac{s}{2s+d}}\) in (2.1), we choose \(\gamma _n=(\frac{\ln n}{n})^{\frac{s_0}{2s_0+d}}\) here so that the factor \(\frac{\ln n}{n}\) appears in (5.9). Furthermore, (5.9) reduces to
Since \(\tau >\frac{s_0}{2s_0+d}\), one has \(n^{\tilde{c}c^2}<n^{\tau }(\frac{\ln n}{n})^{\frac{s_0}{2s_0+d}}=a_n^{-1}b_n\) by choosing small \(c>0\). Hence,
According to Lemma 5.1 with \(d(f,g)=|f(x_0)-g(x_0)|\) and \(b_n=n^{\tau }\),
On the other hand, \(\lim _{n\rightarrow \infty }a_n\sup _{f\in \Phi _{s_0}}[E_hd^p(\hat{f}_n^*,f)]^{\frac{1}{p}}=0\) thanks to (5.4)–(5.5) and \(a_n=\varphi _{n,s_0}^{-1}\). Then
This with (5.6) shows
Because \(\tau <\frac{\tilde{s}_0}{2\tilde{s}_0+d}\), there exists \(a>0\) such that \(\tau +a<\frac{\tilde{s}_0}{2\tilde{s}_0+d}\). Moreover,
due to (5.10) and \(\varphi _{n,\tilde{s}_0}=(\frac{\ln n}{n})^{\frac{\tilde{s}_0}{2\tilde{s}_0+d}}\). By Theorem 2.1 and (5.5), \(\psi _{n,s_0}\gtrsim n^{-\frac{s_0}{2s_0+d}}\) and \(\frac{\psi _{n,s_0}}{\varphi _{n,s_0}}\gtrsim (\ln n)^{-\frac{s_0}{2s_0+d}}\). Combining this with (5.11) and \(a>0\), one obtains
which is the desired conclusion (5.7).
Using (5.7), one can conclude (5.3) easily. Suppose \(s_1\in A^{(0)}[\psi _{n,s}/\varphi _{n,s}]\setminus \{s_0\}\). Then \(s_1\in (0,s_0)\) thanks to the definition of \(A^{(0)}[\psi _{n,s}/\varphi _{n,s}]\) and (5.7). Replacing \(s_0\) by \(s_1\) and \(\tilde{s}_0\) by \(s_0\) in the proof of (5.7), one finds
which contradicts \(s_0,s_1\in A^{(0)}[\psi _{n,s}/\varphi _{n,s}]\). Hence, \(A^{(0)}[\psi _{n,s}/\varphi _{n,s}]=\{s_0\}\). Furthermore, \(A^{(\infty )}[\psi _{n,s}/\varphi _{n,s}] \supseteq (s_0,+\infty )\) thanks to (5.7). Now, (5.3) holds true and the proof is done. \(\square \)
6 Concluding Remark
In order to deal with density estimation in model \(Z=X+\varepsilon Y\), we suppose in this paper that the observed random samples \(Z_i\) are i.i.d. and satisfy
\(Z_i=X_i+\varepsilon _iY_i~(i=1,\ldots ,n),\)
where the sequences \(\{X_i\}\), \(\{Y_i\}\) and \(\{\varepsilon _i\}\) are mutually independent. The mutual independence plays a key role for (1.2), which is the starting point of the whole paper.
A natural question is whether the i.i.d. assumption on \(Z_i\) can be replaced by some weaker condition, for example, \(\alpha \)-mixing or negative association. For the lower bound estimate, Fano’s lemma is a fundamental tool. To get the relation \(K(P_{H_n},P_{H_0})=nK(P_{h_n},P_{h_0})\), we use the i.i.d. assumption on \(Z_i\). It seems hard to estimate the Kullback–Leibler divergence \(K(P_{H_n},P_{H_0})\) without that condition.
Upper bound estimates usually need Rosenthal’s inequality and the Bernstein inequality, which require independence of the \(Z_i\). Of course, there exist some replacements for \(\alpha \)-mixing data. However, they become more complicated [33]. We will consider the corresponding density estimation for that case in future work.
Although the convergence rate \(n^{-\frac{s}{2s+d}}\) in Theorem 3.1 depends heavily on the dimension d, we can reduce the influence of the dimension under some independence hypothesis, as in [32]. For a partition \(\mathcal {P}\) of \(\mathcal {I}_d:=\{1,\ldots ,d\}\), a density function f is said to have independence structure \(\mathcal {P}\), if
\(f(x)=\prod \limits _{I\in \mathcal {P}}f_{I}(x_{I}).\)
With \(I=\{n_1,\ldots ,n_{|I|}\}\in \mathcal {P}\) and \(1\le n_1<\cdots <n_{|I|}\le d\), \(x_{I}\) stands for an element \((x_{n_1},\ldots ,x_{n_{|I|}})\) in \(\mathbb {R}^{|I|}\), where |I| denotes the cardinality of I. We use \(f\in H_{\mathcal {P}}^{\mathbf {s}} (\Omega _{x_0},L,~M)\) to denote \(f_{I}\in H^{\mathbf {s}_I}(\Omega _{x_{0I}},L_I,~M_I)\) for each \(I\in \mathcal {P}\), where \(M:=\prod _{I\in \mathcal {P}}M_I\) and \(L:=\prod _{I\in \mathcal {P}}L_I\).
Let \(\hat{f}_{n,I}(x_{I})\) be the corresponding linear wavelet estimator of \(f_{I}\) given in (3.4) and define \(\hat{f}_n(x)\) for \(f\in H_{\mathcal {P}}^{\mathbf {s}}(\Omega _{x_{0}},L,~M)\) by
Then we can use Theorem 3.1 and the inequality
to prove the following result.
Theorem 6.1
Let g satisfy (C1) and \(\mathbf {s}=(s_1,\ldots ,s_d)\) with \(s_i>0\). Define \(s':=\min \limits _{I\in \mathcal {P}} \left( \sum \limits _{i\in I}[1+2\beta _i(\alpha )]/s_i \right) ^{-1}\) and \(1/s_{\text {I}}:=|I|^{-1} \sum \limits _{i\in I}(1+2\beta _i(\alpha ))/s_i\). Then with \(2^{\lfloor jr_i\rfloor } \sim n^{[s_{\text {I}}/(2s_{\text {I}}+|I|)](1/s_i)}~ (i\in I\in \mathcal {P})\) and \(p\in [1, \infty )\),
\(\sup \limits _{f\in H_{\mathcal {P}}^{\mathbf {s}}(\Omega _{x_{0}},L,~M)}[E|\hat{f}_n(x_0)-f(x_0)|^p]^{\frac{1}{p}}\lesssim n^{-\frac{s'}{2s'+1}}.\)
We omit the proof, since the arguments are elementary.
When \(\mathcal {P}=\{\{1\},\ldots ,\{d\}\}\), the density f has a complete independence structure, \(s'=\min \limits _{1\le i\le d}\frac{s_i}{1+2\beta _i(\alpha )}\), and the rate becomes
\(n^{-\frac{s'}{2s'+1}},\)
which does not depend on the dimension d. Although the estimator \(\hat{f}_n\) in Theorem 6.1 is not adaptive, we can apply our selection rule (in Sect. 4) to each \(\hat{f}_{n,I}\) in order to get an adaptive estimate.
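The gain from an independence structure is visible already at the level of the rate exponents; the following sketch (ours, with hypothetical smoothness values) compares \(s/(2s+d)\) from Theorem 3.1 with the block-wise exponent \(\min _{I}s_{\text {I}}/(2s_{\text {I}}+|I|)\) behind Theorem 6.1.

```python
# Effective smoothness 1/s = (1/d) sum_l (1 + 2 beta_l)/s_l and the
# corresponding rate exponents, with and without independence structure.
def s_eff(s, beta):
    return len(s) / sum((1 + 2*b) / sl for sl, b in zip(s, beta))

s, beta = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]     # d = 3, alpha < 1
sf = s_eff(s, beta)
print("no structure :", sf / (2*sf + len(s)))

partition = [[0], [1], [2]]                     # complete independence
exps = []
for I in partition:
    sI = s_eff([s[i] for i in I], [beta[i] for i in I])
    exps.append(sI / (2*sI + len(I)))
print("independence :", min(exps))              # larger exponent = faster rate
```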
The next theorem shows the optimality of the estimation in Theorem 6.1.
Theorem 6.2
Let g satisfy (C2) and \(s'=\underset{I\in \mathcal {P}}{\min }\left\{ [\sum \limits _{i\in I} (1+2\beta _i(\alpha ))/s_i]^{-1}\right\} \). Then for \(M\ge \pi ^{-d}\) and \(p\in [1,+\infty )\),
\(\inf \limits _{\hat{f}_n}\sup \limits _{f\in H_{\mathcal {P}}^{\mathbf {s}}(\Omega _{x_{0}},L,~M)}[E|\hat{f}_n(x_0)-f(x_0)|^p]^{\frac{1}{p}}\gtrsim n^{-\frac{s'}{2s'+1}},\)
where \(\hat{f}_n\) runs over all possible estimators of \(f\in H_{\mathcal {P}}^{\mathbf {s}}(\Omega _{x_{0}},L,~M)\).
We outline a proof here. As in the proof of Theorem 2.1, choose the one dimensional Cauchy density function \(\tilde{f}_0\) and the Meyer wavelet \(\psi \). Then for \(I\in \mathcal {P}\) and \(s_{\text {I}}^{-1}= \frac{1}{|I|}\sum \limits _{i\in I}(1+2\beta _i(\alpha ))/s_i\), define
with \(a_n:=n^{-s_{\text {I}}/(2s_{\text {I}}+|I|)}\) and \(\delta _{ni}=a_n^{\frac{1}{s_i}}\). Furthermore, define \(f_0(x):=\prod \limits _{I\in \mathcal {P}}f_{0,I}(x_I)\) and \(f_n(y)=f_0(y)+\Psi _n(y)\) with
The remaining arguments are similar to those in the proof of Theorem 2.1.
References
Butucea, C.: The adaptive rate of convergence in a problem of pointwise density estimation. Stat. Probab. Lett. 47, 85–90 (2000)
Butucea, C.: Exact adaptive pointwise estimation on Sobolev classes of densities. ESAIM Prob. Stat. 5, 1–31 (2001)
Benhaddou, R.: Minimax lower bounds for the simultaneous wavelet deconvolution with fractional Gaussian noise and unknown kernels. Stat. Probab. Lett. 140, 91–95 (2018)
Carroll, R.J., Hall, P.: Optimal rates of convergence for deconvolving a density. J. Am. Stat. Assoc. 83, 1184–1186 (1988)
Comte, F., Lacour, C.: Anisotropic adaptive kernel deconvolution. Ann. Inst. Henri Poincaré Probab. Stat. 49, 569–609 (2013)
Delyon, B., Juditsky, A.: On minimax wavelet estimators. Appl. Comput. Harmon. Anal. 3, 215–228 (1996)
Devroye, L.: Consistent deconvolution in density estimation. Can. J. Stat. 17, 235–239 (1989)
Donoho, D.L., Johnstone, I.M., Kerkyacharian, G., Picard, D.: Density estimation by wavelet thresholding. Ann. Stat. 24, 508–539 (1996)
Doukhan, P., León, J.R.: Déviation quadratique d'estimateurs de densité par projections orthogonales. C. R. Acad. Sci. Paris Sér. I Math. 310, 425–430 (1990)
Fan, J.: On the optimal rates of convergence for nonparametric deconvolution problem. Ann. Stat. 19, 1257–1272 (1991)
Fan, J., Koo, J.-Y.: Wavelet deconvolution. IEEE Trans. Inf. Theory 48, 734–747 (2002)
Goldenshluger, A., Lepski, O.: Bandwidth selection in kernel density estimation: oracle inequalities and adaptive minimax optimality. Ann. Stat. 39, 1608–1632 (2011)
Goldenshluger, A., Lepski, O.: On adaptive minimax density estimation on \(\mathbb{R}^d\). Probab. Theory Relat. Fields 159, 479–543 (2014)
Hesse, C.H.: Deconvolving a density from partially contaminated observations. J. Multivariate Anal. 55, 246–260 (1995)
Härdle, W.K., Kerkyacharian, G., Picard, D., Tsybakov, A.: Wavelets, Approximation, and Statistical Applications. Springer, New York (1998)
Ibragimov, I.A., Hasminskii, R.Z.: Statistical Estimation: Asymptotic Theory. Springer, New York (1981)
Kerkyacharian, G., Picard, D.: Density estimation in Besov spaces. Stat. Probab. Lett. 13, 15–24 (1992)
Klutchnikoff, N.: Pointwise adaptive estimation of a multivariate function. Math. Methods Stat. 23, 132–150 (2014)
Kou, J.K., Liu, Y.M.: Nonparametric regression estimations over \(L^p\) risk based on biased data. Commun. Stat. Theor. Methods 46, 2375–2395 (2017)
Lepski, O.: Multivariate density estimation under sup-norm losses: oracle approach, adaptation and independence structure. Ann. Stat. 41, 1005–1034 (2013)
Lepski, O., Willer, T.: Lower bounds in the convolution structure density model. Bernoulli 23, 884–926 (2017)
Lepski, O., Willer, T.: Oracle inequalities and adaptive estimation in the convolution structure density model. Ann. Stat. 47, 233–287 (2019)
Li, R., Liu, Y.M.: Wavelet optimal estimations for a density with some additive noises. Appl. Comput. Harmon. Anal. 36, 416–433 (2014)
Liu, M., Taylor, R.: A consistent nonparametric density estimator for the deconvolution problem. Can. J. Stat. 17, 427–438 (1989)
Liu, Y.M., Wang, H.Y.: Convergence order of wavelet thresholding estimator for differential operators on Besov spaces. Appl. Comput. Harmon. Anal. 32, 342–356 (2012)
Liu, Y.M., Wu, C.: Point-wise estimation for anisotropic densities. J. Multivariate Anal. 171, 112–125 (2019)
Liu, Y.M., Zeng, X.C.: Asymptotic normality for wavelet deconvolution density estimators. Appl. Comput. Harmon. Anal. 48, 321–342 (2020)
Lounici, K., Nickl, R.: Global uniform risk bounds for wavelet deconvolution estimators. Ann. Stat. 39, 201–231 (2011)
Masry, E.: Strong consistency and rates for deconvolution of multivariate densities of stationary processes. Stoch. Process. Appl. 47, 53–74 (1993)
Pensky, M.: Density deconvolution based on wavelets with bounded supports. Stat. Probab. Lett. 56, 261–269 (2002)
Pensky, M., Vidakovic, B.: Adaptive wavelet estimator for nonparametric density deconvolution. Ann. Stat. 27, 2033–2053 (1999)
Rebelles, G.: Pointwise adaptive estimation of a multivariate density under independence hypothesis. Bernoulli 21, 1984–2023 (2015)
Shao, Q., Yu, H.: Weak convergence for weighted empirical process of dependent sequences. Ann. Probab. 24, 2098–2127 (1996)
Stefanski, L., Carroll, R.: Deconvoluting kernel density estimators. Statistics 21, 169–184 (1990)
Triebel, H.: Theory of Function Spaces III. Birkhäuser, Berlin (2006)
Tsybakov, A.B.: Pointwise and sup-norm sharp adaptive estimation of functions on the Sobolev classes. Ann. Stat. 26, 2420–2469 (1998)
Walnut, D.F.: An Introduction to Wavelet Analysis. Birkhäuser, Boston (2004)
Walter, G.G.: Density estimation in the presence of noise. Stat. Probab. Lett. 41, 237–246 (1999)
Wishart, J.R.: Smooth hyperbolic wavelet deconvolution with anisotropic structure. Electron. J. Stat. 13, 1694–1716 (2019)
Yuan, M., Chen, J.: Deconvolving multidimensional density from partially contaminated observations. J. Stat. Plann. Inference 104, 147–160 (2002)
Acknowledgements
This work is supported by the National Natural Science Foundation of China (No. 11771030) and the Beijing Natural Science Foundation (No. 1172001). The authors would like to thank two referees for their important comments and suggestions.
Communicated by Stephane Jaffard.
Keywords
- Density estimation
- Generalized deconvolution model
- Point-wise risk
- Optimality
- Adaptivity
- Wavelet
- Anisotropic Hölder space