Abstract
In 2017 and 2019, using a kernel method, Lepski and Willer established adaptive and optimal \(L^p\) risk estimates in the convolution structure density model, assuming their density functions to lie in a Nikol’skii space. Motivated by their work, we first use a linear wavelet estimator to obtain a point-wise optimal estimate in the same model, allowing our densities to belong to a local and anisotropic Hölder space. Then a data-driven method is used to obtain an adaptive and near-optimal estimate. Finally, we show that the logarithmic factor is necessary for adaptivity.
1 Introduction
Deconvolution estimation is an important topic in statistics. In this paper, we consider the generalized deconvolution model introduced in [21, 22].
Let \((\Omega ,\mathcal {F},\mathbf {P})\) be a probability space and \(Z_1,Z_2,\ldots ,Z_n\) be independent and identically distributed (i.i.d.) random variables of the form
\(Z=X+\varepsilon Y, \qquad (1.1)\)
where X stands for an \(\mathbb {R}^d\)-valued random variable with unknown probability density f, Y denotes an independent random noise (error) with probability density g, and \(\varepsilon \in \{0,1\}\) is a Bernoulli random variable with \(P\{\varepsilon =1\}=\alpha \), \(\alpha \in [0,1]\). The purpose is to estimate f from the observed data \(Z_1,Z_2,\ldots ,Z_n\) in some sense.
When \(\alpha =1\), (1.1) reduces to the classical deconvolution model, while \(\alpha =0\) corresponds to the traditional density estimation.
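To fix ideas, here is a minimal simulation of model (1.1); the standard normal target X and Laplace noise Y below are illustrative choices of ours, not assumptions of the paper.

```python
import numpy as np

# Simulate Z = X + eps * Y from model (1.1); f, g and alpha below are
# hypothetical choices for illustration only.
rng = np.random.default_rng(0)
n, alpha = 10_000, 0.3

X = rng.standard_normal(n)            # unobserved sample with density f
Y = rng.laplace(size=n)               # independent noise with density g
eps = rng.binomial(1, alpha, size=n)  # Bernoulli, P(eps = 1) = alpha
Z = X + eps * Y                       # the observed data

# alpha = 1 recovers the classical deconvolution model,
# alpha = 0 the traditional density estimation model.
```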
Clearly, the density h of Z in (1.1) satisfies
\(h=(1-\alpha )f+\alpha (f*g), \qquad (1.2)\)
because \(P\{Z<t\}=P\{\varepsilon =0\}P\{X+\varepsilon Y|_{\varepsilon =0}<t\} +P\{\varepsilon =1\}P\{X+\varepsilon Y|_{\varepsilon =1}<t\}=(1-\alpha )P\{X<t\}+\alpha P\{X+Y<t\}\). Furthermore,
\(f^{ft}=h^{ft}/G_{\alpha }\)
when the function
\(G_{\alpha }(t):=1-\alpha +\alpha g^{ft}(t)\)
has no zeros on \(\mathbb {R}^d\), where \(f^{ft}\) is the Fourier transform of \(f\in L^1(\mathbb {R}^d)\) defined by \(f^{ft}(t):=\int _{\mathbb {R}^d}f(x)e^{-it\cdot x}dx\).
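For example, if Y is standard Gaussian noise, then \(g^{ft}(t)=e^{-|t|^2/2}\) and \(G_{\alpha }(t)=1-\alpha +\alpha e^{-|t|^2/2}\ge 1-\alpha \), so for \(\alpha \in [0,1)\) the function \(G_{\alpha }\) never vanishes and \(f^{ft}=h^{ft}/G_{\alpha }\) is well defined; for \(\alpha =1\), \(G_1(t)=e^{-|t|^2/2}\) is still positive but decays exponentially, a severely ill-posed case.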
Under some mild assumptions on \(G_{\alpha }(t)\), Lepski and Willer [21] established an asymptotic lower bound on the \(L^p\) risk for model (1.1). Recently, they provided an adaptive and optimal \(L^p\) risk estimate over an anisotropic Nikol’skii space by using the kernel method in [22].
When \(\alpha =0\), References [9] and [17] deal with linear wavelet estimation, while nonlinear wavelet estimation is studied for adaptivity [6, 8, 25]. Kernel estimation with a selection rule can be found in [12, 13] and [20].
For \(\alpha =1\), the consistency of deconvolution estimators is investigated in [7, 24, 29]. Pensky and Vidakovic and Fan and Koo obtain the optimal convergence rates for the \(L^2\) risk in Sobolev and Besov spaces, respectively [11, 31]; Lounici and Nickl [28] provide the optimal \(L^\infty \) risk estimate, while Li and Liu discuss optimal \(L^p\) risk estimation for \(p\in [1,\infty )\), see [23].
In contrast to the above \(L^p\) risk estimation, we consider point-wise estimation for model (1.1) in this paper, because it is of greater concern in some applications. For a density function set \(\Sigma \) and \(1\le p<\infty \), the maximal point-wise risk at \(x\in \mathbb {R}^d\) over \(\Sigma \) means
\(\sup \limits _{f\in \Sigma }\,[E|\hat{f}_n(x)-f(x)|^p]^{\frac{1}{p}},\)
where \(\hat{f}_n(x):=\hat{f}_n(x;Z_1,Z_2,\ldots ,Z_n)\) and EX denotes the expectation of X. An estimator \(\hat{f}_n^*\) is said to be optimal over \(\Sigma \), if
\(\sup \limits _{f\in \Sigma }\,[E|\hat{f}_n^*(x)-f(x)|^p]^{\frac{1}{p}}\lesssim \inf \limits _{\hat{f}_n}\sup \limits _{f\in \Sigma }\,[E|\hat{f}_n(x)-f(x)|^p]^{\frac{1}{p}},\)
where the infimum is taken over all possible estimators. Here and throughout, \(A\lesssim B\) denotes \(A\le CB\) for some absolute constant \(C>0\); \(A\gtrsim B\) means \(B\lesssim A\); we use \(A\sim B\) to stand for both \(A\gtrsim B\) and \(A\lesssim B\). Clearly,
\(\sup \limits _{f\in \Sigma }\,[E|\hat{f}_n^*(x)-f(x)|^p]^{\frac{1}{p}}\gtrsim \inf \limits _{\hat{f}_n}\sup \limits _{f\in \Sigma }\,[E|\hat{f}_n(x)-f(x)|^p]^{\frac{1}{p}}\)
holds automatically.
In the case \(\alpha =0\), there are existing references dealing with point-wise estimation on the \(L^2\) Sobolev ball \(W_2^s(\mathbb {R}, M)\) [1, 2, 38]. Rebelles [32] discusses point-wise adaptive estimation under an independence hypothesis in Nikol’skii spaces.
In the case \(\alpha =1\) and dimension \(d=1\), Fan shows that a kernel estimator \(\hat{f}_n\) attains a point-wise convergence rate when the characteristic function \(g^{ft}\) of Y satisfies a polynomial decay condition; moreover, that rate is optimal [10]. This result extends the work of Carroll and Hall and of Stefanski and Carroll [4, 34], in which the lower and upper bounds are investigated, respectively, for Gaussian noise. In 2013, Comte and Lacour [5] studied point-wise optimal estimation in the anisotropic Hölder ball \(H^{\mathbf {s}}(\mathbb {R}^d, M)\) with both moderately and severely ill-posed noise by the kernel method.
It should be pointed out that for \(\alpha \in (0,1)\), Hesse [14] provides an upper bound estimate for twice differentiable density functions, when the noise density g satisfies \(\inf \limits _{t}g(t)\ge \alpha \) and \(g^{ft}(t)\) is twice continuously differentiable. Yuan and Chen generalize that work to the high dimensional case [40]. However, they do not consider the optimality of their estimates. More related work can be found in [3, 27, 30, 36, 39]; for parametric estimation, see [16].
Most of the works above assume global smoothness of the estimated functions. However, in order to estimate \(f(x_0)\), it is more natural to require some smoothness of f only at \(x_0\). In this work, we study the same problem for \(\alpha \in [0,1]\) over a local Hölder space. As in [26], we introduce for \(l=1,\ldots ,d\) and \(k=1,2,\ldots \) the derivative notation
\(D_l^kf:=\frac{\partial ^kf}{\partial x_l^k},\qquad D_l^0f:=f.\)
Definition 1.1
For \(\mathbf {s}=(s_1,\ldots ,s_d)\) with \(s_l>0\), a function \(f:\mathbb {R}^d\rightarrow \mathbb {R}\) is said to satisfy a local and anisotropic Hölder condition with exponent vector \(\mathbf {s}\) at \(x_0\in \mathbb {R}^d\), if there exists a neighborhood \(\Omega _{x_0}\) of \(x_0\) such that for \(x,x+te_l\in \Omega _{x_0}\) and \(t\in \mathbb {R}\) (\(e_l\) being the l-th canonical basis vector),
\(|D_l^{[s_l]}f(x+te_l)-D_l^{[s_l]}f(x)|\le L|t|^{s_l-[s_l]},\)
where \(L>0\) is a fixed constant and \([s_l]\) stands for the largest integer strictly smaller than \(s_l\). The set of all such functions is denoted by \(H^{\mathbf {s}}(\Omega _{x_0},L)\).
When \(\mathbf {s}=(s,\ldots ,s)\), a function f in global Hölder ball \(H^s(\mathbb {R}^d,L)\) must be in \(H^{\mathbf {s}}(\Omega _{x},L)\) for each \(x\in \mathbb {R}^d\). However, the converse is not necessarily true, see Example 1 in [26].
In this paper, we use \(H^{\mathbf {s}}(\Omega _{x_{0}},L,~M)\) to stand for the density set
\(H^{\mathbf {s}}(\Omega _{x_{0}},L,~M):=\{f\in H^{\mathbf {s}}(\Omega _{x_0},L):~f\text { is a probability density and }\Vert f\Vert _{\infty }\le M\}.\)
Before posing some assumptions on the noise Y in model (1.1), we recall that g is the density of Y and \(G_{\alpha }(t)=1-\alpha +\alpha g^{ft}(t)\). For \(\beta _l\ge 0~(l=1,\ldots ,d)\), denote
\(\beta _l(\alpha ):=0\text { for }\alpha \in [0,1)\quad \text {and}\quad \beta _l(\alpha ):=\beta _l\text { for }\alpha =1.\)
The following two conditions on noise Y are used in our discussions.
-
(C1)
\(|G_{\alpha }(t)|\gtrsim \prod _{l=1}^d (1+|t_l|^2)^{-\frac{\beta _l(\alpha )}{2}}\text { with }t=(t_1,\ldots ,t_d)\);
-
(C2)
\(|D_k^mG_{\alpha }(t)|\lesssim \prod _{l=1}^d(1+|t_l|^2)^{-\frac{\beta _l(\alpha )}{2}}\text { for }k\in \{1,\ldots ,d\}\text { and }0\le m\le d.\)
Condition (C1) is the same as Assumption 4 in [22], and, as there, it will be needed only for the upper bound estimate. For \(0<\alpha <\frac{1}{2}\),
\(|G_{\alpha }(t)|\ge 1-\alpha -\alpha |g^{ft}(t)|\ge 1-2\alpha >0\)
thanks to \(|g^{ft}(t)|\le 1\), and therefore (C1) holds automatically. If \(0\le \alpha <1\) and \(g^{ft}(t)\ge 0\), then (C1) holds as well. Examples with \(g^{ft}(t)\ge 0\) include the 2N-fold self-convolutions
\(g=\underbrace{p*\cdots *p}_{2N}\)
with a density \(p(x)=p(-x)\). In fact, \(p(x)=p(-x)\) implies that \(p^{ft}\) is real and \(g^{ft}(t)=[p^{ft}(t)]^{2N}\ge 0\) in that case.
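The nonnegativity \(g^{ft}=[p^{ft}]^{2N}\ge 0\) is easy to check numerically; the following sketch (ours, with p uniform on \([-1/2,1/2]\), a hypothetical choice) illustrates it.

```python
import numpy as np

# Characteristic function of the 2N-fold self-convolution g = p * ... * p
# of a symmetric density p; here p is uniform on [-1/2, 1/2], so that
# p^{ft}(t) = sin(t/2)/(t/2) and g^{ft} = (p^{ft})^{2N} >= 0 pointwise.
t = np.linspace(-50.0, 50.0, 2001)
p_ft = np.sinc(t / (2 * np.pi))   # np.sinc(x) = sin(pi x)/(pi x)
N = 2
g_ft = p_ft ** (2 * N)
print(g_ft.min() >= 0)            # True: an even power is nonnegative
```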
The lower bound estimate needs condition (C2), which is a little weaker than Assumptions 1–2 in [21]. Clearly, (C2) can be rewritten as
-
(C2-1)
\(|G_{\alpha }(t)|\lesssim \prod _{l=1}^d(1+|t_l|^2)^{-\frac{\beta _l(\alpha )}{2}}\) with \(t=(t_1,\ldots ,t_d)\);
-
(C2-2)
\(|D_k^mG_{\alpha }(t)|\lesssim \prod _{l=1}^d(1+|t_l|^2)^{-\frac{\beta _l(\alpha )}{2}}\) for \(1\le k\le d\) and \(1\le m\le d\).
When \(\alpha <1\), (C2-1) holds automatically.
In Sect. 2, we show a lower bound estimate. Let \(\hat{f}_n(x;Z_1,\ldots ,Z_n)\) be an estimator of density functions in \(H^{\mathbf {s}}(\Omega _{x_{0}},L,~M)\) with \(\mathbf {s}=(s_1,\ldots ,s_d)\) and \(s_l>0\). Then with \(\frac{1}{s}=\frac{1}{d}\sum \limits _{l=1}^d\frac{1+2\beta _l(\alpha )}{s_l}\) and \(1\le p<+\infty \),
\(\sup \limits _{f\in H^{\mathbf {s}}(\Omega _{x_{0}},L,~M)}[E|\hat{f}_n(x_0)-f(x_0)|^p]^{\frac{1}{p}}\gtrsim n^{-\frac{s}{2s+d}}.\)
The anisotropic linear wavelet estimator \(\hat{f}_n\) is constructed to attain the optimal convergence rate \(n^{-\frac{s}{2s+d}}\) on \(H^{\mathbf {s}}(\Omega _{x_{0}},L,~M)\) in the next section. To get adaptivity, we use a data-driven estimator to obtain the convergence rate
\(\Big (\frac{n}{\ln n}\Big )^{-\frac{s}{2s+d}}\)
in Sect. 4. Furthermore, it will be proved in Sect. 5 that the loss of the logarithmic factor \((\ln n)^{-\frac{s}{2s+d}}\) is necessary for adaptivity. Some concluding remarks are provided in the last section.
2 Lower Bound
We use Fano’s Lemma to give our lower bound estimate in this part. Let P, Q be two probability measures with density functions p, q, respectively. If P is absolutely continuous with respect to Q (denoted by \(P\ll Q\)), then the Kullback–Leibler divergence between P and Q means
\(K(P,Q):=\int p(x)\ln \frac{p(x)}{q(x)}\,dx.\)
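For discretized densities on a common grid, this divergence can be approximated as in the following sketch (our helper, assuming \(P\ll Q\) on the grid); the Gaussian/Cauchy pair echoes the Cauchy density \(f_0\) used below.

```python
import numpy as np

# Approximate K(P, Q) = \int p ln(p/q) dx for densities sampled on a grid.
def kl_divergence(p, q, dx):
    mask = p > 0                       # integrand vanishes where p = 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask])) * dx

x = np.linspace(-10.0, 10.0, 4001)
dx = x[1] - x[0]
p = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # N(0, 1) density
q = 1.0 / (np.pi * (1 + x**2))               # Cauchy density
print(kl_divergence(p, q, dx))               # finite: Cauchy has heavier tails
```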
Lemma 2.1
(Fano’s Lemma) Let \((\Omega ,\mathcal {F},P_k)\) be probability spaces and \(A_k\in \mathcal {F}~(k=0,1,\ldots ,m)\). If \(A_k\bigcap A_v=\emptyset \) for \(k\ne v\), then with \(A^c\) standing for the complement of A and \(\mathcal {K}_m:=\inf \limits _{0\le v\le m}\frac{1}{m}\sum \limits _{k\ne v}K(P_k,P_v)\),
Theorem 2.1
Let g satisfy (C2) and \(1/s:=1/d\sum \nolimits _{l=1}^d(1+2\beta _l(\alpha ))/s_l\) with \(\mathbf {s}=(s_1,\ldots ,s_d)\) and \(s_l>0\). Then for \(M>(2\pi )^{-d/2}\) and \(1\le p\le \infty \),
\(\inf \limits _{\hat{f}_n}\sup \limits _{f\in H^{\mathbf {s}}(\Omega _{x_{0}},L,~M)}[E|\hat{f}_n(x_0)-f(x_0)|^p]^{\frac{1}{p}}\gtrsim n^{-\frac{s}{2s+d}},\)
where \(\hat{f}_n\) runs over all possible estimators of \(f\in H^{\mathbf {s}}(\Omega _{x_{0}},L,~M)\).
Proof
Let \(\tilde{f}_0(t):=\frac{1}{\pi (1+t^2)}\) be the one dimensional Cauchy density and
\(f_0(x):=\prod \limits _{l=1}^d\tilde{f}_0(x_l-x_{0l})\)
with \(x=(x_1,x_2,\ldots ,x_d),~x_0= (x_{01},x_{02},\ldots ,x_{0d})\in \mathbb {R}^d\). With the Meyer wavelet function \(\psi \), one introduces a sequence of functions \(\Psi _n~(n=1,2,\ldots )\) by
\(\Psi _n(x):=a_n\prod \limits _{l=1}^d\psi \big (\delta _{nl}^{-1}(x_l-x_{0l})\big ), \qquad (2.1)\)
where \(a_n>0\) and \(\delta _{nl}\in (0,1]\) are specified below.
Because \(|\psi (t)|\lesssim (1+|t|^2)^{-1}\) and \(\delta _{nl}\le 1\),
\(|\Psi _n(x)|\lesssim a_n\prod \limits _{l=1}^d(1+|x_l-x_{0l}|^2)^{-1}. \qquad (2.2)\)
Define \(f_n(x):=f_0(x)+\Psi _n(x)~(n=1,2,\ldots )\) and
\(a_n:=n^{-\frac{s}{2s+d}},\qquad \delta _{nl}:=a_n^{\frac{1}{s_l}}.\)
Then for each \(x\in \mathbb {R}^d\), \(f_n(x)>0\) for sufficiently large n, because of \(f_0(x)>0\), (2.2) and \(\lim \limits _{n\rightarrow \infty }a_n=0\). In addition,
\(\int _{\mathbb {R}^d}f_n(x)dx=\int _{\mathbb {R}^d}f_0(x)dx=1\)
thanks to \(\int _{\mathbb {R}}\psi (t)dt=0\) and \(\int _{\mathbb {R}^d}\Psi _n(x)dx=0\). Hence, \(f_n\) is a density function for large n.
Since every derivative of \(\tilde{f}_{0}\) is bounded, one finds \(\tilde{f}_{0}\in H^{\tilde{s}}(\mathbb {R})\) for each \(\tilde{s}>0\), and therefore \(f_0\in H^{\mathbf {s}}(\Omega _{x_0},L)\). Note that
thanks to (2.1) and \(\psi \in H^{s_l}(\mathbb {R})\). Then \(\Psi _n\in H^{\mathbf {s}}(\Omega _{x_0},L)\) and \(f_n\in H^{\mathbf {s}}(\Omega _{x_0},L)\). On the other hand, \(\Vert f_0\Vert _{\infty }\le \frac{1}{\pi ^d}\) and \(\Vert f_n\Vert _{\infty }\le \Vert f_0\Vert _{\infty }+ \Vert \Psi _n\Vert _{\infty }\lesssim 1\) because of (2.2) and \(a_n\lesssim 1\). Hence, \(\{f_0,~f_n\}\subseteq H^{\mathbf {s}}(\Omega _{x_{0}},L,~M)\) with some \(M>0\).
By \(\psi (0)\ne 0\) [37], \(|f_n(x_0)-f_0(x_0)|=a_n|\psi (0)|^d\gtrsim a_n\). For \(\tilde{x}=(x^1,~\ldots ,~x^n)\) with \(x^k\in \mathbb {R}^d\) and \(1\le k\le n\), define two density functions
Then with \(A_\chi :=\left\{ |\hat{f}_n(x_0)-f_\chi (x_0)|<\frac{1}{2}a_n|\psi (0)|^d\right\} ~(\chi =0,n)\), \(A_0\bigcap A_n=\emptyset \) and
due to Lemma 2.1. On the other hand, the Markov inequality gives
This with Jensen’s inequality and (2.3) shows
To finish the proof of Theorem 2.1, it remains to prove \(\kappa _1\lesssim 1\).
By (2.2) and the definition of \(h_n\), one finds \(P_{h_n}\ll P_{h_0}\) and
Since the \(Z_i\) are i.i.d., \(K(P_{H_n},P_{H_0})=nK(P_{h_n},P_{h_0})\) and
thanks to \(\ln u\le u-1~(u>0)\). This with \(\kappa _1:=\inf \limits _{v=0,n}\sum \limits _{\chi \ne v}K(P_{H_\chi },P_{H_v})\le K(P_{H_n},P_{H_0})\) and the definition of \(h_0\) shows
Clearly,
Next, one shows
By (2.6) and Fatou’s Lemma,
Hence, there exists \(A>0\) such that
Since g is a density, one finds \(\int _{|y-x_0|\le B-A}g(y)dy\ge \frac{1}{2}\) for some \(B>A\). When \(|x-x_0|\le A\) and \(|y-x_0|\le B-A\), \(|x-y-x_0|\le |x-x_0|+|y-x_0| +|x_0|\le B+|x_0|\) and
This with (2.8) reaches (2.7).
According to (2.6)–(2.7), \(|(1-\alpha )f_0(x)+\alpha (f_0*g)(x)|\gtrsim (1+|x|^2)^{-d}\) for each \(\alpha \in [0,1]\). Combining this with (2.5) and the definitions of \(h_n,~h_0\), one concludes
where
It is easy to see that
where the first inequality comes from the Parseval identity and the definition of \(G_{\alpha }(t)\); the second one is true due to (C2-1); the third inequality holds because of the definition of \(\Psi _n\) in (2.1); the last one follows from \(\mathrm {supp}\,\psi ^{ft}\subset \{t:2\pi /3\le |t|\le 8\pi /3\}\). Furthermore,
thanks to the definitions of \(\delta _{nl}\), s and \(a_n\).
To estimate \(I_{2n}\), denote \(q:={\Psi }_n^{ft}\cdot G_{\alpha }\). Note that \([(1-\alpha )\Psi _n(\cdot )+\alpha (\Psi _n*g)(\cdot )]^{ft}(t)= \Psi _n^{ft}(t)G_{\alpha }(t)\). Then \(q^{ft}(t)=(1-\alpha )\Psi _n(-t)+\alpha (\Psi _n*g)(-t)\) and
where one uses Parseval identity. By \(D_l^dq=\sum _{m=1}^dC_d^mD_l^m\Psi _n^{ft}\cdot D_l^{d-m}G_{\alpha }\) and (C2-2),
Because \(\delta _{nl}\in (0,1]\) and \(\Psi _n^{ft}(t)=a_n\prod \limits _{l=1}^d\delta _{nl}\psi ^{ft}(\delta _{nl}t_l)e^{-ix_{0l}t_l}\), similar arguments to (2.10)–(2.11) show
This with (2.9) and (2.11) leads to \(\kappa _1\lesssim 1\). The proof is completed. \(\square \)
Remark 2.1
When \(\alpha =0\), \(\beta _l(\alpha )=0\) and Theorem 2.1 reduces to Theorem 2 in [26]. Since (2.7) plays a key role in our proof, we choose \(f_0\) to be a tensor product of Cauchy densities instead of the Gauss function used for the case \(\alpha =0\). Moreover, \(\Psi _n(x)\) is defined by the Meyer wavelet function \(\psi \) (see (2.1)), whose Fourier transform has compact support; in contrast, a compactly supported wavelet \(\psi \) is used in [26].
Remark 2.2
From the proof of Theorem 2.1, we find that \(f_n\in H^{\mathbf {s}}(\mathbb {R}^d,L,~M)\) for \(n=1,2,\ldots \). Therefore, the conclusion of Theorem 2.1 can be replaced by
\(\inf \limits _{\hat{f}_n}\sup \limits _{f\in H^{\mathbf {s}}(\mathbb {R}^d,L,~M)}[E|\hat{f}_n(x_0)-f(x_0)|^p]^{\frac{1}{p}}\gtrsim n^{-\frac{s}{2s+d}}.\)
When \(\alpha =1\), the above estimate reduces to Theorem 1 (Case A) of Comte and Lacour [5]; when \(\alpha =0\), it coincides with Theorem 2 (\(r>0\)) of Rebelles [32].
3 Upper Bound
This section shows an upper bound of the point-wise risk for an anisotropic wavelet estimator, which matches the lower bound given in Theorem 2.1. To do that, we first introduce an anisotropic wavelet basis, by which our wavelet estimator is defined. After giving two lemmas, we finally prove the main result (Theorem 3.1) of this section.
Let \(\mathbb {N}\) be the nonnegative integer set and \(\{V_j:j\in \mathbb {N}\}\) a classical orthonormal multiresolution analysis of \(L^2(\mathbb {R})\) with scaling and wavelet functions \(\phi ,~\psi \). For \(\mathbf {r}=(r_1,\ldots ,r_d)\) with \(r_1, \ldots , r_d>0\) and \(\sum \nolimits _{i=1}^d r_i=d\), we define \(V_{j}^{r_i}=V_{\lfloor jr_i\rfloor }\) for \(j\ge 0\) and \(i\in \{1,\ldots ,d\}\), where \(\lfloor a\rfloor \) denotes the integer part of a. With \(x=(x_1,\ldots ,x_d)\in \mathbb {R}^d\) and \(\mathbf {k}=(k_1,\ldots ,k_d)\in \mathbb {Z}^d\) (the multiple integer set),
constitutes an orthonormal basis for \(\mathbf {V}_j^\mathbf {r}=\bigotimes \limits _{i=1}^dV_{j}^{r_i}\). Moreover, \(\mathbf {V}_{j+1}^\mathbf {r}=\mathbf {V}_j^\mathbf {r}\oplus \mathbf {W}_j^\mathbf {r}\) with \(\mathbf {W}_j^\mathbf {r}=\bigoplus _{\gamma \in \Gamma }\left( \bigotimes _{i=1}^d W_{\lfloor jr_i\rfloor }^{\gamma _i}\right) \) and \(\Gamma =\{0,1\}^d\backslash \{0\}^d \), see [35].
Denote for \(\gamma _i=1\),
with some normalizing constant c, and \(\psi _{{\lfloor jr_i\rfloor },~k_i}^{\gamma _i} (x_i)=\phi _{{\lfloor jr_i\rfloor },~k_i}(x_i)\) for \(\gamma _i=0\). Then \(\Psi _{j\mathbf {r};~\mathbf {k}}^{\gamma }(x)=\prod _{i=1}^d \psi _{{\lfloor jr_i\rfloor },~k_i}^{\gamma _i}(x_i)\) forms an orthonormal basis of \(\bigotimes \limits _{i=1}^d W_{\lfloor jr_i\rfloor }^{\gamma _i}\). Hence, for each \(f\in L^2(\mathbb {R}^d)\),
holds in the \(L^2(\mathbb {R}^d)\) sense, where
In fact, (3.1) holds point-wisely when \(\phi \) and \(\psi \) are chosen to be continuous and compactly supported, see Remark 2 in [26].
When \(\mathbf {r}=(1,\ldots ,1)\), the above anisotropic wavelet basis reduces to the traditional tensor product wavelet basis. The flexibility of \(\mathbf {r}\) plays a key role in the estimation of anisotropic density functions.
To introduce our estimator, we choose a compactly supported and continuous \(\phi \) with \(|\phi ^{ft}(t)|\lesssim (1+|t|)^{-m}~(m\ge \max \limits _l\beta _l(\alpha )+2)\). Then
is well-defined under the assumption (C1). Define our linear wavelet estimator by
with i.i.d samples \(Z_i\). The vector \(j\mathbf {r}\) will be specified in Theorem 3.1. By \(Ee^{-itZ_i}=h^{ft}(t)=G_{\alpha }(t)f^{ft}(t)\) and Plancherel formula,
and \(E\hat{f}_n=P_jf:=\sum _{\mathbf {k}\in \mathbb {Z}^d}\alpha _{j\mathbf {r},\mathbf {k}} \Phi _{j\mathbf {r};~\mathbf {k}}\).
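Since the display defining \((K\phi )_{jk}\) in (3.3) is not reproduced above, the following one-dimensional sketch (ours) reconstructs the standard deconvolution form consistent with the identity \(Ee^{-itZ}=G_{\alpha }(t)f^{ft}(t)\); the Shannon scaling function \(\phi ^{ft}=\mathbf {1}_{[-\pi ,\pi ]}\) and standard Gaussian noise are hypothetical stand-ins for the paper's choices.

```python
import numpy as np

# \hat{alpha}_{j,k} = n^{-1} sum_i (K phi)_{jk}(Z_i), where
# (K phi)_{jk}(z) = (2 pi)^{-1} \int phi_{jk}^{ft}(t) / G_alpha(-t) e^{itz} dt
# is unbiased for alpha_{j,k} = \int f phi_{jk} whenever h^{ft} = G_alpha f^{ft}.

def G_alpha(t, alpha):
    # G_alpha(t) = 1 - alpha + alpha g^{ft}(t), standard Gaussian noise assumed
    return 1.0 - alpha + alpha * np.exp(-t**2 / 2.0)

def alpha_hat(Z, j, k, alpha, n_grid=4096):
    # phi_{jk}^{ft}(t) = 2^{-j/2} e^{-i k t/2^j} phi^{ft}(t/2^j); the Shannon phi
    # gives phi^{ft} = 1 on [-pi, pi], so the integral runs over |t| <= pi 2^j.
    t = np.linspace(-np.pi * 2**j, np.pi * 2**j, n_grid)
    dt = t[1] - t[0]
    phi_ft = 2 ** (-j / 2.0) * np.exp(-1j * k * t / 2**j)
    integrand = phi_ft / G_alpha(-t, alpha) * np.exp(1j * np.outer(Z, t))
    K_phi = integrand.sum(axis=1).real * dt / (2 * np.pi)  # (K phi)_{jk}(Z_i)
    return K_phi.mean()

# Example: rng = np.random.default_rng(1); Z = rng.standard_normal(500)
# print(alpha_hat(Z, j=3, k=0, alpha=0.5))
```

Condition (C1) guarantees that \(1/G_{\alpha }\) grows at most polynomially, which is what keeps the integral above well defined for band-limited \(\phi \).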
For \(t=(t_1,\ldots ,t_d)\in \mathbb {R}^d\), denote \(\lfloor t\rfloor :=(\lfloor t_1\rfloor ,\ldots ,\lfloor t_d\rfloor )\) and \(|\lfloor t\rfloor |:=\lfloor t_1\rfloor +\ldots +\lfloor t_d\rfloor \). Then the following lemma holds.
Lemma 3.1
Let \(\hat{\alpha }_{j\mathbf {r},\mathbf {k}}\) be given by (3.4), \(2^{|\lfloor j\mathbf {r}\rfloor |}\le n\) and \(\Vert f\Vert _{\infty }\lesssim 1\). Then for \(1\le p<\infty \),
\(E|\hat{\alpha }_{j\mathbf {r},\mathbf {k}}-\alpha _{j\mathbf {r},\mathbf {k}}|^p\lesssim \big (n^{-\frac{1}{2}}\,2^{\lfloor j\mathbf {r}\rfloor \cdot \beta (\alpha )}\big )^p,\)
where \(\lfloor j\mathbf {r}\rfloor \cdot \beta (\alpha )\) stands for the inner product of \(\lfloor j\mathbf {r}\rfloor \) and \(\beta (\alpha )\).
Proof
By the definitions of \(\hat{\alpha }_{j\mathbf {r},\mathbf {k}}\) and \(\alpha _{j\mathbf {r},\mathbf {k}}\),
with \(\zeta _i=(K\phi )_{jk}(Z_i)-E(K\phi )_{jk}(Z_i)\). According to (3.3) and the definition of \(\Phi _{j\mathbf {r};~\mathbf {k}}\), one knows
This with (C1) and \(|\phi ^{ft}(t)|\lesssim (1+|t|)^{-m}~(m\ge \max \limits _l{\beta _l(\alpha )}+2)\) leads to
Hence,
By \(\Vert f\Vert _{\infty }\lesssim 1\), \(\Vert h\Vert _{\infty }=\Vert (1-\alpha )f+\alpha f*g\Vert _{\infty }\lesssim 1\) and
which reduces to
because of the definition of \((K\phi )_{jk}\) and the Parseval identity. Similar to the arguments for (3.6), one obtains
According to Rosenthal’s inequality,
Since \(2^{|\lfloor j\mathbf {r}\rfloor |}\le n\), we have \((n^{-1}2^{|\lfloor j\mathbf {r}\rfloor |})^{\frac{p}{2}-1}\lesssim 1\) for \(p>2\). Combining these with (3.5), one obtains the desired estimate
\(\square \)
We also need another lemma from [26] for the proof of Theorem 3.1.
Lemma 3.2
Let \(\beta _{j\mathbf {r},\mathbf {k}}^{\gamma }\) be defined in (3.2). Then for \(j\in \mathbb {N}\),
Theorem 3.1
Let g satisfy (C1) and \(1/s:=1/d\sum \limits _{l=1}^d(1+2\beta _l(\alpha ))/s_l\) with \(s_l>0~(l=1,\ldots ,d)\). Then with \(2^{\lfloor jr_l\rfloor }\sim n^{\frac{s}{2s+d}\frac{1}{s_l}}\) and \(1\le p<\infty \), the estimator \(\hat{f}_n\) in (3.4) satisfies
\(\sup \limits _{f\in H^{\mathbf {s}}(\Omega _{x_{0}},L,~M)}[E|\hat{f}_n(x_0)-f(x_0)|^p]^{\frac{1}{p}}\lesssim n^{-\frac{s}{2s+d}}.\)
Proof
One begins with an inequality,
By (3.1),
This with Lemma 3.2 and \(2^{\lfloor jr_l\rfloor }\sim n^{\frac{s}{2s+d}\frac{1}{s_l}}\) leads to
On the other hand, \(|\hat{f}_n(x)-P_{j}f(x)|=|\sum _{\mathbf {k}}(\hat{\alpha }_{j\mathbf {r},\mathbf {k}}- \alpha _{j\mathbf {r},\mathbf {k}})\Phi _{j\mathbf {r};~\mathbf {k}}(x)|\).
According to the Hölder inequality with \(\frac{1}{p}+\frac{1}{p'}=1\),
Furthermore,
thanks to Lemma 3.1. Since \(\phi \) is assumed to be compactly supported and continuous, \(\sum \nolimits _{k}|\phi (x-k)|\lesssim 1\) and \(\sum _{\mathbf {k}} |\Phi _{j\mathbf {r};~\mathbf {k}}(x)|\lesssim 2^{|\lfloor j\mathbf {r}\rfloor |/2}\). Combining this with (3.11), one finds
Then it follows from the assumptions \(1/s:=1/d\sum \limits _{l=1}^d(1+2\beta _l(\alpha ))/s_l\) and \(2^{\lfloor jr_l\rfloor }\sim n^{\frac{s}{2s+d}\frac{1}{s_l}}\) that
This with (3.9)–(3.10) shows the desired conclusion. \(\square \)
Remark 3.1
When the choice \(2^{\lfloor jr_l\rfloor }\sim n^{\frac{s}{2s+d}\frac{1}{s_l}}\) is replaced by \(2^{\lfloor jr_l\rfloor }\sim (\frac{n}{\ln n})^{\frac{s}{2s+d}\frac{1}{s_l}}\), we find
\(\sup \limits _{f\in H^{\mathbf {s}}(\Omega _{x_{0}},L,~M)}[E|\hat{f}_n(x_0)-f(x_0)|^p]^{\frac{1}{p}}\lesssim \Big (\frac{n}{\ln n}\Big )^{-\frac{s}{2s+d}}\)
from the proof of Theorem 3.1.
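The level choices in Theorem 3.1 and Remark 3.1 are straightforward to compute; the following sketch (ours, with hypothetical smoothness values) turns \(\mathbf {s}\), \(\beta (\alpha )\) and n into the floors \(\lfloor jr_l\rfloor \).

```python
import numpy as np

# Pick floor(j r_l) so that 2^{floor(j r_l)} ~ m^{(s/(2s+d))/s_l}, with
# m = n (Theorem 3.1) or m = n/ln(n) (Remark 3.1).
def levels(s, beta, n, adaptive=False):
    d = len(s)
    s_eff = d / sum((1 + 2*b) / sl for sl, b in zip(s, beta))  # 1/s = (1/d) sum (1+2 beta_l)/s_l
    m = n / np.log(n) if adaptive else n
    return [int(np.floor(np.log2(m) * s_eff / (2*s_eff + d) / sl)) for sl in s]

print(levels([1.0, 3.0], [0.0, 0.0], n=10**5))                 # alpha < 1
print(levels([1.0, 3.0], [1.0, 1.0], n=10**5, adaptive=True))  # alpha = 1
```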
Remark 3.2
Compared with [32] and [40], we use a wavelet method instead of a kernel one. Taking \(\alpha =0\) in Theorem 3.1, we obtain Theorem 3 of [32]; in the case \(\alpha \in (0,1)\) and \(\mathbf {s}=(2,2,\ldots ,2)\), our theorem implies Theorem 2.1 in [40].
When \(\alpha =1\), Theorem 3.1 coincides with Theorem 4.5 in [38]; when \(\alpha =0\), it reduces to Theorem 3 in [26]. The linear wavelet estimator \(\hat{f}_n\) in Theorem 3.1 is not adaptive, since it depends on the unknown vector \(\mathbf {s}=(s_1,\ldots ,s_d)\). One may expect to obtain adaptivity with a nonlinear wavelet estimator, as in the classical case; however, this seems hard even for \(\alpha =0\), see Remark 3 in [26]. In the next section, we give adaptive and near-optimal estimates by using a data-driven strategy.
4 Adaptive Estimation
Since the linear wavelet estimator \(\hat{f}_n\) in Theorem 3.1 is not adaptive, this section provides an adaptive and near-optimal estimate on \(H^{\mathbf {s}}(\Omega _{x_0},~L,~M)\), see Theorem 4.1. Motivated by the work of [32] and [12], we use the linear wavelet estimator \(\hat{f}_n\) to define an auxiliary estimator \(\hat{f}_{j\mathbf {r},j^*\mathbf {r}^*}(x)\). After introducing a subset \(\mathcal {H}^d\) of \(\mathbb {R}^d\), we give a selection rule to determine \(j_0\mathbf {r}_0\) and the desired estimator \(\hat{f}_{j_0\mathbf {r}_0}(x)\).
Rewrite the linear wavelet estimator \(\hat{f}_n\) in Theorem 3.1 as \(\hat{f}_{j\mathbf {r}}:=\hat{f}_n\), since it depends on \(\lfloor jr_l\rfloor \). Then define an auxiliary estimator
where \(j\mathbf {r}\wedge j^*\mathbf {r}^*:=j\mathbf {r}\) for \(\min \limits _{1\le l\le d}jr_l\le \min \limits _{1\le l\le d}j^*r^*_l\), and \(j\mathbf {r}\wedge j^*\mathbf {r}^*:=j^*\mathbf {r}^*\) otherwise. With the constant \(\lambda \) specified after (4.12),
satisfies
Again, \({\mu }_{j\mathbf {r}}\) depends on \(\lfloor jr_l\rfloor ~(1\le l\le d)\). When \(\alpha =0\), (4.1)–(4.2) are the same as (21)–(22) in [26].
Next, we introduce
\(\mathcal {H}^d:=\big \{j\mathbf {r}:~(|\lfloor j\mathbf {r}\rfloor |+2\lfloor j\mathbf {r}\rfloor \cdot \beta (\alpha ))\,2^{|\lfloor j\mathbf {r}\rfloor |+2\lfloor j\mathbf {r}\rfloor \cdot \beta (\alpha )}\le n\big \}.\)
When \(\alpha <1\), \(\beta (\alpha )=(0,\ldots ,0)\) and
\(\mathcal {H}^d=\big \{j\mathbf {r}:~|\lfloor j\mathbf {r}\rfloor |\,2^{|\lfloor j\mathbf {r}\rfloor |}\le n\big \}.\)
Then \(j_0\mathbf {r}_0\in \mathcal {H}^d\) is determined by the following rule:
-
(i)
\(\hat{\xi }_{j\mathbf {r}}(x):=\max \limits _{j^*\mathbf {r}^*\in \mathcal {H}^d}[|\hat{f}_{j\mathbf {r},j^*\mathbf {r}^*}(x)-\hat{f}_{j^*\mathbf {r}^*}(x)|- {\mu }_{j^*\mathbf {r}^*}-{\mu }_{j\mathbf {r}}]_{+}\) with \(a_{+}:=\max \{0,a\}\);
-
(ii)
\(\hat{\xi }_{j_0\mathbf {r}_0}(x)+2{\mu }_{j_0\mathbf {r}_0}:=\min \limits _{j\mathbf {r}\in \mathcal {H}^d}[\hat{\xi }_{j\mathbf {r}}(x)+2{\mu }_{j\mathbf {r}}]\).
Although \(\mathcal {H}^d\) is an infinite set, the sets after “max” and “min” in (i) and (ii) are finite, because \(\hat{\xi }_{j\mathbf {r}}\) and \({\mu }_{j\mathbf {r}}\) depend only on \(\lfloor jr_l\rfloor ~(l=1,\ldots ,d)\). Therefore \(\hat{\xi }_{j\mathbf {r}}(x)\) and \(j_0\mathbf {r}_0\) are well-defined. Clearly, \(j_0\mathbf {r}_0\) is completely determined by the observed samples \(\{Z_k\}\); it does not depend on any unknown information about f. A schematic sketch of this rule is given below.
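The following sketch (ours, not from the paper) implements the selection rule (i)–(ii) over a finite index collection; the containers fhat, fhat_aux and mu are hypothetical and assumed precomputed at the point x.

```python
# H: list of (hashable) indices jr with distinct floors [j r_l];
# fhat[jr] = \hat{f}_{jr}(x); fhat_aux[(jr, jr2)] = \hat{f}_{jr, jr2}(x);
# mu[jr] = mu_{jr} from (4.2).
def select_index(H, fhat, fhat_aux, mu):
    def xi(jr):
        # rule (i): maximize [ |f_{jr,jr2} - f_{jr2}| - mu_{jr2} - mu_{jr} ]_+
        return max(max(abs(fhat_aux[(jr, jr2)] - fhat[jr2]) - mu[jr2] - mu[jr], 0.0)
                   for jr2 in H)
    # rule (ii): minimize xi_{jr}(x) + 2 mu_{jr} over jr in H
    return min(H, key=lambda jr: xi(jr) + 2.0 * mu[jr])
```

To prove Theorem 4.1, we need two well-known lemmas.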
Lemma 4.1
Let \((X,\mathcal {F},\mu )\) be a measure space and \(f\in L^p(X,\mathcal {F},\mu )\) with \(0<p<\infty \). Then with \(\lambda (t):=\mu \{y\in X, ~|f(y)|>t\}\),
\(\int _X|f|^pd\mu =p\int _0^{\infty }t^{p-1}\lambda (t)dt.\)
Lemma 4.2
(Bernstein’s inequality) Let \(X_1,\ldots ,X_n\) be i.i.d. random variables with \(EX_i=0\) and \(|X_i|\le \Vert X\Vert _{\infty }<\infty \). Then for each \(\gamma > 0\),
\(P\Big \{\Big |\frac{1}{n}\sum \limits _{i=1}^nX_i\Big |\ge \gamma \Big \}\le 2\exp \Big \{-\frac{n\gamma ^2}{2(EX_1^2+\Vert X\Vert _{\infty }\gamma /3)}\Big \}.\)
Theorem 4.1
Let g satisfy (C1) and \(1/s:=1/d\sum \limits _{l=1}^d(1+2\beta _l(\alpha ))/s_l\) with \(s_l>0~(l=1,\ldots ,d)\). Then for \(j_0\mathbf {r}_0\) given by the above selection rule,
\(\sup \limits _{f\in H^{\mathbf {s}}(\Omega _{x_{0}},L,~M)}[E|\hat{f}_{j_0\mathbf {r}_0}(x_0)-f(x_0)|^p]^{\frac{1}{p}}\lesssim \Big (\frac{n}{\ln n}\Big )^{-\frac{s}{2s+d}}. \qquad (4.3)\)
Proof
With the choice \(2^{\lfloor j_1r_{1l}\rfloor }\sim (\frac{n}{\ln n})^{[s/(2s+d)]\cdot (1/s_l)} ~(1\le l\le d)\), Remark 3.1 gives
It is easy to see \(j_1\mathbf {r}_1\in \mathcal {H}^d\) because \(\sum _{l=1}^d [s/(2s+d)]\cdot (1/s_l)+\sum _{l=1}^d[s/(2s+d)]\cdot (2\beta _l(\alpha )/s_l) =[s/(2s+d)]\cdot \sum \nolimits _{l=1}^d(1+2\beta _l(\alpha ))/s_l= d/(2s+d)< 1\). By selection rule (i) and (ii),
Therefore,
According to \(2^{\lfloor j_1r_{1l}\rfloor }\sim (\frac{n}{\ln n})^{[s/(2s+d)]\cdot (1/s_l)}\) and \(1/s:=1/d\sum \limits _{l=1}^d(1+2\beta _l(\alpha ))/s_l\),
and \({\mu }_{j_1\mathbf {r}_1}^2\lesssim n^{-1}\ln \frac{n}{\ln n} (\frac{n}{\ln n})^{\frac{d}{2s+d}}\le (\frac{n}{\ln n})^{-2s/(2s+d)}\) due to (4.2). Furthermore, \({\mu }_{j_1\mathbf {r}_1}^p\lesssim (\frac{n}{\ln n})^{-sp/(2s+d)}\). These with (4.4) show that
where one uses the facts that \((|a|+|b|+|c|)^{\theta }\lesssim |a|^{\theta }+|b|^{\theta }+ |c|^{\theta }\) for \(\theta >0\), and \(\sup (|x|+|y|+|z|)\le \sup |x|+\sup |y|+\sup |z|\). Hence, to obtain the desired conclusion (4.3), it suffices to show
By the same arguments as in [26],
For \(y> 0\), \(P\{(|\hat{f}_{j\mathbf {r}}-\mathrm{E}\hat{f}_{j\mathbf {r}}|- {\mu }_{j\mathbf {r}})_{+}\ge y\}=P\{|\hat{f}_{j\mathbf {r}}-\mathrm{E}\hat{f}_{j\mathbf {r}}|- {\mu }_{j\mathbf {r}}\ge y\}\). Then Lemma 4.1 gives
which turns out to be (by variable change)
According to the definition of \(\hat{f}_{j\mathbf {r}}\),
and \(\hat{f}_{j\mathbf {r}}(x)-\mathrm{E}\hat{f}_{j\mathbf {r}}(x)=1/n\sum _{i=1}^{n}\eta _i\) with
Clearly, \(\{\eta _i\}\) are i.i.d and \(E\eta _i=0\). By (3.6), \(|(K\phi )_{jk}(Z_i)|\le 2^{\frac{|\lfloor j\mathbf {r}\rfloor |}{2}+\lfloor j\mathbf {r}\rfloor \cdot \beta (\alpha )}\) and
On the other hand,
thanks to (4.8) and the definition of \((K\phi )_{jk}\) (see (3.3)). Since \(\Vert f\Vert _{\infty }\le M\), \(\Vert h\Vert _{\infty }\le M\) and \(E\eta _i^2\) can be bounded by
Using Parseval identity and (C1), one knows
for some \(c_2>0\). Then it follows from Lemma 4.2 and (4.9)–(4.10) that
For \(j\mathbf {r}\in \mathcal {H}^d\), \((|\lfloor j\mathbf {r}\rfloor |+2\lfloor j\mathbf {r}\rfloor \cdot \beta (\alpha ))\, 2^{|\lfloor j\mathbf {r}\rfloor |+2\lfloor j\mathbf {r}\rfloor \cdot \beta (\alpha )}\le n\) and \({\mu }_{j\mathbf {r}}\lesssim 1\) because of (4.2). Furthermore,
This with (4.1) shows that the right hand side of (4.11) is bounded by
With the choice \(\lambda \ge \sqrt{c_3}\), (4.11) reduces to
Substituting the above estimate into (4.7), one obtains
due to (4.2) and \(|\lfloor j\mathbf {r}\rfloor |+2\lfloor j\mathbf {r}\rfloor \cdot \beta (\alpha )\lesssim \ln n\). Finally, it follows from (4.6) and the choice \(2^{\lfloor j_1r_{1l}\rfloor }\sim (\frac{n}{\ln n})^{\frac{s}{2s+d}\frac{1}{s_l}}\) that
Because all constants in “\(\lesssim \)” throughout the proof are independent of \(x\in \Omega _{x_0}\) and \(f\in H^{\mathbf {s}}(\Omega _{x_0},L,~M)\), the desired conclusion (4.5) follows. This completes the proof. \(\square \)
Remark 4.1
Theorem 4.1 with \(\alpha =0\) and \(\alpha =1\) reduces to Theorem 4 of [32] and Corollary 1 of [5], respectively. From the proof of Theorem 4.1, we find that the parameter \(\lambda \) does not depend on L, but it does depend on M, as in classical wavelet density estimation.
Remark 4.2
Although Theorem 4.1 could be proved by the method of [22], our proof looks simpler and more elementary. In fact, it is not easy to prove a point-wise oracle inequality for the losses in [22]; we instead use the linear estimate already obtained to prove Theorem 4.1 directly, which seems natural as well.
The selection rule used in this section is the same as that of Lepski and Willer [22]. However, we use wavelet estimators instead of kernel-type ones. For some density functions, a wavelet estimator picks up local information more effectively. In addition, wavelet estimation has a fast algorithm [15, 19].
Remark 4.3
In contrast to the linear wavelet estimator \(\hat{f}_n\) in Theorem 3.1, the estimator \(\hat{f}_{j_0\mathbf {r}_0}\) in Theorem 4.1 is adaptive thanks to the selection rule (i)–(ii). Of course, we pay a price: the convergence rate loses the factor \((\ln n)^{-\frac{s}{2s+d}}\). The next section shows that this loss is necessary for adaptivity.
5 Optimality
In this part, we show that the convergence rate of Theorem 4.1 is the best possible, for which the following lemma is needed.
Lemma 5.1
Let \(\Phi _{s_i}~(i=0,1)\) be density sets, \(f_{i,n}\in \Phi _{s_i}~(n=1,2,\ldots )\) and \(h_{i,n}:=(1-\alpha )f_{i,n}+\alpha (f_{i,n}*g)\). If
hold with \(a_n,~b_n>0\) and \(H_{i,n}(x):=\prod _{j=1}^nh_{i,n}(x_j)\), then for \(\hat{f}_n:=\hat{f}_n(Z_1,\ldots ,Z_n,x)\),
Proof
According to Jensen’s inequality, \(I_n\ge a_nE_{h_{0,n}}d(\hat{f}_n,f_{0,n})+b_nE_{h_{1,n}}d(\hat{f}_n,f_{1,n})\ge E_{h_{0,n}}a_n|d(\hat{f}_n,f_{1,n})-d(f_{1,n},f_{0,n})|+a_n^{-1}b_nE_{h_{1,n}}a_nd(\hat{f}_n,f_{1,n})\). Denoting \(T_n:=d(\hat{f}_n,f_{1,n})/d(f_{1,n},f_{0,n})\) and using \(a_nd(f_{1,n},f_{0,n})\gtrsim 1\), one obtains that
Note that \(\min \{a,b\}=\frac{1}{2}(a+b-|a-b|)\) and \(|a-1|+|a|\ge 1\). Then
By Jensen’s inequality, (5.2) reduces to
This with the given assumption \(\int H_{1,n}^{-1}H_{0,n}^2\le a_n^{-1}b_n\) concludes
which is the desired conclusion (5.1). \(\square \)
To state the main theorem in this section, recall that
where \(\Sigma _s\) is a density set for \(s\in S\subset \mathbb {R}^m\). A positive sequence \(\{\varphi _{n,s}\}~(s\in S)\) is called adaptively admissible if there exists an estimator \(\hat{f}_n\) such that
For two positive sequences \(\{\varphi _{n,s}\}\) and \(\{\psi _{n,s}\}\), introduce
\(A^{(0)}[\psi _{n,s}/\varphi _{n,s}]:=\Big \{s\in S:~\lim \limits _{n\rightarrow \infty }\frac{\psi _{n,s}}{\varphi _{n,s}}=0\Big \}\)
and
\(A^{(\infty )}[\psi _{n,s}/\varphi _{n,s}]:=\Big \{s\in S:~\lim \limits _{n\rightarrow \infty }\frac{\psi _{n,s}}{\varphi _{n,s}}=\infty \Big \}.\)
The sequence \(\{\psi _{n,s}\}\) outperforms \(\{\varphi _{n,s}\}\) on \(A^{(0)}[\psi _{n,s}/\varphi _{n,s}]\), while \(\{\varphi _{n,s}\}\) does much better than \(\{\psi _{n,s}\}\) on \(A^{(\infty )}[\psi _{n,s}/\varphi _{n,s}]\). The following definition is a special case of the corresponding ones in [32] and [18].
Definition 5.1
A positive sequence \(\{\varphi _{n,s}\}\) is called an optimal rate of adaptive convergence on \(\Sigma _s~(s\in S)\), if
-
1
\(\{\varphi _{n,s}\}\) is adaptively admissible;
-
2
for adaptively admissible sequence \(\{\psi _{n,s}\}\) satisfying \(A^{(0)}[\psi _{n,s}/\varphi _{n,s}]\ne \emptyset \),
-
(i)
\(A^{(0)}[\psi _{n,s}/\varphi _{n,s}]\) is contained in an \((m-1)\)-dimensional manifold,
-
(ii)
\(A^{(\infty )}[\psi _{n,s}/\varphi _{n,s}]\) contains an open set of S.
Theorem 5.1
Let g satisfy (C2), \(\Sigma _s=H^{\mathbf {s}}(\Omega _{x_0},L,M)\) with \(\mathbf {s}=(s_1,\ldots ,s_d)~(s_l>0)\) and \(\frac{1}{s}=\frac{1}{d}\sum \limits _{l=1}^d\frac{1+2\beta _l(\alpha )}{s_l}\). Then an optimal rate of adaptive convergence on \(\Sigma _s\) is
\(\varphi _{n,s}=\Big (\frac{\ln n}{n}\Big )^{\frac{s}{2s+d}}.\)
Proof
By Theorem 4.1, \(\{\varphi _{n,s}\}~(s\in \mathbb {R}^{+}:=(0,+\infty ))\) is an adaptively admissible sequence. Let \(\{\psi _{n,s}\}\) be any adaptively admissible sequence satisfying \(s_0\in A^{(0)}[\psi _{n,s}/\varphi _{n,s}]\). Then it suffices to show
according to Definition 5.1.
By \(s_0\in A^{(0)}[\psi _{n,s}/\varphi _{n,s}]\), one knows
Since \(\{\psi _{n,s}\}\) is an adaptively admissible sequence, there exist an estimator \(\hat{f}_n^*\) and a positive constant C such that
hold with \(\tilde{s}_0\in \mathbb {R}\setminus \{s_0\}\).
The main work for (5.3) is to show that for any \(\tilde{s}_0\in (s_0,\infty )\),
Choose the Daubechies wavelet function \(\psi _{2N}\) (for large N) as \(\psi :=\psi _{2N}\), and define
where \(f_1(x)=\prod \limits _{l=1}^d\frac{1}{\pi (1+|x_l-x_{0l}|^2)}\) is the d dimensional Cauchy density, \(\gamma _n=(\frac{\ln n}{n})^{\frac{s_0}{2s_0+d}}\), \(\delta _{nl}=\gamma _n^{\frac{1}{s_{l}}}\) and \(\frac{1}{s_0}=\frac{1}{d}\sum \limits _{l=1}^d\frac{1+2\beta _l(\alpha )}{s_l}\). The constant c will be specified later on. It is easy to see that \(f_1\in \Phi _{\tilde{s}_0}\), \(f_{0,n}\in \Phi _{s_0}\) for large n, and \(h_{1}=(1-\alpha )f_1+\alpha (f_1*g)>0\). To use Lemma 5.1, one takes
Then \(a_nd(f_1,f_{0,n})=a_n|f_1(x_0)-f_{0,n}(x_0)|=a_nc\gamma _n|\psi (0)|^d=c|\psi (0)|^d\gtrsim 1\) thanks to \(\psi (0)\ne 0\) [37].
Clearly,
Similar to the estimates of \(I_{1n}\) and \(I_{2n}\) in the proof of Theorem 2.1, there exists a constant \(\tilde{c}\) depending on \(x_0\) such that
Here, \(h_1\) and \(h_{0,n}\) play the same roles as \(h_0\) and \(h_n\) there; the constant c appears in the definition of \(f_{0,n}\), see (5.8), while \(\tilde{c}\) comes from the estimates of \(I_{1n}\) and \(I_{2n}\). In contrast to \(a_n=n^{-\frac{s}{2s+d}}\) in (2.1), we choose \(\gamma _n=(\frac{\ln n}{n})^{\frac{s_0}{2s_0+d}}\) here so that the factor \(\frac{\ln n}{n}\) appears in (5.9). Furthermore, (5.9) reduces to
Since \(\tau >\frac{s_0}{2s_0+d}\), one has \(n^{\tilde{c}c^2}<n^{\tau }(\frac{\ln n}{n})^{\frac{s_0}{2s_0+d}}=a_n^{-1}b_n\) by choosing small \(c>0\). Hence,
According to Lemma 5.1 with \(d(f,g)=|f(x_0)-g(x_0)|\) and \(b_n=n^{\tau }\),
On the other hand, \(\lim _{n\rightarrow \infty }a_n\sup _{f\in \Phi _{s_0}}[E_hd^p(\hat{f}_n^*,f)]^{\frac{1}{p}}=0\) thanks to (5.4)–(5.5) and \(a_n=\varphi _{n,s_0}^{-1}\). Then
This with (5.6) shows
Because \(\tau <\frac{\tilde{s}_0}{2\tilde{s}_0+d}\), there exists \(a>0\) such that \(\tau +a<\frac{\tilde{s}_0}{2\tilde{s}_0+d}\). Moreover,
due to (5.10) and \(\varphi _{n,\tilde{s}_0}=(\frac{\ln n}{n})^{\frac{\tilde{s}_0}{2\tilde{s}_0+d}}\). By Theorem 2.1 and (5.5), \(\psi _{n,s_0}\gtrsim n^{-\frac{s_0}{2s_0+d}}\) and \(\frac{\psi _{n,s_0}}{\varphi _{n,s_0}}\gtrsim (\ln n)^{-\frac{s_0}{2s_0+d}}\). Combining this with (5.11) and \(a>0\), one obtains
which is the desired conclusion (5.7).
Using (5.7), one can conclude (5.3) easily. Suppose \(s_1\in A^{(0)}[\psi _{n,s}/\varphi _{n,s}]\setminus \{s_0\}\). Then \(s_1\in (0,s_0)\) thanks to the definition of \(A^{(0)}[\psi _{n,s}/\varphi _{n,s}]\) and (5.7). Replacing \(s_0\) by \(s_1\) and \(\tilde{s}_0\) by \(s_0\) in the proof of (5.7), one finds
which contradicts \(s_0,s_1\in A^{(0)}[\psi _{n,s}/\varphi _{n,s}]\). Hence, \(A^{(0)}[\psi _{n,s}/\varphi _{n,s}]=\{s_0\}\). Furthermore, \(A^{(\infty )}[\psi _{n,s}/\varphi _{n,s}] \supseteq (s_0,+\infty )\) thanks to (5.7). Now, (5.3) holds true and the proof is done. \(\square \)
6 Concluding Remark
In order to deal with density estimation in model \(Z=X+\varepsilon Y\), we suppose in this paper that the observed random samples \(Z_i\) are i.i.d. and satisfy
\(Z_i=X_i+\varepsilon _iY_i~(i=1,\ldots ,n),\)
where the sequences \(\{X_i\}\), \(\{Y_i\}\) and \(\{\varepsilon _i\}\) are mutually independent. The mutual independence plays a key role for (1.2), which is the starting point of the whole paper.
A natural question is whether the i.i.d. assumption on \(Z_i\) can be replaced by some weaker condition, for example, \(\alpha \)-mixing or negative association. For the lower bound estimate, Fano’s lemma is a fundamental tool. To get the relation \(K(P_{H_n},P_{H_0})=nK(P_{h_n},P_{h_0})\), we use the i.i.d. assumption on \(Z_i\). It seems hard to estimate the Kullback–Leibler divergence \(K(P_{H_n},P_{H_0})\) without that condition.
Upper bound estimates usually need Rosenthal’s inequality and the Bernstein inequality, which require independence of the \(Z_i\). Of course, there exist some replacements for \(\alpha \)-mixing data. However, they become more complicated [33]. We will consider the corresponding density estimation for that case in future work.
Although the convergence rate \(n^{-\frac{s}{2s+d}}\) in Theorem 3.1 depends heavily on the dimension d, we can reduce the influence of the dimension under some independence hypothesis, as in [32]. For a partition \(\mathcal {P}\) of \(\mathcal {I}_d:=\{1,\ldots ,d\}\), a density function f is said to have independence structure \(\mathcal {P}\), if
\(f(x)=\prod \limits _{I\in \mathcal {P}}f_{I}(x_{I}).\)
With \(I=\{n_1,\ldots ,n_{|I|}\}\in \mathcal {P}\) and \(1\le n_1<\cdots <n_{|I|}\le d\), \(x_{I}\) stands for an element \((x_{n_1},\ldots ,x_{n_{|I|}})\) in \(\mathbb {R}^{|I|}\), where |I| denotes the cardinality of I. We use \(f\in H_{\mathcal {P}}^{\mathbf {s}} (\Omega _{x_0},L,~M)\) to denote \(f_{I}\in H^{\mathbf {s}_I}(\Omega _{x_{0I}},L_I,~M_I)\) for each \(I\in \mathcal {P}\), where \(M:=\prod _{I\in \mathcal {P}}M_I\) and \(L:=\prod _{I\in \mathcal {P}}L_I\).
Let \(\hat{f}_{n,I}(x_{I})\) be the corresponding linear wavelet estimator of \(f_{I}\) given in (3.4) and define \(\hat{f}_n(x)\) for \(f\in H_{\mathcal {P}}^{\mathbf {s}}(\Omega _{x_{0}},L,~M)\) by
Then we can use Theorem 3.1 and the inequality
to prove the following result.
Theorem 6.1
Let g satisfy (C1) and \(\mathbf {s}=(s_1,\ldots ,s_d)\) with \(s_i>0\). Define \(s':=\min \limits _{I\in \mathcal {P}} \left( \sum \limits _{i\in I}[1+2\beta _i(\alpha )]/s_i \right) ^{-1}\) and \(1/s_{\text {I}}:=|I|^{-1} \sum \limits _{i\in I}(1+2\beta _i(\alpha ))/s_i\). Then with \(2^{\lfloor jr_i\rfloor } \sim n^{[s_{\text {I}}/(2s_{\text {I}}+|I|)](1/s_i)}~ (i\in I\in \mathcal {P})\) and \(p\in [1, \infty )\),
\(\sup \limits _{f\in H_{\mathcal {P}}^{\mathbf {s}}(\Omega _{x_{0}},L,~M)}[E|\hat{f}_n(x_0)-f(x_0)|^p]^{\frac{1}{p}}\lesssim n^{-\frac{s'}{2s'+1}}.\)
We omit the proof, since the arguments are elementary.
When \(\mathcal {P}=\{\{1\},\ldots ,\{d\}\}\), the density f has a complete independence structure, \(s'=\min \limits _{1\le i\le d}\frac{s_i}{1+2\beta _i(\alpha )}\), and the rate becomes
\(n^{-\frac{s'}{2s'+1}},\)
which does not depend on the dimension d. Although the estimator \(\hat{f}_n\) in Theorem 6.1 is not adaptive, we can apply our selection rule (in Sect. 4) to each \(\hat{f}_{n,I}\) in order to get an adaptive estimate.
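The gain from an independence structure is visible already at the level of the rate exponents; the following sketch (ours, with hypothetical smoothness values) compares \(s/(2s+d)\) from Theorem 3.1 with the block-wise exponent \(\min _{I}s_{\text {I}}/(2s_{\text {I}}+|I|)\) behind Theorem 6.1.

```python
# Effective smoothness 1/s = (1/d) sum_l (1 + 2 beta_l)/s_l and the
# corresponding rate exponents, with and without independence structure.
def s_eff(s, beta):
    return len(s) / sum((1 + 2*b) / sl for sl, b in zip(s, beta))

s, beta = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]     # d = 3, alpha < 1
sf = s_eff(s, beta)
print("no structure :", sf / (2*sf + len(s)))

partition = [[0], [1], [2]]                     # complete independence
exps = []
for I in partition:
    sI = s_eff([s[i] for i in I], [beta[i] for i in I])
    exps.append(sI / (2*sI + len(I)))
print("independence :", min(exps))              # larger exponent = faster rate
```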
The next theorem shows the optimality of the estimation in Theorem 6.1.
Theorem 6.2
Let g satisfy (C2) and \(s'=\underset{I\in \mathcal {P}}{\min }\left\{ [\sum \limits _{i\in I} (1+2\beta _i(\alpha ))/s_i]^{-1}\right\} \). Then for \(M\ge \pi ^{-d}\) and \(p\in [1,+\infty )\),
\(\inf \limits _{\hat{f}_n}\sup \limits _{f\in H_{\mathcal {P}}^{\mathbf {s}}(\Omega _{x_{0}},L,~M)}[E|\hat{f}_n(x_0)-f(x_0)|^p]^{\frac{1}{p}}\gtrsim n^{-\frac{s'}{2s'+1}},\)
where \(\hat{f}_n\) runs over all possible estimators of \(f\in H_{\mathcal {P}}^{\mathbf {s}}(\Omega _{x_{0}},L,~M)\).
We outline a proof here. As in the proof of Theorem 2.1, choose the one dimensional Cauchy density function \(\tilde{f}_0\) and the Meyer wavelet \(\psi \). Then for \(I\in \mathcal {P}\) and \(s_{\text {I}}^{-1}= \frac{1}{|I|}\sum \limits _{i\in I}(1+2\beta _i(\alpha ))/s_i\), define
with \(a_n:=n^{-s_{\text {I}}/(2s_{\text {I}}+|I|)}\) and \(\delta _{ni}=a_n^{\frac{1}{s_i}}\). Furthermore, define \(f_0(x):=\prod \limits _{I\in \mathcal {P}}f_{0,I}(x_I)\) and \(f_n(y)=f_0(y)+\Psi _n(y)\) with
The remaining arguments are similar to those in the proof of Theorem 2.1.
References
Butucea, C.: The adaptive rate of convergence in a problem of pointwise density estimation. Stat. Probab. Lett. 47, 85–90 (2000)
Butucea, C.: Exact adaptive pointwise estimation on Sobolev classes of densities. ESAIM Prob. Stat. 5, 1–31 (2001)
Benhaddou, R.: Minimax lower bounds for the simultaneous wavelet deconvolution with fractional Gaussian noise and unknown kernels. Stat. Probab. Lett. 140, 91–95 (2018)
Carroll, R.J., Hall, P.: Optimal rates of convergence for deconvolving a density. J. Am. Stat. Assoc. 83, 1184–1186 (1988)
Comte, F., Lacour, C.: Anisotropic adaptive kernel deconvolution. Ann. Inst. Henri Poincaré Probab. Stat. 49, 569–609 (2013)
Delyon, B., Juditsky, A.: On minimax wavelet estimators. Appl. Comput. Harmon. Anal. 3, 215–228 (1996)
Devroye, L.: Consistent deconvolution in density estimation. Can. J. Stat. 17, 235–239 (1989)
Donoho, D.L., Johnstone, I.M., Kerkyacharian, G., Picard, D.: Density estimation by wavelet thresholding. Ann. Stat. 24, 508–539 (1996)
Doukhan, P., León, J.R.: Déviation quadratique d'estimateurs de densité par projections orthogonales. C. R. Acad. Sci. Paris Sér. I Math. 310, 425–430 (1990)
Fan, J.: On the optimal rates of convergence for nonparametric deconvolution problem. Ann. Stat. 19, 1257–1272 (1991)
Fan, J., Koo, J.-Y.: Wavelet deconvolution. IEEE Trans. Inf. Theory 48, 734–747 (2002)
Goldenshluger, A., Lepski, O.: Bandwidth selection in kernel density estimation: oracle inequalities and adaptive minimax optimality. Ann. Stat. 39, 1608–1632 (2011)
Goldenshluger, A., Lepski, O.: On adaptive minimax density estimation on \(\mathbb{R}^d\). Probab. Theory Relat. Fields 159, 479–543 (2014)
Hesse, C.H.: Deconvolving a density from partially contaminated observations. J. Multivariate Anal. 55, 246–260 (1995)
Härdle, W.K., Kerkyacharian, G., Picard, D., Tsybakov, A.: Wavelets, Approximation, and Statistical Applications. Springer, New York (1998)
Ibragimov, I.A., Hasminskii, R.Z.: Statistical Estimation: Asymptotic Theory. Springer, New York (1981)
Kerkyacharian, G., Picard, D.: Density estimation in Besov spaces. Stat. Probab. Lett. 13, 15–24 (1992)
Klutchnikoff, N.: Pointwise adaptive estimation of a multivariate function. Math. Methods Stat. 23, 132–150 (2014)
Kou, J.K., Liu, Y.M.: Nonparametric regression estimations over \(L^p\) risk based on biased data. Commun. Stat. Theor. Methods 46, 2375–2395 (2017)
Lepski, O.: Multivariate density estimation under sup-norm losses: oracle approach, adaptation and independence structure. Ann. Stat. 41, 1005–1034 (2013)
Lepski, O., Willer, T.: Lower bounds in the convolution structure density model. Bernoulli 23, 884–926 (2017)
Lepski, O., Willer, T.: Oracle inequalities and adaptive estimation in the convolution structure density model. Ann. Stat. 47, 233–287 (2019)
Li, R., Liu, Y.M.: Wavelet optimal estimations for a density with some additive noises. Appl. Comput. Harmon. Anal. 36, 416–433 (2014)
Liu, M., Taylor, R.: A consistent nonparametric density estimator for the deconvolution problem. Can. J. Stat. 17, 427–438 (1989)
Liu, Y.M., Wang, H.Y.: Convergence order of wavelet thresholding estimator for differential operators on Besov spaces. Appl. Comput. Harmon. Anal. 32, 342–356 (2012)
Liu, Y.M., Wu, C.: Point-wise estimation for anisotropic densities. J. Multivariate Anal. 171, 112–125 (2019)
Liu, Y.M., Zeng, X.C.: Asymptotic normality for wavelet deconvolution density estimators. Appl. Comput. Harmon. Anal. 48, 321–342 (2020)
Lounici, K., Nickl, R.: Global uniform risk bounds for wavelet deconvolution estimators. Ann. Stat. 39, 201–231 (2011)
Masry, E.: Strong consistency and rates for deconvolution of multivariate densities of stationary processes. Stoch. Process. Appl. 47, 53–74 (1993)
Pensky, M.: Density deconvolution based on wavelets with bounded supports. Stat. Probab. Lett. 56, 261–269 (2002)
Pensky, M., Vidakovic, B.: Adaptive wavelet estimator for nonparametric density deconvolution. Ann. Stat. 27, 2033–2053 (1999)
Rebelles, G.: Pointwise adaptive estimation of a multivariate density under independence hypothesis. Bernoulli 21, 1984–2023 (2015)
Shao, Q., Yu, H.: Weak convergence for weighted empirical process of dependent sequences. Ann. Probab. 24, 2098–2127 (1996)
Stefanski, L., Carroll, R.: Deconvoluting kernel density estimators. Statistics 21, 169–184 (1990)
Triebel, H.: Theory of Function Spaces III. Birkhäuser, Berlin (2006)
Tsybakov, A.B.: Pointwise and sup-norm sharp adaptive estimation of functions on the Sobolev classes. Ann. Stat. 26, 2420–2469 (1998)
Walnut, D.F.: An Introduction to Wavelet Analysis. Birkhäuser, Boston (2004)
Walter, G.G.: Density estimation in the presence of noise. Stat. Probab. Lett. 41, 237–246 (1999)
Wishart, J.R.: Smooth hyperbolic wavelet deconvolution with anisotropic structure. Electron. J. Stat. 13, 1694–1716 (2019)
Yuan, M., Chen, J.: Deconvolving multidimensional density from partially contaminated observations. J. Stat. Plann. Inference 104, 147–160 (2002)
Acknowledgements
This work is supported by the National Natural Science Foundation of China (No. 11771030) and the Beijing Natural Science Foundation (No. 1172001). The authors would like to thank two referees for their important comments and suggestions.
Communicated by Stephane Jaffard.
Keywords
- Density estimation
- Generalized deconvolution model
- Point-wise risk
- Optimality
- Adaptivity
- Wavelet
- Anisotropic Hölder space