1 Introduction and preliminary

Nonparametric regression estimation plays an important role in practical applications. The classical approach uses Nadaraya–Watson type kernel estimators (Ahmad 1995). Because wavelet bases are localized in both the time and frequency domains, wavelets provide an effective tool for analyzing functions (signals) with discontinuities or sharp spikes. Therefore it is natural to expect better estimates than those of the kernel method in some cases. Substantial progress has been made in this area; see Delyon and Judisky (1996), Hall and Patil (1996), Masry (2000), Chaubey et al. (2013), Chaubey and Shirazi (2015), Chesneau and Shirazi (2014) and Chesneau et al. (2015).

In this paper we consider the following model: Let \((X_{i},Y_{i})_{i\in {\mathbb {Z}}}\) be a strictly stationary random process defined on a probability space \((\varOmega , {\mathscr {F}},P)\) with the common density function

$$\begin{aligned} f(x,y)=\frac{\omega (x,y)g(x,y)}{\mu },\quad (x,y)\in {\mathbb {R}}^{d}\times {\mathbb {R}} \end{aligned}$$

where \(\omega \) stands for a known positive weight function, g denotes the density function of the unobserved random vector (U, V) and \(\mu =E \omega (U,V)<\infty \). We want to estimate the unknown regression function

$$\begin{aligned} r(x)=E(\rho (V)|U=x),\quad x\in {\mathbb {R}}^{d} \end{aligned}$$

from a sequence of strong mixing data \((X_{1},Y_{1}), (X_{2},Y_{2}), \ldots , (X_{n},Y_{n})\).

Chaubey et al. (2013) provide an upper bound of the mean integrated squared error for a linear wavelet estimator when \(\rho (V)=V\) and \(V\in [a, b]\); Chaubey and Shirazi (2015) consider the case of a nonlinear wavelet estimator with \(d=1\).

In this paper, we further extend these works to the d-dimensional setting under \(L^{p}\) risk for \(1\le p<\infty \). When \(p=2\), our result reduces to Theorem 4.1 of Chaubey et al. (2013); in the case of \(d=1\) and \(p=2\), it becomes Theorem 5.1 of Chaubey and Shirazi (2015).

1.1 Wavelets and Besov spaces

Multiresolution analysis (MRA), a central notion in wavelet analysis, provides a standard way to construct wavelet bases. An MRA is a sequence of closed subspaces \(\{V_{j}\}_{j\in {\mathbb {Z}}}\) of the square integrable function space \(L^{2}({\mathbb {R}}^{d})\) satisfying the following properties:

  (i) \(V_{j}\subseteq V_{j+1}\), \(j\in {\mathbb {Z}}\). Here and after, \({\mathbb {Z}}\) denotes the integer set and \(\mathbb {N}:=\{n\in {\mathbb {Z}}, n\ge 0\};\)

  (ii) \(\overline{\bigcup \limits _{j\in {\mathbb {Z}}} V_{j}}=L^{2}({\mathbb {R}}^{d})\), i.e., the space \(\bigcup \nolimits _{j\in {\mathbb {Z}}} V_{j}\) is dense in \(L^{2}({\mathbb {R}}^{d})\);

  (iii) \(f(2\cdot )\in V_{j+1}\) if and only if \(f(\cdot )\in V_{j}\) for each \(j\in {\mathbb {Z}}\);

  (iv) There exists a scaling function \(\varphi (x)\in L^{2}({\mathbb {R}}^{d})\) such that \(\{\varphi (\cdot -k),k\in {\mathbb {Z}}^{d}\}\) forms an orthonormal basis of \(V_{0}=\overline{\mathrm{span}}\{\varphi (\cdot -k)\}\).

When \(d=1\), there is a simple way to define an orthonormal wavelet basis; examples include the Daubechies wavelets with compact support. For \(d\ge 2\), the tensor product method gives an MRA \(\{V_{j}\}\) of \(L^{2}({\mathbb {R}}^{d})\) from a one-dimensional MRA. In fact, with the tensor product scaling function \(\varphi \), one finds \(M=2^{d}-1\) wavelet functions \(\psi ^{\ell }~(\ell =1,2,\ldots ,M)\) such that for each \(f\in L^{2}({\mathbb {R}}^{d})\), the decomposition

$$\begin{aligned} f=\sum _{k\in {\mathbb {Z}}^{d}}\alpha _{j_{0}, k}\varphi _{j_{0}, k}+\sum _{j=j_{0}}^{\infty }\sum _{\ell =1}^{M}\sum _{k\in {\mathbb {Z}}^{d}}\beta _{j,k}^{\ell }\psi _{j, k}^{\ell } \end{aligned}$$

holds in the \(L^{2}({\mathbb {R}}^{d})\) sense, where \(\alpha _{j_{0},k}=\langle f,\varphi _{j_{0},k}\rangle \), \(\beta _{j,k}^{\ell }=\langle f,\psi _{j,k}^{\ell }\rangle \) and

$$\begin{aligned} \varphi _{j_{0},k}(x)=2^{\frac{dj_{0}}{2}}\varphi (2^{j_{0}}x-k),\quad \psi ^{\ell }_{j,k}(x)=2^{\frac{dj}{2}}\psi ^{\ell }(2^{j}x-k). \end{aligned}$$
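For instance, when \(d=2\), the tensor product construction takes \(\varphi (x_{1},x_{2})=\varphi _{1}(x_{1})\varphi _{1}(x_{2})\) and the \(M=2^{2}-1=3\) wavelets

$$\begin{aligned} \psi ^{1}(x_{1},x_{2})=\varphi _{1}(x_{1})\psi _{1}(x_{2}),\quad \psi ^{2}(x_{1},x_{2})=\psi _{1}(x_{1})\varphi _{1}(x_{2}),\quad \psi ^{3}(x_{1},x_{2})=\psi _{1}(x_{1})\psi _{1}(x_{2}), \end{aligned}$$

where \(\varphi _{1}\) and \(\psi _{1}\) denote the one-dimensional scaling function and wavelet, respectively.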

Let \(P_{j}\) be the orthogonal projection operator from \(L^{2}({\mathbb {R}}^{d})\) onto the space \(V_{j}\) with the orthonormal basis \(\{\varphi _{j,k}(\cdot )=2^{jd/2}\varphi (2^{j}\cdot -k),k\in {\mathbb {Z}}^{d}\}\). Then for \(f\in L^{2}({\mathbb {R}}^{d})\),

$$\begin{aligned} P_{j}f=\sum _{k\in {\mathbb {Z}}^{d}}\alpha _{j,k}\varphi _{j,k}. \end{aligned}$$
(1)

A wavelet basis can be used to characterize Besov spaces. The next lemma provides equivalent definitions of those spaces, for which we need one more notion: a scaling function \(\varphi \) is called m-regular if \(\varphi \in C^{m}({\mathbb {R}}^{d})\) and \(|D^{\alpha }\varphi (x)|\le c(1+|x|^{2})^{-\ell }\) for each \(\ell \in {\mathbb {Z}}\) and each multi-index \(\alpha \in \mathbb {N}^d\) with \(|\alpha |\le m\).

Lemma 1.1

(Meyer 1990) Let \(\varphi \) be m-regular, \( \psi ^{\ell } \) (\( \ell =1, 2, \ldots , M, M=2^{d}-1 \)) be the corresponding wavelets and \(f\in L^{p}({\mathbb {R}}^{d})\). If \(\alpha _{j,k}=\langle f,\varphi _{j,k} \rangle \), \(\beta _{j,k}^{\ell }=\langle f,\psi _{j,k}^{\ell } \rangle \), \(p,q\in [1,\infty ]\) and \(0<s<m\), then the following assertions are equivalent:

  (1) \(f\in B^{s}_{p,q}({\mathbb {R}}^{d})\);

  (2) \(\{2^{js}\Vert P_{j+1}f-P_{j}f\Vert _{p}\}\in l_{q};\)

  (3) \(\{2^{j(s-\frac{d}{p}+\frac{d}{2})}\Vert \beta _{j}\Vert _{p}\}\in l_{q}.\)

The Besov norm of f can be defined by

$$\begin{aligned} \Vert f\Vert _{B^{s}_{p,q}}:=\Vert (\alpha _{j_{0}})\Vert _{p}+\left\| \left( 2^{j\left( s-\frac{d}{p}+\frac{d}{2}\right) } \Vert \beta _{j}\Vert _{p}\right) _{j\ge j_{0}} \right\| _{q} ~\mathrm{with}~ \Vert \beta _{j}\Vert _{p}^{p}=\sum \limits _{\ell =1}^{M}\sum \limits _{k\in {\mathbb {Z}}^{d}} \left| \beta ^{\ell }_{j,k} \right| ^{p}\!. \end{aligned}$$

We also need a lemma (Härdle et al. 1998) in our later discussions.

Lemma 1.2

Let a scaling function \(\varphi \in L^{2}({\mathbb {R}}^{d})\) be m-regular and \(\{\alpha _{k}\}\in l_{p}, 1\le p\le \infty \). Then there exist \(c_{2}\ge c_{1}>0\) such that

$$\begin{aligned} c_{1}2^{j\left( \frac{d}{2}-\frac{d}{p}\right) }\Vert (\alpha _{k})\Vert _{p}\le \left\| \sum _{k\in {\mathbb {Z}}^{d}}\alpha _{k}2^{\frac{jd}{2}}\varphi (2^{j}x-k)\right\| _{p}\le c_{2}2^{j\left( \frac{d}{2}-\frac{d}{p}\right) }\Vert (\alpha _{k})\Vert _{p}. \end{aligned}$$

In Härdle et al. (1998), the authors assume a condition weaker than m-regularity, and the proof of the lemma for \(d=1\) can be found there; similar arguments work for \(d\ge 2\). In addition, Lemma 1.2 still holds if the scaling function \(\varphi \) is replaced by the corresponding wavelet \(\psi \).
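For example, when \(p=2\), the factor \(2^{j(\frac{d}{2}-\frac{d}{p})}\) equals 1 and both inequalities hold with \(c_{1}=c_{2}=1\), since \(\{2^{\frac{jd}{2}}\varphi (2^{j}\cdot -k), k\in {\mathbb {Z}}^{d}\}\) is an orthonormal system and therefore \(\Big \Vert \sum \nolimits _{k\in {\mathbb {Z}}^{d}}\alpha _{k}2^{\frac{jd}{2}}\varphi (2^{j}\cdot -k)\Big \Vert _{2}=\Vert (\alpha _{k})\Vert _{2}\).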

1.2 Problem and main theorem

In this paper we aim to estimate the unknown regression function

$$\begin{aligned} r(x)=E(\rho (V)|U=x),\quad x\in {\mathbb {R}}^{d} \end{aligned}$$
(2)

from a strong mixing sequence (see Definition 1.1) of bivariate random variables \((X_{1},Y_{1}), (X_{2},Y_{2}),\ldots , (X_{n},Y_{n})\) with the common density function

$$\begin{aligned} f(x,y)=\frac{\omega (x,y)g(x,y)}{\mu },\quad (x,y)\in {\mathbb {R}}^{d}\times {\mathbb {R}}, \end{aligned}$$
(3)

where \(\omega \) denotes a weight function, g stands for the density function of the unobserved random vector (U, V) and \(\mu =E\omega (U,V)<\infty \). In addition, h(x) is assumed to be the known density of U, with compact support contained in \([0,1]^{d}\), as in Chaubey et al. (2013) and Chaubey and Shirazi (2015). Throughout the paper, we always require \( \mathrm{supp}~X_{i} \subseteq [0,1]^{d}\).

Definition 1.1

A strictly stationary sequence of random vectors \(\{X_{i}\}_{i\in {\mathbb {Z}}}\) is said to be strong mixing, if

$$\begin{aligned} \lim \limits _{k\rightarrow \infty }\alpha (k)=\lim \limits _{k\rightarrow \infty }\sup \{|{\mathbb {P}} (A\cap B)-{\mathbb {P}}(A) {\mathbb {P}} (B)|:~ A\in \digamma ^{0}_{-\infty },\quad B\in \digamma ^{\infty }_{k}\}=0, \end{aligned}$$

where \( \digamma ^{0}_{-\infty } \) denotes the \(\sigma \)-field generated by \( \{X_{i}\}_{i \le 0}\) and \( \digamma ^{\infty }_{k} \) that generated by \( \{X_{i}\}_{i \ge k}\).

Obviously, independent and identically distributed (i.i.d.) data are strong mixing, since \({\mathbb {P}} (A\cap B)={\mathbb {P}}(A) {\mathbb {P}} (B)\) and \(\alpha (k)\equiv 0\) in that case. In addition, \(\{X_{i}\}\) is said to be geometrically strong mixing when \(\alpha (k)\le \gamma \delta ^{k}\) for some \(\gamma >0\) and \(0<\delta <1\). We now provide two examples of geometrically strong mixing data.

Example 1

(Kulik 2008)  Let \(X_t=\sum \nolimits _{j\in {\mathbb {Z}}}a_j\varepsilon _{t-j}\) with

$$\begin{aligned} \{\varepsilon _t,~t\in {\mathbb {Z}}\}\overset{i.i.d.}{\sim } N(0,~\sigma ^2)~\text{ and }~ a_k=\left\{ \begin{array}{cc} 2^{-k}, &{} k\ge 0, \\ 0, &{} k<0. \end{array}\right. \end{aligned}$$

Then it can be proved by Theorem 2 and Corollary 1 of Doukhan (1994) on Page 58 that \(\{X_t,~t\in {\mathbb {Z}}\}\) is a geometrically strong mixing sequence.
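As a small illustration (not part of the original analysis), the process in Example 1 is easy to simulate: since \(a_{k}=2^{-k}\) for \(k\ge 0\), the series \(X_{t}=\sum \nolimits _{k\ge 0}2^{-k}\varepsilon _{t-k}\) satisfies the AR(1) recursion \(X_{t}=\frac{1}{2}X_{t-1}+\varepsilon _{t}\). A minimal Python sketch (the burn-in length is an arbitrary choice):

```python
import numpy as np

def simulate_example1(n, sigma=1.0, burn_in=500, seed=0):
    """Simulate X_t = sum_{k>=0} 2^{-k} eps_{t-k}, i.e. X_t = 0.5*X_{t-1} + eps_t,
    with eps_t i.i.d. N(0, sigma^2); this sequence is geometrically strong mixing."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma, size=n + burn_in)
    x = np.zeros(n + burn_in)
    for t in range(1, n + burn_in):
        x[t] = 0.5 * x[t - 1] + eps[t]
    return x[burn_in:]          # drop the burn-in so the sample is (approximately) stationary

# Example: 1000 observations from the mixing sequence of Example 1
sample = simulate_example1(1000)
```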

Example 2

(Mokkadem 1988)  Let \(\{\varepsilon (t),t\in {\mathbb {Z}}\}\overset{i.i.d.}{\sim } N_r(\mathbf {0},\Sigma )\) (the r-dimensional normal distribution) and let \(\{Y(t),~t\in {\mathbb {Z}}\}\) satisfy the autoregressive moving average (ARMA) equation

$$\begin{aligned} \sum _{i=0}^{p}B(i)Y(t-i)=\sum _{k=0}^qA(k)\varepsilon (t-k) \end{aligned}$$

with \(l\times r\) matrices A(k) and \(l\times l\) matrices B(i), where B(0) is the identity matrix. If the absolute values of the zeros of \(\text{ det }~P(z):=\text{ det }\sum \nolimits _{i=0}^pB(i)z^i~(z\in \mathbb {C})\) are strictly greater than 1, then \(\{Y(t),~t\in {\mathbb {Z}}\}\) is geometrically strong mixing.
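As an illustration only, the root condition of Example 2 can be checked numerically. The sketch below uses a hypothetical scalar case (\(l=r=1\)) with \(B(0)=1\), \(B(1)=-0.5\), \(B(2)=0.06\), so that \(\text{ det }~P(z)=1-0.5z+0.06z^{2}\):

```python
import numpy as np

# Hypothetical scalar ARMA example (l = r = 1): B(0) = 1, B(1) = -0.5, B(2) = 0.06,
# so det P(z) = 1 - 0.5 z + 0.06 z^2.  Example 2 requires every zero of det P(z)
# to have absolute value strictly greater than 1.
b = [1.0, -0.5, 0.06]                     # coefficients of P(z) in increasing powers of z
roots = np.roots(b[::-1])                 # np.roots expects decreasing powers
print(roots, np.all(np.abs(roots) > 1))   # zeros are 5 and 10/3, both outside the unit circle
```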

These two important examples tell us that strong mixing data do not reduce to the classical i.i.d. setting, even though \(\alpha (k)\) goes to zero very fast, \(\alpha (k)=O(\delta ^{k})\).

It is well known that a Lebesgue measurable function maps i.i.d. data to i.i.d. data. When dealing with strong mixing data, it seems necessary to require the functions \(\omega , \rho , h\) in (2) and (3) to be Borel measurable. A function f on \({\mathbb {R}}^{d}\) is Borel measurable if \(\{x\in {\mathbb {R}}^{d}, f(x)>c\}\) is a Borel set for each \(c\in {\mathbb {R}}\). In that case, one can easily show that \(\{f(X_{i})\}\) remains strong mixing with \(\alpha _{f(X)}(k)\le \alpha _{X}(k)~(k=1, 2, \ldots )\) whenever \(\{X_{i}\}\) is strong mixing; see Guo (2016). This observation is important for the proofs of Propositions 2.2 and 2.3.

Before introducing our estimators, we formulate the following assumptions:

  H1. The density function h of the random variable U has a positive lower bound,

    $$\begin{aligned} \inf \limits _{x\in [0,1]^{d}}h(x)\ge c_{1}>0. \end{aligned}$$
  H2. The weight function \(\omega \) has both positive upper and lower bounds, i.e., for \((x,y)\in [0,1]^{d}\times {\mathbb {R}},\)

    $$\begin{aligned} 0<c_{2}\le \omega (x,y)\le c_{3}<+\infty . \end{aligned}$$
  H3. There exists a constant \(c_{4}>0\) such that

    $$\begin{aligned} \sup \limits _{y\in {\mathbb {R}}}|\rho (y)|\le c_{4},\quad \int _{{\mathbb {R}}}|\rho (y)|dy\le c_{4}. \end{aligned}$$
  H4. The strong mixing coefficient of \(\{(X_{i}, Y_{i}), i=1, 2, \ldots , n\}\) satisfies \(\alpha (k)\le \gamma e^{-c_{5}k}\) with \(\gamma>0, c_{5}>0\).

  H5. The density \(f_{(X_{1}, Y_{1}, X_{k+1}, Y_{k+1})}\) of \((X_{1}, Y_{1}, X_{k+1}, Y_{k+1})~(k\ge 1)\) and the density \(f_{(X_{1}, Y_{1})}\) of \((X_{1}, Y_{1})\) satisfy that for \((x, y, x^{*}, y^{*})\in [0,1]^{d}\times {\mathbb {R}}\times [0,1]^{d}\times {\mathbb {R}}\),

    $$\begin{aligned} \sup \limits _{k\ge 1}\sup \limits _{(x, y, x^{*}, y^{*})\in [0,1]^{d}\times {\mathbb {R}}\times [0,1]^{d}\times {\mathbb {R}}}|h_{k}(x, y, x^{*}, y^{*})|\le c_{6}, \end{aligned}$$

    where \(h_{k}(x, y, x^{*}, y^{*})=f_{(X_{1}, Y_{1}, X_{k+1}, Y_{k+1})}(x, y, x^{*}, y^{*}) -f_{(X_{1}, Y_{1})}(x,y)f_{(X_{k+1}, Y_{k+1})}(x^{*}, y^{*})\) and \(c_{6}>0\).

The assumptions H1 and H2 are standard for the nonparametric regression model with biased data (Chaubey et al. 2013; Chesneau and Shirazi 2014). In Chaubey and Shirazi (2015), the authors assume \(h\equiv 1\), while \(Y\in [a,b]\) is required by Chaubey et al. (2013). Condition H5 can be viewed as a ‘Castellana–Leadbetter’ type condition, as in Masry (2000).

We choose the d-dimensional scaling function

$$\begin{aligned} \varphi (x)=\varphi (x_{1},\ldots ,x_{d}):=D_{2N}(x_{1})\cdots D_{2N}(x_{d}) \end{aligned}$$

with \(D_{2N}(\cdot )\) being the one-dimensional Daubechies scaling function. Then \(\varphi \) is m-regular \((m>0)\) when N is large enough. Note that \(D_{2N}\) has compact support \([0,2N-1]\) and the corresponding wavelet has compact support \([-N+1,~N]\). Then for \(r\in L^{2}({\mathbb {R}}^{d})\) with \(\mathrm{supp}~r\subseteq [0,1]^{d}\) and \(M=2^{d}-1\),

$$\begin{aligned} r(x)=\sum _{k\in \varLambda _{j_{0}}}\alpha _{j_{0},k} \varphi _{j_{0},k}(x)+\sum _{j=j_{0}}^{\infty }\sum _{\ell =1}^{M} \sum _{k\in \varLambda _{j}}\beta _{j,k}^{\ell }\psi _{j,k}^{\ell }(x), \end{aligned}$$

where \(\varLambda _{j_{0}}=\{1-2N, 2-2N, \ldots , 2^{j_{0}}\}^{d}, ~\varLambda _{j}=\{-N, -N+1, \ldots , 2^{j}+N-1\}^{d}\) and

$$\begin{aligned} \alpha _{j_{0},k}=\int _{[0,1]^{d}}r(x)\varphi _{j_{0},k}(x)dx,\quad \beta _{j,k}^{\ell }=\int _{[0,1]^{d}}r(x)\psi _{j,k}^{\ell }(x)dx. \end{aligned}$$

We introduce

$$\begin{aligned} \widehat{\mu }_{n}= & {} \left[ \frac{1}{n}\sum _{i=1}^{n}\frac{1}{ \omega (X_{i},Y_{i})}\right] ^{-1}, \end{aligned}$$
(4)
$$\begin{aligned} \widehat{\alpha }_{j_{0},k}= & {} \frac{\widehat{\mu }_{n}}{n}\sum _{i=1}^{n}\frac{\rho (Y_{i})}{\omega (X_{i},Y_{i})h(X_{i})}\varphi _{j_{0},k}(X_{i}) \end{aligned}$$
(5)

and

$$\begin{aligned} \widehat{\beta }_{j,k}^{\ell }=\frac{\widehat{\mu }_{n}}{n}\sum _{i=1}^{n}\frac{\rho (Y_{i})}{\omega (X_{i},Y_{i})h(X_{i})}\psi _{j,k}^{\ell }(X_{i}). \end{aligned}$$
(6)

By H1–H3, the estimators in (4)–(6) are all well defined. When \(\rho (Y)=Y\), \(\widehat{\mu }_{n}\) and \(\widehat{\alpha }_{j_{0},k}\) in (4) and (5) are the same as those of Chaubey et al. (2013). If \(d=1\) and \(h(x)=I_{[0, 1]}(x)\), then \(\widehat{\mu }_{n}\), \(\widehat{\alpha }_{j_{0},k}\) and \(\widehat{\beta }_{j,k}^{\ell }\) in (4)–(6) reduce completely to those in Chaubey and Shirazi (2015).

We define our linear wavelet estimator

$$\begin{aligned} \widehat{r}^{lin}_{n}(x)=\sum _{k\in \varLambda _{j_{0}}}\widehat{\alpha }_{j_{0},k}\varphi _{j_{0},k}(x) \end{aligned}$$
(7)

and the nonlinear wavelet estimator

$$\begin{aligned} \widehat{r}^{non}_{n}(x)=\widehat{r}^{lin}_{n}(x)+\sum _{j=j_{0}}^{j_{1}}\sum \limits _{\ell =1}^{M}\sum _{k\in \varLambda _{j}} \widehat{\beta }_{j,k}^{\ell }I_{\left\{ |\widehat{\beta }_{j,k}^{\ell }|\ge \kappa t_{n}\right\} }\psi _{j,k}^{\ell }(x), \end{aligned}$$
(8)

where \(t_{n}\), \(j_{0}\) and \(j_{1}\) are specified in the Main Theorem, while the constant \(\kappa \) will be chosen in the proof of the theorem.

Compared with the wavelet estimators in Chaubey et al. (2013) and Chaubey and Shirazi (2015), we use a wavelet basis on the whole space instead of wavelets on an interval. In the latter case, boundary elements need to be treated appropriately.
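To make the construction concrete, the following Python sketch (ours, not from the cited works) computes \(\widehat{\mu }_{n}\), the empirical coefficients (5)–(6) and the estimators (7)–(8) in the simplest setting \(d=1\), \(h\equiv 1\) on \([0,1]\) and \(\rho (y)=y\). For a self-contained example it uses the Haar scaling function and wavelet instead of the Daubechies family \(D_{2N}\) required by the theory, and the levels \(j_{0}\), \(j_{1}\) and the threshold constant \(\kappa \) are user-supplied choices rather than the theoretical ones.

```python
import numpy as np

# Haar scaling function and wavelet (illustration only; the theory requires Daubechies D_{2N}).
def phi(x):
    return ((x >= 0) & (x < 1)).astype(float)           # indicator of [0, 1)

def psi(x):
    return ((x >= 0) & (x < 0.5)).astype(float) - ((x >= 0.5) & (x < 1)).astype(float)

def phi_jk(x, j, k):                                     # phi_{j,k}(x) = 2^{j/2} phi(2^j x - k)
    return 2.0 ** (j / 2) * phi(2.0 ** j * x - k)

def psi_jk(x, j, k):                                     # psi_{j,k}(x) = 2^{j/2} psi(2^j x - k)
    return 2.0 ** (j / 2) * psi(2.0 ** j * x - k)

def wavelet_regression(X, Y, omega, rho, h, j0, j1, kappa=1.0):
    """Linear estimator (7) and hard-thresholded estimator (8) for r(x) = E(rho(V)|U = x)
    from biased (possibly strong mixing) data (X_i, Y_i) with X_i in [0, 1]."""
    n = len(X)
    w = omega(X, Y)
    mu_hat = 1.0 / np.mean(1.0 / w)                      # estimator (4)
    weight = mu_hat * rho(Y) / (w * h(X)) / n            # common factor in (5) and (6)
    t_n = np.sqrt(np.log(n) / n)                         # threshold level (the case 1 <= p <= 2)

    # Empirical coefficients (5) and (6); for Haar, k = 0, ..., 2^j - 1 covers [0, 1].
    alpha_hat = {k: np.sum(weight * phi_jk(X, j0, k)) for k in range(2 ** j0)}
    beta_hat = {(j, k): np.sum(weight * psi_jk(X, j, k))
                for j in range(j0, j1 + 1) for k in range(2 ** j)}

    def r_lin(x):                                        # linear wavelet estimator (7)
        return sum(a * phi_jk(x, j0, k) for k, a in alpha_hat.items())

    def r_non(x):                                        # nonlinear (thresholded) estimator (8)
        out = r_lin(x)
        for (j, k), b in beta_hat.items():
            if abs(b) >= kappa * t_n:                    # keep only large coefficients
                out = out + b * psi_jk(x, j, k)
        return out

    return r_lin, r_non

# Toy usage: i.i.d. data, no bias (omega = 1), h = 1 on [0,1], rho(y) = y.
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, 2000)
Y = np.sin(2 * np.pi * X) + rng.normal(0, 0.1, 2000)
r_lin, r_non = wavelet_regression(X, Y, omega=lambda x, y: np.ones_like(x),
                                  rho=lambda y: y, h=lambda x: np.ones_like(x), j0=3, j1=6)
print(np.round(r_non(np.linspace(0, 1, 5, endpoint=False)), 2))
```

In practice, the Haar pair would be replaced by a smoother compactly supported scaling function such as \(D_{2N}\), and \(j_{0}\), \(j_{1}\) and \(t_{n}\) would be chosen as in the Main Theorem below.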

The following notation is needed to state our main theorem: for \(H>0\),

$$\begin{aligned} B^{s}_{p,q}(H):=\{r\in B^{s}_{p,q}({\mathbb {R}}^{d}),\quad \Vert r\Vert _{B^{s}_{p,q}}\le H\} \end{aligned}$$

and \(x_{+}:=\max \{x,0\}.\) In addition, \(A\lesssim B\) denotes \(A\le cB\) for some constant \(c>0\); \(A\gtrsim B\) means \(B\lesssim A\); \(A\thicksim B\) stands for both \(A\lesssim B\) and \(B\lesssim A\). The indicator function on a set G is denoted by \(I_{G}\) as usual.

Main Theorem

Consider the problem defined by (2) and (3) under the assumptions H1–H5. Let \(r\in B^{s}_{\widetilde{p},q}(H), \widetilde{p},q\in [1,\infty ),~s>0\), \(\mathrm{supp}~r\subseteq [0,1]^{d}\) and either \(\widetilde{p}\ge p\) or \(\widetilde{p}\le p<\infty \) and \(s>\frac{d}{\widetilde{p}}\). Then for \(1\le p<+\infty \), the linear wavelet estimator \(\widehat{r}^{lin}_{n}\) defined in (7) with \(2^{j_{0}}\thicksim n^{\frac{1}{2s'+d+dI_{\{p>2\}}}}\) and \(s'=s-d(\frac{1}{\widetilde{p}}-\frac{1}{p})_{+}\) satisfies

$$\begin{aligned} E\int _{[0,1]^{d}}\left| \widehat{r}^{lin}_{n}(x)-r(x)\right| ^{p}dx\lesssim n^{-\frac{s'p}{2s'+d+dI_{\{p>2\}}}}; \end{aligned}$$
(9a)

The nonlinear estimator in (8) with \(2^{j_{0}}\sim n^{\frac{1}{2m+d+dI_{\{p>2\}}}}~(m>s)\), \(2^{j_{1}}\sim (\frac{n}{(\ln n)^{3}})^{\frac{1}{d}}\) and \(t_{n}=\left[ I_{\{1\le p\le 2\}}+2^{\frac{jd}{2}}I_{\{p>2\}}\right] \sqrt{\frac{\ln n}{n}}\) satisfies

$$\begin{aligned} E\int _{[0,1]^{d}}\Big |\widehat{r}^{non}_{n}(x)-r(x)\Big |^{p}dx\lesssim (\ln n)^{\frac{3p}{2}} {n}^{-\alpha p}, \end{aligned}$$
(9b)

where

$$\begin{aligned} \alpha =\left\{ \begin{array}{ll} \frac{s}{2s+d+dI_{\{p>2\}}}, &{} {\widetilde{p}>\frac{p(d+dI_{\{p>2\}})}{2s+d+dI_{\{p>2\}}},}\\ \frac{s-d/\widetilde{p}+d/p}{2(s-d/\widetilde{p})+d+dI_{\{p>2\}}}, &{} {\widetilde{p}\le \frac{p(d+dI_{\{p>2\}})}{2s+d+dI_{\{p>2\}}}.} \\ \end{array} \right. \end{aligned}$$
(9c)
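For example, when \(d=1\) and \(p=\widetilde{p}=2\), the first case of (9c) applies (since \(\widetilde{p}=2>\frac{2}{2s+1}\) for every \(s>0\)) and \(\alpha =\frac{s}{2s+1}\), so that (9b) reads \(E\int _{[0,1]}\big |\widehat{r}^{non}_{n}(x)-r(x)\big |^{2}dx\lesssim (\ln n)^{3}\,n^{-\frac{2s}{2s+1}}\), the classical one-dimensional nonparametric rate up to a logarithmic factor.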

Remark 1

When \(p=2\), (9a) reduces to Theorem 4.1 of Chaubey et al. (2013); if \(p=2\) and \(d=1\), (9b) becomes Theorem 5.1 in Chaubey and Shirazi (2015) up to a \(\ln n\) factor.

In contrast to the linear wavelet estimator \(\widehat{r}^{lin}_{n}\), the nonlinear estimator \(\widehat{r}^{non}_{n}\) is adaptive, which means that neither \(j_{0}\) nor \(j_{1}\) depends on s, \(\widetilde{p}\) or q. On the other hand, the convergence rate of the nonlinear estimator remains the same as that of the linear estimator up to a \((\ln n)^{\frac{3p}{2}}\) factor when \(\widetilde{p}\ge p\), but it becomes better for \(\widetilde{p}<p\). The same situation happens in the i.i.d. case.

Remark 2

Compared with the estimation for i.i.d. data in Kou and Liu (2017), the convergence rate of the Main Theorem remains the same (up to a \(\ln n\) factor) when \(p\in [1,2]\). However, it becomes worse for \(p>2\). This exhibits a major difference between those two types of data.

For the i.i.d. case, a lower bound under \(L^{p}\) risk is provided by Kou and Liu (2016). Establishing such a bound for strong mixing data remains a challenging problem.

Remark 3

From (9a)–(9c) in our Main Theorem, we find that the convergence rates become arbitrarily slow (their exponents approach zero) when the dimension d gets very large (the curse of dimensionality). In fact, the same situation happens in the classical i.i.d. case (Delyon and Judisky 1996; Kou and Liu 2017). To reduce the influence of the dimension d on the accuracy of estimation, a known method is to assume some independence structure on the samples (Rebelles 2015a, b; Lepski 2013). Since strong mixing data are much more complicated than i.i.d. data, doing the same in our setting would be a challenging problem. We shall study it in future work.

2 Three propositions

In this section, we provide three propositions for the proof of the Main Theorem which is given in Sect. 3. Clearly, \(\mu :=E\omega (U,V)>0\) under the condition H2. Moreover, the following simple (but important) lemma holds.

Lemma 2.1

For the problem defined in (2) and (3) and \(\widehat{\mu }_{n}\) given by (4),

$$\begin{aligned} E(\widehat{\mu }_{n}^{-1})= & {} {\mu }^{-1}, \end{aligned}$$
(10a)
$$\begin{aligned} E\left[ \frac{\mu \rho (Y_{i})}{\omega (X_{i},Y_{i})h(X_{i})}\varphi _{j_{0},k}(X_{i})\right]= & {} \alpha _{j_{0},k}, \end{aligned}$$
(10b)
$$\begin{aligned} E\left[ \frac{\mu \rho (Y_{i})}{\omega (X_{i},Y_{i})h(X_{i})}\psi ^{\ell }_{j,k}(X_{i})\right]= & {} \beta ^{\ell }_{j,k}, \end{aligned}$$
(10c)

where \(\alpha _{j_{0},k}=\int _{[0, 1]^{d}}r(x)\varphi _{j_{0},k}(x)dx\) and \(\beta _{j,k}^{\ell }=\int _{[0, 1]^{d}}r(x)\psi _{j,k}^{\ell }(x)dx~(\ell =1,2,\ldots , M).\)

Proof

We include a simple proof for completeness, although it is essentially the same as that of Chaubey et al. (2013). By (4),

$$\begin{aligned} E(\widehat{\mu }_{n}^{-1})= E\left[ \frac{1}{n}\sum \limits _{i=1}^{n}\frac{1}{\omega (X_{i},Y_{i})}\right] = E\left[ \frac{1}{\omega (X_{i},Y_{i})}\right] . \end{aligned}$$

This with (3) leads to

$$\begin{aligned} E(\widehat{\mu }_{n}^{-1})=\int _{[0, 1]^{d}\times {\mathbb {R}}}\frac{f(x,y)}{\omega (x,y)}dxdy=\frac{1}{\mu }\int _{[0, 1]^{d}\times {\mathbb {R}}}g(x,y)dxdy=\frac{1}{\mu }, \end{aligned}$$

which concludes (10a). Using (3) and (2), one knows that

$$\begin{aligned} E\left[ \frac{\mu \rho (Y_{i})}{\omega (X_{i},Y_{i})h(X_{i})}\varphi _{j_{0},k}(X_{i})\right]= & {} \int _{[0, 1]^{d}\times {\mathbb {R}}}\frac{\mu \rho (y)}{\omega (x,y)h(x)}\varphi _{j_{0},k}(x)f(x,y)dxdy\\= & {} \int _{[0, 1]^{d}}\varphi _{j_{0},k}(x)\int _{{\mathbb {R}}}\frac{\rho (y)}{h(x)}g(x,y)dydx\\= & {} \int _{[0, 1]^{d}}r(x)\varphi _{j_{0},k}(x)dx=\alpha _{j_{0},k}. \end{aligned}$$

This completes the proof of (10b). Similar arguments show (10c). \(\square \)

To establish the next two propositions, we need an important lemma.

Lemma 2.2

(Davydov 1970) Let \(\{X_{i}\}_{i\in {\mathbb {Z}}}\) be strong mixing with the mixing coefficient \(\alpha (k)\), f and g be two measurable functions. If \(E|f(X_{1})|^{p}\) and \(E|g(X_{1})|^{q}\) exist for \(p, q>0\) and \(\frac{1}{p}+\frac{1}{q}<1\), then there exists a constant \(c>0\) such that

$$\begin{aligned} \Big |\mathrm{cov} \Big (f(X_{1}), g(X_{k+1})\Big )\Big |\le c\Big [\alpha (k)\Big ]^{1-\frac{1}{p}-\frac{1}{q}}\Big [E\Big |f(X_{1})\Big |^{p}\Big ]^{\frac{1}{p}}\Big [E\Big |g(X_{1})\Big |^{q}\Big ]^{\frac{1}{q}}. \end{aligned}$$
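In the proofs below, Lemma 2.2 is always applied with \(p=q=4\), which gives \(\Big |\mathrm{cov} \Big (f(X_{1}), g(X_{k+1})\Big )\Big |\lesssim \sqrt{\alpha (k)}\Big [E\Big |f(X_{1})\Big |^{4}\Big ]^{\frac{1}{4}}\Big [E\Big |g(X_{1})\Big |^{4}\Big ]^{\frac{1}{4}}\).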

Proposition 2.1

Let \((X_{i}, Y_{i})~(i=1, 2, \ldots , n)\) be strong mixing, H1–H5 hold and \(2^{jd}\le n\). Then

$$\begin{aligned} \mathrm{var} \left[ \sum \limits _{i=1}^{n}\frac{1}{\omega (X_{i},Y_{i})}\right] \lesssim n~~ \mathrm{and}~~\mathrm{var} \left[ \sum \limits _{i=1}^{n}\frac{\rho (Y_{i})\psi ^{\ell }_{j,k}(X_{i})}{\omega (X_{i},Y_{i})h(X_{i})}\right] \lesssim n. \end{aligned}$$
(11)

Proof

Note that Condition H2 implies \(\mathrm{var} \left( \frac{1}{\omega (X_{i},Y_{i})}\right) \le E\left( \frac{1}{\omega (X_{i},Y_{i})}\right) ^{2}\lesssim 1\) and

$$\begin{aligned} \mathrm{var} \left[ \sum \limits _{i=1}^{n}\frac{1}{\omega (X_{i},Y_{i})}\right] \le n~ \mathrm{var} \left( \frac{1}{\omega (X_{i},Y_{i})}\right) +\left| \sum \limits _{v=2}^{n}\sum \limits _{i=1}^{v-1}\mathrm{cov} \left( \frac{1}{\omega (X_{v},Y_{v})}, \frac{1}{\omega (X_{i},Y_{i})}\right) \right| . \end{aligned}$$

Then it suffices to show

$$\begin{aligned} \left| \sum \limits _{v=2}^{n}\sum \limits _{i=1}^{v-1}\mathrm{cov} \left( \frac{1}{\omega (X_{v},Y_{v})}, \frac{1}{\omega (X_{i},Y_{i})}\right) \right| \lesssim n \end{aligned}$$
(12)

for the first inequality of (11). By the strict stationarity of \((X_{i}, Y_{i})\),

$$\begin{aligned}&\left| \sum \limits _{v=2}^{n}\sum \limits _{i=1}^{v-1}\mathrm{cov} \left( \frac{1}{\omega (X_{v},Y_{v})}, \frac{1}{\omega (X_{i},Y_{i})}\right) \right| \\&\quad =\left| \sum \limits _{m=1}^{n}(n-m) \, \mathrm{cov} \left( \frac{1}{\omega (X_{1},Y_{1})}, \frac{1}{\omega (X_{m+1},Y_{m+1})}\right) \right| \\&\quad \le n\sum \limits _{m=1}^{n} \left| \mathrm{cov} \left( \frac{1}{\omega (X_{1},Y_{1})}, \frac{1}{\omega (X_{m+1},Y_{m+1})}\right) \right| . \end{aligned}$$

On the other hand, Lemma 2.2 and H2 show that

$$\begin{aligned} \left| \mathrm{cov} \left( \frac{1}{\omega (X_{1},Y_{1})}, \frac{1}{\omega (X_{m+1},Y_{m+1})}\right) \right| \lesssim \sqrt{\alpha (m)}\sqrt{E\left| \frac{1}{\omega (X_{1},Y_{1})}\right| ^{4}}\lesssim \sqrt{\alpha (m)}. \end{aligned}$$

These with H4 give the desired conclusion (12),

$$\begin{aligned} \left| \sum \limits _{v=2}^{n}\sum \limits _{i=1}^{v-1}\mathrm{cov} \left( \frac{1}{\omega (X_{v},Y_{v})}, \frac{1}{\omega (X_{i},Y_{i})}\right) \right| \lesssim n\sum \limits _{m=1}^{n}\sqrt{\alpha (m)}\lesssim n. \end{aligned}$$

To prove the second inequality of (11), one observes

$$\begin{aligned} \mathrm{var} \left[ \sum \limits _{i=1}^{n}\frac{\rho (Y_{i}) \psi ^{\ell }_{j,k}(X_{i})}{\omega (X_{i},Y_{i})h(X_{i})}\right]\lesssim & {} n~\mathrm{var} \left( \frac{\rho (Y_{i})\psi ^{\ell }_{j,k}(X_{i})}{\omega (X_{i},Y_{i})h(X_{i})}\right) \\&+\left| \sum \limits _{v=2}^{n}\sum \limits _{i=1}^{v-1}\mathrm{cov} \left( \frac{\rho (Y_{v})\psi ^{\ell }_{j,k}(X_{v})}{\omega (X_{v},Y_{v})h(X_{v})},\frac{\rho (Y_{i})\psi ^{\ell }_{j,k}(X_{i})}{\omega (X_{i},Y_{i})h(X_{i})}\right) \right| . \end{aligned}$$

By (3) and H1–H3, the first term of the above inequality is bounded by

$$\begin{aligned} n E\left( \frac{\rho (Y_{i})\psi ^{\ell }_{j,k}(X_{i})}{\omega (X_{i},Y_{i})h(X_{i})}\right) ^{2}\lesssim n\int _{[0, 1]^{d}\times {\mathbb {R}}}\left[ \psi ^{\ell }_{j,k}(x)\right] ^{2}\frac{g(x,y)}{h(x)}dydx=n. \end{aligned}$$

It remains to show

$$\begin{aligned}&\left| \sum \limits _{v=2}^{n}\sum \limits _{i=1}^{v-1}\mathrm{cov} \left( \frac{\rho (Y_{v})\psi ^{\ell }_{j,k}(X_{v})}{\omega (X_{v},Y_{v})h(X_{v})},\frac{\rho (Y_{i})\psi ^{\ell }_{j,k}(X_{i})}{\omega (X_{i},Y_{i})h(X_{i})}\right) \right| \nonumber \\&\quad \le n \left( \sum \limits _{m=1}^{2^{jd}-1}+\sum \limits _{m=2^{jd}}^{n}\right) \left| \mathrm{cov} \left[ \frac{\rho (Y_{1})\psi ^{\ell }_{j,k}(X_{1})}{\omega (X_{1},Y_{1})h(X_{1})},\frac{\rho (Y_{m+1})\psi ^{\ell }_{j,k}(X_{m+1})}{\omega (X_{m+1},Y_{m+1})h(X_{m+1})}\right] \right| \lesssim n\nonumber \\ \end{aligned}$$
(13)

where the assumption \(2^{jd}\le n\) is needed.

According to H5 and H1–H3,

$$\begin{aligned}&\left| \mathrm{cov} \left( \frac{\rho (Y_{1})\psi ^{\ell }_{j,k}(X_{1})}{\omega (X_{1},Y_{1})h(X_{1})},\frac{\rho (Y_{m+1})\psi ^{\ell }_{j,k}(X_{m+1})}{\omega (X_{m+1},Y_{m+1})h(X_{m+1})}\right) \right| \\&\quad \le \int _{[0,1]^{d}\times {\mathbb {R}}\times [0,1]^{d}\times {\mathbb {R}}} \left| \frac{\rho (y)\psi ^{\ell }_{j,k}(x)}{\omega (x,y)h(x)}\cdot \frac{\rho (y^{*})\psi ^{\ell }_{j,k}(x^{*})}{\omega (x^{*},y^{*})h(x^{*})}\right| |h_{m}(x,y,x^{*},y^{*})|dxdydx^{*}dy^{*}\\&\quad \lesssim \left( \int _{{\mathbb {R}}}|\rho (y)|dy\right) ^{2} \left( \int _{[0,1]^{d}}\left| \psi ^{\ell }_{j,k}(x)\right| dx\right) ^{2}\lesssim 2^{-jd}. \end{aligned}$$

Hence,

$$\begin{aligned} \sum \limits _{m=1}^{2^{jd}-1}\left| \mathrm{cov} \left( \frac{\rho (Y_{1})\psi ^{\ell }_{j,k}(X_{1})}{\omega (X_{1},Y_{1})h(X_{1})},\frac{\rho (Y_{m+1})\psi ^{\ell }_{j,k}(X_{m+1})}{\omega (X_{m+1},Y_{m+1})h(X_{m+1})}\right) \right| \lesssim \sum \limits _{m=1}^{2^{jd}-1}2^{-jd}\lesssim 1.\nonumber \\ \end{aligned}$$
(14)

On the other hand, Lemma 2.2, H1–H3 and the arguments before (13) show that

$$\begin{aligned}&\left| \mathrm{cov} \left( \frac{\rho (Y_{1})\psi ^{\ell }_{j,k}(X_{1})}{\omega (X_{1},Y_{1})h(X_{1})},\frac{\rho (Y_{m+1})\psi ^{\ell }_{j,k}(X_{m+1})}{\omega (X_{m+1},Y_{m+1})h(X_{m+1})}\right) \right| \lesssim \sqrt{\alpha (m)}\sqrt{E\left| \frac{\rho (Y_{1})\psi ^{\ell }_{j,k}(X_{1})}{\omega (X_{1},Y_{1})h(X_{1})}\right| ^{4}}\\&\quad \lesssim \sqrt{\alpha (m)}\sup \left| \frac{\rho (Y_{1})\psi ^{\ell }_{j,k}(X_{1})}{\omega (X_{1},Y_{1})h(X_{1})}\right| \sqrt{E\left| \frac{\rho (Y_{1})\psi ^{\ell }_{j,k}(X_{1})}{\omega (X_{1},Y_{1})h(X_{1})}\right| ^{2}}\lesssim \sqrt{\alpha (m)}~2^{\frac{jd}{2}}. \end{aligned}$$

Moreover, \( \sum \nolimits _{m=2^{jd}}^{n}\left| \mathrm{cov} \left( \frac{\rho (Y_{1})\psi ^{\ell }_{j,k}(X_{1})}{\omega (X_{1},Y_{1})h(X_{1})},\frac{\rho (Y_{m+1})\psi ^{\ell }_{j,k}(X_{m+1})}{\omega (X_{m+1},Y_{m+1})h(X_{m+1})}\right) \right| \lesssim \sum \nolimits _{m=2^{jd}}^{n}\sqrt{\alpha (m)}~2^{\frac{jd}{2}} \lesssim \sum \nolimits _{m=1}^{n}\sqrt{m\alpha (m)}\le \sum \nolimits _{m=1}^{+\infty }\sqrt{\gamma }~m^{\frac{1}{2}} e^{-\frac{c_{5}m}{2}}<+\infty ,\) where \(2^{\frac{jd}{2}}\le \sqrt{m}\) for \(m\ge 2^{jd}\) is used. This with (14) shows (13). \(\square \)

To estimate \(E\Big |\widehat{\alpha }_{j_{0},k}-\alpha _{j_{0},k}\Big |^{p}\) and \(E\Big |\widehat{\beta }^{\ell }_{j,k}-\beta ^{\ell }_{j,k}\Big |^{p}\), we introduce a moment bound, which can be found in Yokoyama (1980), Kim (1993) and Shao and Yu (1996).

Lemma 2.3

Let \(\{X_{i}, i=1, 2, \ldots , n\}\) be a strong mixing sequence of random variables with the mixing coefficients \(\alpha (n)\le c n^{-\theta }~(c>0, \theta >0)\). If \(EX_{i}=0\), \(\Vert X_{i}\Vert _{\eta }:=(E|X_{i}|^{\eta })^{\frac{1}{\eta }}<\infty \), \(2<p<\eta <+\infty \) and \(\theta >\frac{p~\eta }{2(\eta -p)}\), then there exists \(K=K(p, \eta , \theta )<\infty \) such that

$$\begin{aligned} E\left| \sum \limits _{i=1}^{n}X_{i}\right| ^{p}\le K~\Vert X_{i}\Vert _{\eta }^{p}~n^{\frac{p}{2}}. \end{aligned}$$

Proposition 2.2

Let \(r\in B^{s}_{\widetilde{p},q}(H) ~(\widetilde{p},q\in [1,\infty ),~s>0)\) and \(\widehat{\alpha }_{j_{0},k}, \widehat{\beta }^{\ell }_{j,k}\) be defined by (5) and (6). If H1–H5 hold, then

$$\begin{aligned} E\Big |\widehat{\alpha }_{j_{0},k}-\alpha _{j_{0},k}\Big |^{p}\lesssim & {} \left[ I_{\{1\le p\le 2\}}+2^{\frac{j_{0}dp}{2}}I_{\{p>2\}}\right] n^{-\frac{p}{2}}, \end{aligned}$$
(15a)
$$\begin{aligned} E\Big |\widehat{\beta }^{\ell }_{j,k}-\beta ^{\ell }_{j,k}\Big |^{p}\lesssim & {} \left[ I_{\{1\le p\le 2\}}+2^{\frac{jdp}{2}}I_{\{p>2\}}\right] n^{-\frac{p}{2}}. \end{aligned}$$
(15b)

Proof

One proves (15b) only, since the proof of (15a) is similar. By the definition of \(\widehat{\beta }^{\ell }_{j,k}\),

$$\begin{aligned} \widehat{\beta }^{\ell }_{j,k}-\beta ^{\ell }_{j,k} =\frac{\widehat{\mu }_{n}}{\mu } \left[ \frac{\mu }{n}\sum \limits _{i=1}^{n}\frac{\rho (Y_{i})}{\omega (X_{i},Y_{i})h(X_{i})}\psi ^{\ell }_{j,k}(X_{i})-\beta ^{\ell }_{j,k}\right] +\beta ^{\ell }_{j,k}\cdot \widehat{\mu }_{n} \left( \frac{1}{\mu }-\frac{1}{\widehat{\mu }_{n}}\right) \end{aligned}$$

and \(E\left| \widehat{\beta }^{\ell }_{j,k}-\beta ^{\ell }_{j,k}\right| ^{p}\lesssim E\left| \frac{\widehat{\mu }_{n}}{\mu } \left[ \frac{\mu }{n}\sum \nolimits _{i=1}^{n}\frac{\rho (Y_{i})}{\omega (X_{i},Y_{i})h(X_{i})}\psi ^{\ell }_{j,k}(X_{i})-\beta ^{\ell }_{j,k}\right] \right| ^{p} +E\left| \beta ^{\ell }_{j,k}\widehat{\mu }_{n} \left( \frac{1}{\mu }-\frac{1}{\widehat{\mu }_{n}}\right) \right| ^{p}.\) Since Condition H3 implies the boundedness of r, \(\left| \beta ^{\ell }_{j,k}\right| :=\left| \int _{[0,1]^{d}}r(x)\psi ^{\ell }_{j,k}(x)dx\right| \lesssim 1\) thanks to Hölder inequality and orthonormality of \(\{\psi ^{\ell }_{j,k}\}\). On the other hand, \(\left| \frac{\widehat{\mu }_{n}}{\mu }\right| \lesssim 1\) and \(|\widehat{\mu }_{n}|\lesssim 1\) because of H2. Hence,

$$\begin{aligned} E\left| \widehat{\beta }^{\ell }_{j,k}-\beta ^{\ell }_{j,k}\right| ^{p}\lesssim E\left| \frac{\mu }{n}\sum \limits _{i=1}^{n}\frac{\rho (Y_{i})}{\omega (X_{i},Y_{i})h(X_{i})}\psi ^{\ell }_{j,k}(X_{i})-\beta ^{\ell }_{j,k}\right| ^{p}+E\left| \frac{1}{\mu }-\frac{1}{\widehat{\mu }_{n}}\right| ^{p}.\nonumber \\ \end{aligned}$$
(16)

When \(p=2\), it is easy to see from Lemma 2.1 and Proposition  2.1 that \(E\left| \widehat{\beta }^{\ell }_{j,k}-\beta ^{\ell }_{j,k}\right| ^{2}\lesssim \mathrm{var} \left[ \frac{1}{n}\sum \nolimits _{i=1}^{n}\frac{\rho (Y_{i})\psi ^{\ell }_{j,k}(X_{i})}{\omega (X_{i},Y_{i})h(X_{i})}\right] +\mathrm{var} \left[ \frac{1}{n}\sum \nolimits _{i=1}^{n}\frac{1}{\omega (X_{i},Y_{i})}\right] \lesssim \frac{1}{n}\). This with Jensen’s inequality shows \(E\left| \widehat{\beta }^{\ell }_{j,k}-\beta ^{\ell }_{j,k} \right| ^{p}\lesssim n^{-\frac{p}{2}}\) for \(1\le p\le 2\).

It remains to show (15b) for \(p>2\). By the definition of \(\widehat{\mu }_{n}\),

$$\begin{aligned} E\left| \frac{1}{\mu }-\frac{1}{\widehat{\mu }_{n}}\right| ^{p}= E\left| \frac{1}{n}\sum \limits _{i=1}^{n}\frac{1}{\omega (X_{i},Y_{i})}-\frac{1}{\mu }\right| ^{p}= E\left| \frac{1}{n}\sum \limits _{i=1}^{n}\left[ \frac{1}{\omega (X_{i},Y_{i})}-\frac{1}{\mu }\right] \right| ^{p}.\nonumber \\ \end{aligned}$$
(17)

Let \(\eta _{i}:=\frac{1}{\omega (X_{i},Y_{i})}-\frac{1}{\mu }\). Then \(E(\eta _{i})=0\) thanks to (10a). Furthermore, \(\eta _{1}, \ldots , \eta _{n}\) are strong mixing by the same property of \(\{(X_{i}, Y_{i})\}\) and the Borel measurability of the function \(\frac{1}{\omega (x,y)}-\frac{1}{\mu }\) (Guo 2016). On the other hand, Condition H2 implies \(|\eta _{i}|\lesssim 1\) and \(\Vert \eta _{i}\Vert _{\eta }^{p}\lesssim 1\). By Condition H4, \(\theta \) in Lemma 2.3 can be taken large enough so that \(\theta >\frac{p\eta }{2(\eta -p)}\) for fixed p and \(\eta \) with \(2<p<\eta <+\infty \). Then it follows from Lemma 2.3 and (17) that

$$\begin{aligned} E\left| \frac{1}{\mu }-\frac{1}{\widehat{\mu }_{n}}\right| ^{p}\lesssim n^{-\frac{p}{2}}. \end{aligned}$$

Finally, one needs only to show

$$\begin{aligned} Q_{n}:=E\left| \frac{\mu }{n}\sum \limits _{i=1}^{n}\frac{\rho (Y_{i})}{\omega (X_{i},Y_{i})h(X_{i})}\psi ^{\ell }_{j,k}(X_{i})-\beta ^{\ell }_{j,k}\right| ^{p}\lesssim 2^{\frac{jdp}{2}} n^{-\frac{p}{2}}. \end{aligned}$$
(18)

Define \(\xi _{i}:=\frac{\mu \rho (Y_{i})}{\omega (X_{i},Y_{i})h(X_{i})}\psi ^{\ell }_{j,k}(X_{i})-\beta ^{\ell }_{j,k}\). Then arguments similar to those for \(\eta _{i}\) show that \(E(\xi _{i})=0\), \(Q_{n}=\frac{1}{n^{p}}E|\sum \nolimits _{i=1}^{n}\xi _{i}|^{p}\) and \(\xi _{1}, \ldots , \xi _{n}\) are strong mixing with the mixing coefficients \(\alpha (k)\le \gamma e^{-ck}\). According to H1–H3 and \(\left| \psi ^{\ell }(x)\right| \lesssim 1\),

$$\begin{aligned} \left| \frac{\mu \rho (Y_{i})}{\omega (X_{i},Y_{i})h(X_{i})}\psi ^{\ell }_{j,k}(X_{i})\right| \lesssim 2^{\frac{jd}{2}}. \end{aligned}$$

This with \(\beta ^{\ell }_{j,k}=E\left[ \frac{\mu \rho (Y_{i})}{\omega (X_{i},Y_{i})h(X_{i})}\psi ^{\ell }_{j,k}(X_{i})\right] \) leads to \(E|\xi _{i}|^{\eta }\lesssim 2^{\frac{jd\eta }{2}}\) and \(\Vert \xi _{i}\Vert _{\eta }^{p}\lesssim 2^{\frac{jdp}{2}}\). Using Lemma 2.3 again, one obtains the desired conclusion (18). \(\square \)

To prove the last proposition in this section, we need the following Bernstein-type inequality (Liebscher 1996, 2001; Rio 1995).

Lemma 2.4

Let \((X_{i})_{i\in {\mathbb {Z}}}\) be a strong mixing process with mixing coefficient \(\alpha (k)\), \(EX_{i}=0\), \(|X_{i}|\le M<\infty \) and \(D_{m}=\max \limits _{1\le j\le 2m}\mathrm{var} \left( \sum \nolimits _{i=1}^{j}X_{i}\right) \). Then for \(\varepsilon >0\) and \( ~n,m\in \mathbb {N}\) with \(0<m\le \frac{n}{2}\),

$$\begin{aligned} {\mathbb {P}} \left( \left| \sum \limits _{i=1}^{n}X_{i}\right| \ge \varepsilon \right) \le 4\cdot \exp \left\{ -\frac{\varepsilon ^{2}}{16}\left( nm^{-1}D_{m}+\frac{1}{3}\varepsilon Mm\right) ^{-1}\right\} +32\frac{M}{\varepsilon }n\alpha (m). \end{aligned}$$

The next proposition explains the reason for choosing \(2^{j_{1}}\sim \Big (\frac{n}{(\ln n)^{3}}\Big )^{\frac{1}{d}}\) in our Main Theorem. The classical choice is \(2^{j_{1}}\sim \Big (\frac{n}{\ln n}\Big )^{\frac{1}{d}}\); see Chesneau and Shirazi (2014).

Proposition 2.3

Let \(r\in B^{s}_{\widetilde{p},q}(H) ~(\widetilde{p},q\in [1,\infty ),~s>0)\), \(\widehat{\beta }^{\ell }_{j,k}\) be defined in (6) and \(t_{n}=\sqrt{\frac{\ln n}{n}}\). If H1–H5 hold and \(2^{jd}\le \frac{n}{(\ln n)^{3}}\), then for \(w>0\), there exists a constant \(\kappa >1\) such that

$$\begin{aligned} {\mathbb {P}} \left( \Big |\widehat{\beta }_{j,k}^{\ell }-\beta _{j,k}^{\ell }\Big |\ge \kappa t_{n}\right) \lesssim 2^{-wj}. \end{aligned}$$

Proof

According to the arguments of (16), \(\left| \widehat{\beta }_{j,k}^{\ell }-\beta _{j,k}^{\ell }\right| \lesssim \frac{1}{n} \left| \sum \nolimits _{i=1}^{n} \left[ \frac{1}{\omega (X_{i},Y_{i})}-\frac{1}{\mu } \right] \right| + \left| \frac{1}{n}\sum \nolimits _{i=1}^{n}\frac{\mu \rho (Y_{i})}{\omega (X_{i},Y_{i})h(X_{i})}\psi _{j,k}^{\ell }(X_{i})-\beta _{j,k}^{\ell }\right| .\) Hence, it suffices to prove

$$\begin{aligned}&{\mathbb {P}} \left( \frac{1}{n}\left| \sum \limits _{i=1}^{n} \left[ \frac{1}{\omega (X_{i},Y_{i})}-\frac{1}{\mu }\right] \right| \ge \frac{\kappa }{2}t_{n}\right) \lesssim 2^{-wj} ~~\mathrm{and}\nonumber \\&{\mathbb {P}} \left( \left| \frac{1}{n}\sum \limits _{i=1}^{n} \left[ \frac{\mu \rho (Y_{i})}{\omega (X_{i},Y_{i})h(X_{i})}\psi _{j,k}^{\ell }(X_{i})-\beta _{j,k}^{\ell }\right] \right| \ge \frac{\kappa }{2}t_{n}\right) \lesssim 2^{-wj}. \end{aligned}$$
(19)

One shows the second inequality only, because the first one is similar and even simpler.

Define \(\xi _{i}:=\frac{\mu \rho (Y_{i})}{\omega (X_{i},Y_{i})h(X_{i})}\psi _{j,k}^{\ell }(X_{i})-\beta _{j,k}^{\ell }\). Then \(E(\xi _{i})=0\) thanks to (10c), and \(\xi _{1}, \ldots , \xi _{n}\) are strong mixing with the mixing coefficients \(\alpha (k)\le \gamma e^{-ck}\) because of Condition H4. By H1–H3, \(\left| \frac{\mu \rho (Y_{i})}{\omega (X_{i},Y_{i})h(X_{i})}\psi _{j,k}^{\ell }(X_{i})\right| \lesssim 2^{\frac{jd}{2}}\) and

$$\begin{aligned} |\xi _{i}|\le \left| \frac{\mu \rho (Y_{i})}{\omega (X_{i},Y_{i})h(X_{i})}\psi _{j,k}^{\ell }(X_{i})\right| +E\left| \frac{\mu \rho (Y_{i})}{\omega (X_{i},Y_{i})h(X_{i})}\psi _{j,k}^{\ell }(X_{i})\right| \lesssim 2^{\frac{jd}{2}}. \end{aligned}$$

According to Proposition 2.1, \(D_{m}=\max \limits _{1\le j\le 2m}\mathrm{var} \left( \sum \nolimits _{i=1}^{j}\xi _{i}\right) \lesssim m\). Then it follows from Lemma 2.4 with \(m=u\ln n\) (the constant u will be chosen later on) that

$$\begin{aligned}&{\mathbb {P}} \left( \frac{1}{n}\left| \sum \limits _{i=1}^{n}\xi _{i}\right| \ge \frac{\kappa }{2}t_{n}\right) = {\mathbb {P}} \left( \left| \sum \limits _{i=1}^{n}\xi _{i} \right| \ge \frac{\kappa }{2}nt_{n} \right) \nonumber \\&\quad \lesssim \exp \left\{ -\frac{(\kappa ~n~t_{n})^{2}}{64} \left( nm^{-1}D_{m}+\frac{1}{6}\kappa ~n~t_{n} 2^{\frac{jd}{2}}m\right) ^{-1}\right\} \nonumber \\&\qquad +\,64~\frac{2^{\frac{jd}{2}}}{\kappa ~n~t_{n}}n\gamma e^{-cm}. \end{aligned}$$
(20)

Clearly, \(64~\frac{2^{\frac{jd}{2}}}{\kappa ~n~t_{n}}n\gamma e^{-cm}\lesssim n e^{-cu\ln n}=n^{1-cu}\) holds due to \(t_{n}=\sqrt{\frac{\ln n}{n}}\), \(2^{jd}\le \frac{n}{(\ln n)^{3}}\) and \(m=u\ln n\). Choosing u such that \(1-cu<-\frac{w}{d}\), the second term of (20) is bounded by \(2^{-wj}\). On the other hand, the first term of (20) has the following upper bound

$$\begin{aligned} \exp \left\{ -\frac{\kappa ^{2}\ln n}{64} \left( 1+\frac{1}{6}\kappa \sqrt{\frac{\ln n}{n}}\left( \frac{n}{(\ln n)^{3}}\right) ^{\frac{1}{2}}m\right) ^{-1}\right\} \lesssim \exp \left\{ -\frac{\kappa ^{2}\ln n}{64}\left( 1+\frac{1}{6}\kappa u\right) ^{-1}\right\} \end{aligned}$$

thanks to \(D_{m}\lesssim m\), \(2^{jd}\le \frac{n}{(\ln n)^{3}}\) and \(m=u\ln n\). For this estimation, the condition \(2^{jd}\le \frac{n}{(\ln n)^{3}}\) is essential; it cannot be replaced by \(2^{jd}\le \frac{n}{\ln n}\). Obviously, there exists a sufficiently large \(\kappa >1\) such that \(\exp \left\{ -\frac{\kappa ^{2}\ln n}{64} \left( 1+\frac{1}{6}\kappa u\right) ^{-1}\right\} \lesssim 2^{-wj}\). Finally, the desired conclusion (19) follows. \(\square \)

3 Proof of main theorem

This section proves the Main Theorem, which we restate as Theorem 3.1. The main idea of the proof comes from Donoho et al. (1996). When \(p=2\), the corresponding estimates seem easier (see Chaubey et al. 2013; Chaubey and Shirazi 2015), because \(L^{2}({\mathbb {R}}^{d})\) is a Hilbert space.

Theorem 3.1

Consider the problem defined by (2) and (3) with the assumptions H1–H5. Let \(r\in B^{s}_{\widetilde{p},q}(H)~(\widetilde{p},q\in [1,\infty ),~s>0)\), \(\mathrm{supp}~r\subseteq [0,1]^{d}\) and either \(\widetilde{p}\ge p\) or \(\widetilde{p}\le p<\infty \) and \(s>\frac{d}{\widetilde{p}}\). Then for \(1\le p<+\infty \), the linear wavelet estimator \(\widehat{r}^{lin}_{n}\) defined in (7) with \(2^{j_{0}}\thicksim n^{\frac{1}{2s'+d+dI_{\{p>2\}}}}\) and \(s'=s-d\left( \frac{1}{\widetilde{p}}-\frac{1}{p}\right) _{+}\) satisfies

$$\begin{aligned} E\int _{[0,1]^{d}} \left| \widehat{r}^{lin}_{n}(x)-r(x)\right| ^{p}dx\lesssim n^{-\frac{s'p}{2s'+d+dI_{\{p>2\}}}}; \end{aligned}$$
(21a)

The nonlinear estimator in (8) with \(2^{j_{0}}\sim n^{\frac{1}{2m+d+dI_{\{p>2\}}}}~(m>s)\), \(2^{j_{1}}\sim \Big (\frac{n}{(\ln n)^{3}}\Big )^{\frac{1}{d}}\) and \(t_{n}=\left[ I_{\{1\le p\le 2\}}+2^{\frac{jd}{2}}I_{\{p>2\}}\right] \sqrt{\frac{\ln n}{n}}\) satisfies

$$\begin{aligned} E\int _{[0,1]^{d}} \Big |\widehat{r}^{non}_{n}(x)-r(x)\Big |^{p}dx \lesssim (\ln n)^{\frac{3p}{2}} {n}^{-\alpha p}, \end{aligned}$$
(21b)

where

$$\begin{aligned} \alpha =\left\{ \begin{array}{ll} \frac{s}{2s+d+dI_{\{p>2\}}}, &{} {\widetilde{p}>\frac{p(d+dI_{\{p>2\}})}{2s+d+dI_{\{p>2\}}},}\\ \frac{s-d/\widetilde{p}+d/p}{2(s-d/\widetilde{p})+d+dI_{\{p>2\}}}, &{} {\widetilde{p}\le \frac{p(d+dI_{\{p>2\}})}{2s+d+dI_{\{p>2\}}}.} \\ \end{array} \right. \end{aligned}$$
(21c)

Proof

When \(\widetilde{p}\le p\) and \(s>\frac{d}{\widetilde{p}}\), we have \(s'-\frac{d}{p}=s-\frac{d}{\widetilde{p}}>0\). By the equivalence of (1) and (3) in Lemma 1.1, \(B^{s}_{\widetilde{p},q}(H)\subseteq B^{s'}_{p,q}(H')\) for some \(H'>0\). Then \(r\in B^{s'}_{p, q}(H')\) and \(\Vert P_{j_{0}}r-r\Vert _{p}=\Vert \sum \nolimits _{j=j_{0}}^{\infty }(P_{j+1}r-P_{j}r)\Vert _{p}\lesssim \sum \nolimits _{j=j_{0}}^{\infty }\Vert P_{j+1}r-P_{j}r\Vert _{p}\lesssim \sum \nolimits _{j=j_{0}}^{\infty }2^{-js'}\lesssim 2^{-j_{0}s'}\) thanks to (2) of Lemma 1.1. Moreover,

$$\begin{aligned} \Vert P_{j_{0}}r-r\Vert _{p}^{p}\lesssim 2^{-j_{0}s'p}. \end{aligned}$$
(22)

It is easy to see from Lemma 1.2 that

$$\begin{aligned} E\left\| \widehat{r}^{lin}_{n}-P_{j_{0}}r\right\| ^{p}_{p}=E\left\| \sum \limits _{k\in \varLambda _{j_{0}}}(\widehat{\alpha }_{j_{0},k}-\alpha _{j_{0},k})\varphi _{j_{0},k}\right\| ^{p}_{p} \lesssim 2^{pd\left( \frac{j_{0}}{2}-\frac{j_{0}}{p}\right) } \sum \limits _{k\in \varLambda _{j_{0}}} E\Big |\widehat{\alpha }_{j_{0},k}-\alpha _{j_{0},k}\Big |^{p}. \end{aligned}$$

According to Proposition 2.2 and \(|\varLambda _{j_{0}}|\thicksim 2^{j_{0}d}\),

$$\begin{aligned} E\left\| \widehat{r}^{lin}_{n}-P_{j_{0}}r\right\| _{p}^{p}\lesssim 2^{\frac{j_{0}dp}{2}}\left( I_{\{1\le p\le 2\}}+2^{\frac{j_{0}dp}{2}}I_{\{p>2\}}\right) n^{-\frac{p}{2}}. \end{aligned}$$
(23)

This with (22) shows that \(E\int _{[0,1]^{d}}\big |\widehat{r}^{lin}_{n}(x)-r(x)\big |^{p}dx\le E\int _{{\mathbb {R}}^{d}}\big |\widehat{r}^{lin}_{n}(x)-r(x)\big |^{p}dx \lesssim E\big \Vert \widehat{r}^{lin}_{n}-P_{j_{0}}r\big \Vert _{p}^{p}+\big \Vert P_{j_{0}}r-r\big \Vert _{p}^{p} \lesssim 2^{-j_{0}s'p}+2^{\frac{j_{0}dp}{2}}\big [I_{\{1\le p\le 2\}}+2^{\frac{j_{0}dp}{2}}I_{\{p>2\}}\big ] n^{-\frac{p}{2}}.\) To get a balance, one chooses \(2^{j_{0}}\thicksim n^{\frac{1}{2s'+d+dI_{\{p>2\}}}}\). Then

$$\begin{aligned} E\int _{[0,1]^{d}} \left| \widehat{r}^{lin}_{n}(x)-r(x)\right| ^{p}dx \lesssim n^{-\frac{s'p}{2s'+d+dI_{\{p>2\}}}}, \end{aligned}$$
(24)

which is the desired conclusion (21a) for \(\widetilde{p}\le p\) and \(s>\frac{d}{\widetilde{p}}\).

From the above arguments, one finds that when \(p=\widetilde{p}\), we have \(s'=s>0\) and the inequality (24) still holds without the assumption \(s>\frac{d}{\widetilde{p}}\). It remains to conclude (21a) for \(\widetilde{p}>p\ge 1\). By Hölder inequality,

$$\begin{aligned} \int _{[0,1]^{d}} \left| \widehat{r}^{lin}_{n}(x)-r(x)\right| ^{p}dx \lesssim \left[ \int _{[0,1]^{d}} \left| \widehat{r}^{lin}_{n}(x)-r(x)\right| ^{\widetilde{p}}dx\right] ^{\frac{p}{\widetilde{p}}}. \end{aligned}$$

Using Jensen inequality and (24) with \(p=\widetilde{p}\), one gets

$$\begin{aligned} E\int _{[0,1]^{d}} \left| \widehat{r}^{lin}_{n}(x)-r(x)\right| ^{p}dx\lesssim \left[ E\int _{[0,1]^{d}} \left| \widehat{r}^{lin}_{n}(x)-r(x)\right| ^{\widetilde{p}}dx\right] ^{\frac{p}{\widetilde{p}}}\lesssim n^{-\frac{s'p}{2s'+d+dI_{\{p>2\}}}}. \end{aligned}$$

This completes the proof of (21a).

Similar to the arguments of (21a), it suffices to prove (21b) for \(\widetilde{p}\le p\) and \(s>\frac{d}{\widetilde{p}}.\) In this case, (21c) can be rewritten as

$$\begin{aligned} \alpha =\min \left\{ \frac{s}{2s+d+dI_{\{p>2\}}},\quad \frac{s-d/\widetilde{p}+d/p}{2(s-d/\widetilde{p})+d+dI_{\{p>2\}}}\right\} . \end{aligned}$$

By the definitions of \(\widehat{r}^{lin}_{n}\) and \(\widehat{r}^{non}_{n}\), \(\widehat{r}^{non}_{n}(x)-r(x)=\Big [\widehat{r}^{lin}_{n}(x)-P_{j_{0}}r(x)\Big ]-\Big [r(x)-P_{j_{1}+1}r(x)\Big ] +\sum \limits _{j=j_{0}}^{j_{1}} \sum \limits _{\ell =1}^{M}\sum \limits _{k\in \varLambda _{j}}\Big [\widehat{\beta }_{j,k}^{\ell }I_{\{|\widehat{\beta }_{j,k}^{\ell }|\ge \kappa t_{n}\}}-\beta _{j,k}^{\ell }\Big ]\psi _{j,k}^{\ell }(x). \) Hence,

$$\begin{aligned} E\int _{[0,1]^{d}}\Big |\widehat{r}^{non}_{n}(x)-r(x)\Big |^{p}dx\lesssim I_{1}+I_{2}+Z, \end{aligned}$$
(25)

where \(I_{1}:=E\Big \Vert \widehat{r}^{lin}_{n}-P_{j_{0}}r\Big \Vert ^{p}_{p},~~I_{2}:=\Big \Vert r-P_{j_{1}+1}r\Big \Vert ^{p}_{p}\) and

$$\begin{aligned} Z:=E\left\| \sum \limits _{j=j_{0}}^{j_{1}} \sum \limits _{\ell =1}^{M} \sum \limits _{k\in \varLambda _{j}}\left[ \widehat{\beta }_{j,k}^{\ell } I_{\left\{ |\widehat{\beta }_{j,k}^{\ell }|\ge \kappa t_{n}\right\} }-\beta _{j,k}^{\ell }\right] \psi _{j,k}^{\ell }\right\| ^{p}_{p}. \end{aligned}$$

According to (23), \(2^{j_{0}}\sim n^{\frac{1}{2m+d+dI_{\{p>2\}}}}~(m>s)\) and the definition of \(\alpha \) in (21c),

$$\begin{aligned} I_{1}=E\Big \Vert \widehat{r}^{lin}_{n}-P_{j_{0}}r\Big \Vert _{p}^{p}\lesssim 2^{\frac{j_{0}dp}{2}}\Big [I_{\{1\le p\le 2\}}+2^{\frac{j_{0}dp}{2}}I_{\{p>2\}}\Big ] n^{-\frac{p}{2}}\lesssim n^{-\alpha p}. \end{aligned}$$

The same arguments as for (22) show \(\big \Vert P_{j_{1}+1}r-r\big \Vert _{p}^{p}\lesssim 2^{-j_{1}s'p}\). On the other hand, \(\frac{s'}{d}=\frac{s}{d}-\frac{1}{\widetilde{p}}+\frac{1}{p} \ge \alpha \) thanks to \(\widetilde{p}\le p\) and \(s>\frac{d}{\widetilde{p}}\). Then it follows from \(2^{j_{1}}\sim \big (\frac{n}{(\ln n)^{3}}\big )^{\frac{1}{d}}\) and \(0<\alpha <\frac{1}{2}\) that

$$\begin{aligned} I_{2}=\Big \Vert P_{j_{1}+1}r-r\Big \Vert _{p}^{p}\lesssim (\ln n)^{\frac{3p}{2}} n^{-\alpha p}. \end{aligned}$$

The main work for the proof of (21b) is to show

$$\begin{aligned} Z=E\left\| \sum \limits _{j=j_{0}}^{j_{1}} \sum \limits _{\ell =1}^{M}\sum \limits _{k\in \varLambda _{j}}\left[ \widehat{\beta }_{j,k}^{\ell }I_{\left\{ |\widehat{\beta }_{j,k}^{\ell }|\ge \kappa t_{n}\right\} }-\beta _{j,k}^{\ell }\right] \psi _{j,k}^{\ell }\right\| ^{p}_{p}\lesssim (\ln n)^{\frac{3p}{2}}{n}^{-\alpha p}. \end{aligned}$$
(26)

Note that Lemma 1.2 gives

$$\begin{aligned} Z\lesssim (j_{1}-j_{0}+1)^{p-1}\sum \limits _{j=j_{0}}^{j_{1}}2^{pd \left( \frac{j}{2}-\frac{j}{p}\right) }\sum \limits _{\ell =1}^{M} \sum \limits _{k\in \varLambda _{j}}E\left| \widehat{\beta }_{j,k}^{\ell }I_{ \left\{ |\widehat{\beta }_{j,k}^{\ell }|\ge \kappa t_{n}\right\} }-\beta _{j,k}^{\ell }\right| ^{p}. \end{aligned}$$

Then the classical technique (see Donoho et al. 1996) gives

$$\begin{aligned} Z\lesssim (j_{1}-j_{0}+1)^{p-1}(Z_{1}+Z_{2}+Z_{3}), \end{aligned}$$
(27)

where

$$\begin{aligned} Z_{1}= & {} \sum \limits _{j=j_{0}}^{j_{1}}2^{pd\left( \frac{j}{2}-\frac{j}{p}\right) }\sum \limits _{\ell =1}^{M}\sum \limits _{k\in \varLambda _{j}}E\left[ \left| \widehat{\beta }_{j,k}^{\ell }-\beta _{j,k}^{\ell }\right| ^{p}I_ {\left\{ |\widehat{\beta }_{j,k}^{\ell }-\beta _{j,k}^{\ell }|>\frac{\kappa t_{n}}{2}\right\} }\right] , \\ Z_{2}= & {} \sum \limits _{j=j_{0}}^{j_{1}}2^{pd\left( \frac{j}{2}-\frac{j}{p}\right) }\sum \limits _{\ell =1}^{M}\sum \limits _{k\in \varLambda _{j}}E\left[ \left| \widehat{\beta }_{j,k}^{\ell }-\beta _{j,k}^{\ell }\right| ^{p}I_{\left\{ |\widehat{\beta }_{j,k}^{\ell }|\ge \kappa t_{n},~|\beta _{j,k}^{\ell }|\ge \frac{\kappa t_{n}}{2}\right\} }\right] ,\\ Z_{3}= & {} \sum \limits _{j=j_{0}}^{j_{1}}2^{pd\left( \frac{j}{2}-\frac{j}{p}\right) }\sum \limits _{\ell =1}^{M}\sum \limits _{k\in \varLambda _{j}}\left| \beta _{j,k}^{\ell }\right| ^{p}I_{\left\{ |\widehat{\beta }_{j,k}^{\ell }|<\kappa t_{n},~|\beta _{j,k}^{\ell }|\le 2\kappa t_{n}\right\} }. \end{aligned}$$

For \(Z_{1}\), one observes that

$$\begin{aligned} E\left[ \left| \widehat{\beta }_{j,k}^{\ell }-\beta _{j,k}^{\ell }\right| ^{p}I_{\left\{ |\widehat{\beta }_{j,k}^{\ell }-\beta _{j,k}^{\ell }|>\frac{\kappa t_{n}}{2}\right\} }\right] \le \left[ E\left| \widehat{\beta }_{j,k}^{\ell }-\beta _{j,k}^{\ell }\right| ^{2p}\right] ^{\frac{1}{2}}\left[ {\mathbb {P}} \left( |\widehat{\beta }_{j,k}^{\ell }-\beta _{j,k}^{\ell }|>\frac{\kappa t_{n}}{2}\right) \right] ^{\frac{1}{2}} \end{aligned}$$

thanks to Hölder inequality. By Proposition 2.3,

$$\begin{aligned} {\mathbb {P}}\left( \left| \widehat{\beta }_{j,k}^{\ell }-\beta _{j,k}^{\ell }\right|>\kappa 2^{\frac{jd}{2}}\sqrt{\frac{\ln n}{n}} \right) \le {\mathbb {P}}\left( \left| \widehat{\beta }_{j,k}^{\ell }-\beta _{j,k}^{\ell }\right| >\kappa \sqrt{\frac{\ln n}{n}} \right) \lesssim 2^{-wj}. \end{aligned}$$

On the other hand, Proposition 2.2 implies \(E\Big |\widehat{\beta }_{j,k}^{\ell }-\beta _{j,k}^{\ell }\Big |^{p}\lesssim 2^{\frac{jdp}{2}}n^{-\frac{p}{2}}\) for \(1\le p<+\infty \). Therefore,

$$\begin{aligned} E\left[ \left| \widehat{\beta }_{j,k}^{\ell }-\beta _{j,k}^{\ell }\right| ^{p}I_{\left\{ |\widehat{\beta }_{j,k}^{\ell }-\beta _{j,k}^{\ell }|>\frac{\kappa t_{n}}{2}\right\} }\right] \lesssim 2^{\frac{jdp}{2}}n^{-\frac{p}{2}}~2^{-\frac{wj}{2}}. \end{aligned}$$

Then for \(w>2pd\) in Proposition 2.3, \(Z_{1}\lesssim \sum \nolimits _{j=j_{0}}^{j_{1}}2^{pd(\frac{j}{2}-\frac{j}{p})}2^{jd}2^{\frac{jdp}{2}} n^{-\frac{p}{2}}~2^{-\frac{wj}{2}}\lesssim \big (\frac{1}{n}\big )^{\frac{p}{2}}2^{-j_{0}(\frac{w}{2}-pd)}\lesssim \big (\frac{1}{n}\big )^{\frac{p}{2}}\big (\frac{1}{n}\big )^{\frac{\frac{w}{2}-pd}{2m+d+dI_{\{p>2\}}}}\lesssim \big (\frac{1}{n}\big )^{\alpha p} \), where one uses \(\alpha <\frac{1}{2}\) and the choice \(2^{j_{0}}\sim n^{\frac{1}{2m+d+dI_{\{p>2\}}}}~(m>s)\). Hence,

$$\begin{aligned} Z_{1}\lesssim n^{-\alpha p}. \end{aligned}$$
(28)

To estimate \(Z_{2}\), one rewrites

$$\begin{aligned} Z_{2}= & {} \left( \sum \limits _{j=j_{0}}^{j_{0}^{*}}+\sum \limits _{j=j^{*}_{0}+1}^{j_{1}}\right) \left\{ 2^{pd\left( \frac{j}{2}-\frac{j}{p}\right) }\sum \limits _{\ell =1}^{M}\sum \limits _{k\in \varLambda _{j}}E\left[ \left| \widehat{\beta }_{j,k}^{\ell }-\beta _{j,k}^{\ell }\right| ^{p}I_{\left\{ |\widehat{\beta }_{j,k}^{\ell }|\ge \kappa t_{n},~|\beta _{j,k}^{\ell }|\ge \frac{\kappa t_{n}}{2}\right\} }\right] \right\} \\:= & {} Z_{21}+Z_{22} \end{aligned}$$

with the integer \(j_{0}^{*}\in [j_{0}, j_{1}]\) being specified later on. By Proposition 2.2,

$$\begin{aligned} Z_{21}:= & {} \sum \limits _{j=j_{0}}^{j_{0}^{*}}2^{pd\left( \frac{j}{2}-\frac{j}{p}\right) }\sum \limits _{\ell =1}^{M}\sum \limits _{k\in \varLambda _{j}}E\left[ \left| \widehat{\beta }_{j,k}^{\ell }-\beta _{j,k}^{\ell }\right| ^{p}I_{\left\{ |\widehat{\beta }_{j,k}^{\ell }|\ge \kappa t_{n},\quad |\beta _{j,k}^{\ell }|\ge \frac{\kappa t_{n}}{2}\right\} }\right] \nonumber \\\lesssim & {} \sum \limits _{j=j_{0}}^{j_{0}^{*}}2^{pd\left( \frac{j}{2}-\frac{j}{p}\right) }\sum \limits _{\ell =1}^{M}\sum \limits _{k\in \varLambda _{j}}\left[ I_{\{1\le p\le 2\}}+2^{\frac{jdp}{2}}I_{\{p>2\}}\right] n^{-\frac{p}{2}}\nonumber \\\lesssim & {} \left[ 2^{\frac{j_{0}^{*}pd}{2}}I_{\{1\le p\le 2\}}+2^{j_{0}^{*}pd}I_{\{p>2\}}\right] \Big (\frac{1}{n}\Big )^{\frac{p}{2}}. \end{aligned}$$
(29)

On the other hand, it follows from Proposition 2.2, Lemma 1.1(3) and \(t_{n}=\left[ I_{\{1\le p\le 2\}}+2^{\frac{jd}{2}}I_{\{p>2\}}\right] \sqrt{\frac{\ln n}{n}}\) that

$$\begin{aligned} Z_{22}:= & {} \sum \limits _{j=j_{0}^{*}+1}^{j_{1}}2^{pd\left( \frac{j}{2}-\frac{j}{p}\right) }\sum \limits _{\ell =1}^{M}\sum \limits _{k\in \varLambda _{j}}E\left[ \left| \widehat{\beta }_{j,k}^{\ell }-\beta _{j,k}^{\ell }\right| ^{p}I_{\left\{ |\widehat{\beta }_{j,k}^{\ell }|\ge \kappa t_{n},\quad |\beta _{j,k}^{\ell }|\ge \frac{\kappa t_{n}}{2}\right\} }\right] \\\lesssim & {} \sum \limits _{j=j_{0}^{*}+1}^{j_{1}}2^{pd\left( \frac{j}{2}-\frac{j}{p}\right) }\sum \limits _{\ell =1}^{M}\sum \limits _{k\in \varLambda _{j}}E\left| \widehat{\beta }_{j,k}^{\ell }-\beta _{j,k}^{\ell }\right| ^{p}\left( \frac{|\beta _{j,k}^{\ell }|}{\kappa t_{n}/2}\right) ^{\widetilde{p}}\\\lesssim & {} \sum \limits _{j=j_{0}^{*}+1}^{j_{1}}(\ln n)^{-\frac{\widetilde{p}}{2}}\left( \frac{1}{n}\right) ^{\frac{p-\widetilde{p}}{2}}2^{-j\big (s\widetilde{p}+\frac{\widetilde{p}d}{2}+\frac{\widetilde{p}d}{2}I_{\{p>2\}}-\frac{pd}{2}-\frac{pd}{2}I_{\{p>2\}}\big )}. \end{aligned}$$

Define

$$\begin{aligned} \varepsilon :=s\widetilde{p}+\frac{\widetilde{p}d}{2}+\frac{\widetilde{p}d}{2}I_{\{p>2\}}-\frac{pd}{2}-\frac{pd}{2}I_{\{p>2\}}. \end{aligned}$$

Then for \(\varepsilon >0\), \(Z_{22}\lesssim (\ln n)^{-\frac{\widetilde{p}}{2}}(\frac{1}{n})^{\frac{p-\widetilde{p}}{2}}\sum \nolimits _{j=j_{0}^{*}+1}^{j_{1}}2^{-j\varepsilon }\lesssim (\ln n)^{-\frac{\widetilde{p}}{2}}(\frac{1}{n})^{\frac{p-\widetilde{p}}{2}}2^{-j_{0}^{*}\varepsilon }.\) To balance this with (29), one takes

$$\begin{aligned} 2^{j_{0}^{*}}\sim \Big (\frac{n}{\ln n}\Big )^\frac{1-2\alpha }{d+dI_{\{p>2\}}}. \end{aligned}$$

Note that \(0<\alpha \le \frac{s}{2s+d+dI_{\{p>2\}}}<\frac{1}{2}\) and \(2^{j_{0}}\thicksim n^{\frac{1}{2m+d+dI_{\{p>2\}}}}(m>s)\). Then \(2^{j_{0}^{*}}\le 2^{j_{1}}\thicksim \big (\frac{n}{(\ln n)^{3}}\big )^{\frac{1}{d}}\) and \(2^{j_{0}^{*}}\gtrsim \big (\frac{n}{\ln n}\big )^{\frac{1}{2s+d+dI_{\{p>2\}}}}\gtrsim n^{\frac{1}{2m+d+dI_{\{p>2\}}}}\thicksim 2^{j_{0}}~(m>s)\). Since \(\varepsilon >0\), we have \(\widetilde{p}>\frac{p(d+dI_{\{p>2\}})}{2s+d+dI_{\{p>2\}}}\) and hence \(\alpha =\frac{s}{2s+d+dI_{\{p>2\}}}\) thanks to (21c). Moreover, it can be checked that

$$\begin{aligned} \frac{p-\widetilde{p}}{2}+\frac{1-2\alpha }{d+dI_{\{p>2\}}}\left( s\widetilde{p}+\frac{\widetilde{p}d}{2}+\frac{\widetilde{p}d}{2}I_{\{p>2\}}-\frac{pd}{2}-\frac{pd}{2}I_{\{p>2\}}\right) =\alpha p \end{aligned}$$

by considering \(p>2\) and \(p\in [1, 2]\) respectively. This with the choice of \(2^{j^{*}_{0}}\) leads to

$$\begin{aligned} Z_{22}\lesssim \big (\ln n\big )^{-\frac{\widetilde{p}}{2}}\big (\frac{1}{n}\big )^{\frac{p-\widetilde{p}}{2}}\big (\frac{\ln n}{n}\big )^{\frac{1-2\alpha }{d+dI_{\{p>2\}}}\big (s\widetilde{p}+\frac{\widetilde{p}d}{2}+\frac{\widetilde{p}d}{2}I_{\{p>2\}}-\frac{pd}{2}-\frac{pd}{2}I_{\{p>2\}}\big )} \lesssim \big (\frac{1}{n}\big )^{\alpha p}. \end{aligned}$$
(30)

On the other hand, (29) with the choice of \(2^{j_{0}^{*}}\) implies

$$\begin{aligned} Z_{21}\lesssim \Big (\frac{1}{n}\Big )^{\alpha p} \end{aligned}$$
(31)

for each \(\varepsilon \in {\mathbb {R}}\).

For the case \(\varepsilon \le 0\), \(\widetilde{p}\le \frac{p(d+dI_{\{p>2\}})}{2s+d+dI_{\{p>2\}}}\) and \(\alpha =\frac{s-\frac{d}{\widetilde{p}}+\frac{d}{p}}{2(s-\frac{d}{\widetilde{p}})+d+dI_{\{p>2\}}}\) (see (21c)). Define \(p_{1}:=(1-2\alpha )p\). Then \(\alpha \le \frac{s}{2s+d+dI_{\{p>2\}}}\) and \(\widetilde{p}\le \frac{p(d+dI_{\{p>2\}})}{2s+d+dI_{\{p>2\}}}\le (1-2\alpha )p=p_{1}.\) Similar to the case \(\varepsilon >0\), \(Z_{22}\lesssim \sum \nolimits _{j=j_{0}^{*}+1}^{j_{1}}2^{pd(\frac{j}{2}-\frac{j}{p})}\sum \nolimits _{\ell =1}^{M}\sum \nolimits _{k\in \varLambda _{j}}E\left| \widehat{\beta }_{j,k}^{\ell }-\beta _{j,k}^{\ell }\right| ^{p}\left( \frac{|\beta _{j,k}^{\ell }|}{\kappa t_{n}/2}\right) ^{p_{1}}.\) Because \(\widetilde{p}\le p_{1}\) and \(r\in B^{s}_{\widetilde{p},q}(H)\), \(\big \Vert \beta _{j}\big \Vert _{p_{1}}^{p_{1}}\le \big \Vert \beta _{j}\big \Vert _{\widetilde{p}}^{p_{1}}\lesssim 2^{-j\big (s-\frac{d}{\widetilde{p}}+\frac{d}{2}\big )p_{1}}\) and

$$\begin{aligned} Z_{22}\lesssim \Big (\ln n\Big )^{-\frac{p_{1}}{2}}\left( \frac{1}{n}\right) ^{\frac{p-p_{1}}{2}}\sum \limits _{j=j_{0}^{*}+1}^{j_{1}}2^{-j\big (sp_{1}-\frac{dp_{1}}{\widetilde{p}}+\frac{dp_{1}}{2}+\frac{dp_{1}}{2}I_{\{p>2\}}-\frac{dp}{2}-\frac{dp}{2}I_{\{p>2\}}+d\big )}. \end{aligned}$$

By the definitions of \(p_{1}\) and \(\alpha \), \(sp_{1}-\frac{dp_{1}}{\widetilde{p}}+\frac{dp_{1}}{2}+\frac{dp_{1}}{2}I_{\{p>2\}}-\frac{dp}{2}-\frac{dp}{2}I_{\{p>2\}}+d=0\) and \(Z_{22}\lesssim \Big (\ln n\Big )^{-\frac{p_{1}}{2}}\Big (\frac{1}{n}\Big )^{\frac{p-p_{1}}{2}}(\ln n)\lesssim \Big (\ln n\Big )\Big (\frac{1}{n}\Big )^{\alpha p}\). This with (30) and (31) shows that in both cases,

$$\begin{aligned} Z_{2}=Z_{21}+Z_{22}\lesssim \ln n~\left( \frac{1}{n}\right) ^{\alpha p}. \end{aligned}$$
(32)

Finally, one estimates \(Z_{3}\). Clearly,

$$\begin{aligned} Z_{31}:= & {} \sum \limits _{j=j_{0}}^{j_{0}^{*}}2^{pd\Big (\frac{j}{2}-\frac{j}{p}\Big )}\sum \limits _{\ell =1}^{M}\sum \limits _{k\in \varLambda _{j}}\left| \beta _{j,k}^{\ell }\right| ^{p}I_{\left\{ |\widehat{\beta }_{j,k}^{\ell }|<\kappa t_{n},\quad |\beta _{j,k}^{\ell }|\le 2\kappa t_{n}\right\} }\\\le & {} \sum \limits _{j=j_{0}}^{j_{0}^{*}}2^{pd\Big (\frac{j}{2}-\frac{j}{p}\Big )}\sum \limits _{\ell =1}^{M}\sum \limits _{k\in \varLambda _{j}}\Big |2\kappa t_{n}\Big |^{p} \lesssim \sum \limits _{j=j_{0}}^{j_{0}^{*}}2^{\frac{jdp}{2}}\Big [I_{\{1\le p\le 2\}}+2^{\frac{jdp}{2}}I_{\{p>2\}}\Big ]\left( \frac{\ln n}{n}\right) ^{\frac{p}{2}}\\\lesssim & {} 2^{\frac{j_{0}^{*}dp}{2}}\Big [I_{\{1\le p\le 2\}}+2^{\frac{j_{0}^{*}dp}{2}}I_{\{p>2\}}\Big ]\left( \frac{\ln n}{n}\right) ^{\frac{p}{2}}. \end{aligned}$$

This with the choice of \(2^{j^{*}_{0}}\) shows

$$\begin{aligned} Z_{31}\lesssim \Big (\ln n\Big )^{\frac{p}{2}}\Big (\frac{1}{n}\Big )^{\alpha p}. \end{aligned}$$
(33)

On the other hand,

$$\begin{aligned} Z_{32}:= & {} \sum \limits _{j=j_{0}^{*}+1}^{j_{1}}2^{pd\left( \frac{j}{2}-\frac{j}{p}\right) }\sum \limits _{\ell =1}^{M}\sum \limits _{k\in \varLambda _{j}}\left| \beta _{j,k}^{\ell }\right| ^{p}I_{\left\{ |\widehat{\beta }_{j,k}^{\ell }|<\kappa t_{n},\quad |\beta _{j,k}^{\ell }|\le 2\kappa t_{n}\right\} }\\\le & {} \sum \limits _{j=j_{0}^{*}+1}^{j_{1}}2^{pd\left( \frac{j}{2}-\frac{j}{p}\right) }\sum \limits _{\ell =1}^{M}\sum \limits _{k\in \varLambda _{j}}\left| \beta _{j,k}^{\ell }\right| ^{p}\left| \frac{2\kappa t_{n}}{\beta ^{\ell }_{j,k}}\right| ^{p-\widetilde{p}} \lesssim \left( \frac{\ln n}{n}\right) ^{\frac{p-\widetilde{p}}{2}}\sum \limits _{j=j_{0}^{*}+1}^{j_{1}}2^{-j\varepsilon }. \end{aligned}$$

The same arguments as for (30) show that for \(\varepsilon >0\),

$$\begin{aligned} Z_{32}\lesssim \Big (\ln n\Big )^{\frac{p}{2}}\left( \frac{1}{n}\right) ^{\alpha p}. \end{aligned}$$
(34)

To prove (34) when \(\varepsilon \le 0\), one writes

$$\begin{aligned} Z_{32}= & {} \left( \sum \limits _{j=j_{0}^{*}+1}^{j_{1}^{*}}+ \sum \limits _{j=j^{*}_{1}+1}^{j_{1}}\right) \left\{ 2^{pd\left( \frac{j}{2}-\frac{j}{p}\right) }\sum \limits _{\ell =1}^{M}\sum \limits _{k\in \varLambda _{j}}\left| \beta _{j,k}^{\ell }\right| ^{p}I_{\left\{ |\widehat{\beta }_{j,k}^{\ell }|<\kappa t_{n},\quad |\beta _{j,k}^{\ell }|\le 2\kappa t_{n}\right\} }\right\} \\:= & {} Z_{321}+Z_{322}, \end{aligned}$$

where the integer \(j_{1}^{*}\in [j_{0}^{*}+1, j_{1}]\) will be determined in the following. Similar to the case \(\varepsilon >0\),

$$\begin{aligned} Z_{321}\lesssim \left( \frac{\ln n}{n}\right) ^{\frac{p-\widetilde{p}}{2}}2^{-j_{1}^{*}\varepsilon } \end{aligned}$$
(35)

holds for \(\varepsilon \le 0\). By \(s>\frac{d}{\widetilde{p}}\) and \(\big \Vert \beta _{j}\big \Vert _{\widetilde{p}}\lesssim 2^{-j(s-\frac{d}{\widetilde{p}}+\frac{d}{2})}\),

$$\begin{aligned} Z_{322}\le & {} \sum \limits _{j=j_{1}^{*}+1}^{j_{1}}2^{pd\Big (\frac{j}{2}-\frac{j}{p}\Big )} \sum \limits _{\ell =1}^{M}\sum \limits _{k\in \varLambda _{j}}\left| \beta _{j,k}^{\ell }\right| ^{p}\le \sum \limits _{j=j_{1}^{*} +1}^{j_{1}}2^{pd\Big (\frac{j}{2}-\frac{j}{p}\Big )}\Big \Vert \beta _{j} \Big \Vert _{\widetilde{p}}^{p}\\\lesssim & {} \sum \limits _{j=j_{1}^{*}+1}^{j_{1}}2^{-j\big (d+sp-pd/\widetilde{p}\big )} \lesssim 2^{-j_{1}^{*}\big (d+sp-pd/\widetilde{p}\big )}. \end{aligned}$$

To make a balance with (35), one takes \(\left( \frac{\ln n}{n}\right) ^{\frac{p-\widetilde{p}}{2}}2^{-j_{1}^{*}\varepsilon }\sim 2^{-j_{1}^{*}\big (d+sp-pd/\widetilde{p}\big )}\), which means

$$\begin{aligned} 2^{j_{1}^{*}}\sim \left( \frac{n}{\ln n}\right) ^{\frac{\alpha }{s-d/\widetilde{p}+d/p}}, \end{aligned}$$

where one uses that \(\varepsilon \le 0\) is equivalent to \(\widetilde{p}\le \frac{p(d+dI_{\{p>2\}})}{2s+d+dI_{\{p>2\}}}\), so that \(\alpha =\frac{s-\frac{d}{\widetilde{p}}+\frac{d}{p}}{2(s-\frac{d}{\widetilde{p}})+d+dI_{\{p>2\}}}\). In that case, \(\Big (\frac{n}{\ln n}\Big )^{\frac{1-2\alpha }{d+dI_{\{p>2\}}}}\sim 2^{j_{0}^{*}}\le 2^{j_{1}^{*}}\le 2^{j_{1}}\thicksim \Big (\frac{n}{(\ln n)^{3}}\Big )^{\frac{1}{d}}\). Note that

$$\begin{aligned} \frac{p-\widetilde{p}}{2}+\frac{\alpha \varepsilon }{s-d/\widetilde{p}+d/p}=\frac{p-\widetilde{p}}{2}+\frac{s\widetilde{p}+\frac{\widetilde{p}d}{2}+\frac{\widetilde{p}d}{2}I_{\{p>2\}}-\frac{pd}{2}-\frac{pd}{2}I_{\{p>2\}}}{2\left( s-\frac{d}{\widetilde{p}}\right) +d+dI_{\{p>2\}}}=\alpha p. \end{aligned}$$

Then \(Z_{321}\lesssim \big (\ln n\big )^{\frac{p}{2}}\big (\frac{1}{n}\big )^{\alpha p}\) and \(Z_{322}\lesssim \big (\ln n\big )^{\frac{p}{2}}\big (\frac{1}{n}\big )^{\alpha p}\). Therefore, \(Z_{32}=Z_{321}+Z_{322}\lesssim \big (\ln n\big )^{\frac{p}{2}}\big (\frac{1}{n}\big )^{\alpha p}\) for \(\varepsilon \le 0\). Combining this with (33) and (34), one knows \(Z_{3}\lesssim \big (\ln n\big )^{\frac{p}{2}}\big (\frac{1}{n}\big )^{\alpha p}\) in both cases. This with (27), (28) and (32) shows

$$\begin{aligned} Z\lesssim \Big (\ln n\Big )^{p-1}\left[ n^{-\alpha p}+\Big (\ln n\Big )\Big (\frac{1}{n}\Big )^{\alpha p}+\Big (\ln n\Big )^{\frac{p}{2}}\Big (\frac{1}{n}\Big )^{\alpha p}\right] \lesssim \Big (\ln n\Big )^{\frac{3p}{2}}\left( \frac{1}{n}\right) ^{\alpha p}, \end{aligned}$$

which is the desired conclusion. Although \((\ln n)^{\frac{3p}{2}}\) can be replaced by \((\ln n)^{\max {\{p, \frac{3p}{2}-1\}}}\) in the last inequality, it cannot be so replaced in (21b), because one uses Jensen inequality for \(\widetilde{p}\ge p\). \(\square \)