1 Introduction

More than half a century ago, Parzen (1962) studied properties of the kernel density estimators introduced by Akaike (1954) and Rosenblatt (1956). Since then, nonparametric estimation of density and regression functions has been investigated intensively by statisticians and a large variety of estimation methods have been developed. Kernel-based nonparametric function estimation has attracted considerable interest in the statistics community, and numerous theoretical results, together with applications in economics, finance, biology and environmental science, have been obtained. For an exhaustive discussion of the topic, we refer the reader to the pioneering works of Tapia and Thompson (1978), Wertz (1978), Devroye and Györfi (1985), Devroye (1987), Nadaraya (1989), Härdle (1990), Wand and Jones (1995), Eggermont and LaRiccia (2001), Devroye and Lugosi (2001) and the references therein.

The estimation of function derivatives is a versatile tool in statistical data analysis. For instance, Genovese et al. (2013) introduced a test statistic for the modes of a density based on the second-order density derivative, and Noh et al. (2018) showed that the optimal bandwidth of kernel density estimation depends on the second-order density derivative. Moreover, as discussed in Silverman (1986) and Wand and Jones (1995), the optimal choice of the bandwidth for a local constant estimator of the density depends on the second derivative of the density function. The estimation of density derivatives is also instrumental in many applications. For example, the first-order density derivative is the fundamental ingredient of mean shift clustering, which seeks the modes of the data density, see Fukunaga and Hostetler (1975), Yizong (1995) and Comaniciu and Meer (2002). A statistical test for modes of the data density is based on the second-order density derivative (Genovese et al. 2013), and the second-order density derivative also appears in the bias of nearest-neighbor Kullback–Leibler divergence estimation; for details refer to Noh et al. (2018). Härdle et al. (1990) and Chacón and Duong (2013) considered the problem of estimating the density derivative and obtained optimal bandwidth selectors. Sasaki et al. (2016) proposed a method that estimates density derivatives directly, without going through density estimation. In short, the estimation of density derivatives is a subject of great interest that has received a lot of attention; we refer to Schuster (1969), Meyer (1977), Silverman (1978), Cheng (1982), Karunamuni and Mehra (1990), Jones (1994), Abdous et al. (2002), Horová et al. (2002), Henderson and Parmeter (2012a, 2012b) and Wu et al. (2014).

More applications in fundamental statistical problems such as regression, Fisher information estimation, parameter estimation, and hypothesis testing are discussed in Singh (1976, 1977, 1979). For instance, the conditional bias, the conditional variance and the optimal local bandwidth of the local polynomial regression estimator depend on higher-order derivatives of the regression function [see Fan and Gijbels (1995) for more details]. Yu and Jones (1998) showed that the mean square error of the local linear quantile regression estimator, and consequently the choice of the optimal bandwidth, depends on the second derivative, with respect to the covariate, of the conditional cumulative distribution function. Most of the time, when it comes to the numerical implementation of the optimal bandwidth, the derivatives of the above-mentioned quantities (density, regression or CDF) are plugged in through their empirical versions, without a deep study of the properties of the estimators of those derivatives.

Furthermore, it has been noted that the estimation of the first- or higher-order derivatives of the regression function is also important for practical implementations including, but not limited to, the modeling of human growth data (Ramsay and Silverman 2002), kidney function for a lupus nephritis patient (Ramsay and Silverman 2005), and Raman spectra of bulk materials (Charnigo et al. 2011). Derivative estimation is also needed in nonparametric regression to construct confidence intervals for regression functions (Eubank and Speckman 1993), to select kernel bandwidths (Ruppert et al. 1995), and to compare regression curves (Park and Kang 2008). Härdle and Gasser (1985) considered a homoscedastic regression model and proposed kernel M-estimators to estimate nonparametrically the first derivative of the regression function; they heuristically extended their proposal to higher-order derivatives. The derivative of the regression function is also used in modal regression, an alternative to the usual regression methods for exploring the relationship between a response variable and a predictor variable; we refer to Herrmann and Ziegler (2004), Ziegler (2001, 2002, 2003) and, for recent references, Bouzebda and Didi (2021). The estimation of the regression function itself was considered from theoretical and practical points of view by Nadaraja (1969), Rice and Rosenblatt (1983), Gasser and Müller (1984), Georgiev (1984) and Delecroix and Rosa (1996). However, less attention has been devoted to the study of the derivatives of the regression function.

In the present work, we are interested in studying the asymptotic properties of nonparametric estimators of function derivatives. We do not assume anything beyond the stationarity and ergodicity of the underlying process. For more details about the ergodicity assumption, we refer the reader to Bouzebda et al. (2015), Bouzebda et al. (2016), Bouzebda and Didi (2017a, 2017b) and Krebs (2019), among others. Notice that, in the statistical literature, it is commonly assumed that the data are either independent or satisfy some form of mixing assumption. A mixing condition can be seen as a kind of asymptotic independence assumption, which can be unrealistic and excludes several stochastic processes with a strong dependence structure (such as long memory processes) or whose mixing coefficients do not vanish asymptotically (for instance, autoregressive models with discrete innovations). Moreover, one of the arguments invoked by Leucht and Neumann (2013) to motivate the use of the ergodicity assumption is the existence of classes of processes for which the ergodicity property is much easier to prove than the mixing one. Hence, the ergodicity condition seems more natural to adopt, as it provides a more general dependence framework that includes non-mixing stochastic processes such as those generated by noisy chaos.

In the following, we illustrate the discussion above through an example of a process which is ergodic but does not necessarily satisfy a mixing condition. To this end, let \(\{(T_{i},\lambda _{i}):i\in {\mathbb {Z}}\}\) be a strictly stationary process such that \(T_{i}\mid {\mathcal {T}}_{i-1}\) is a Poisson process with parameter \(\lambda _{i}\), where \({\mathcal {T}}_i\) denotes the \(\sigma \)-field generated by \((T_{i}, \lambda _{i},T_{i-1},\lambda _{i-1},\ldots )\). Assume that \(\lambda _{i}=f(\lambda _{i-1}, T_{i-1})\), where \(f:[0,\infty )\times {\mathbb {N}}\rightarrow (0,\infty )\) is a given function. This process is not mixing in general (see Remark 3 in Neumann 2011). It is known that any sequence \((\varepsilon _i)_{i\in {\mathbb {Z}}}\) of i.i.d. random variables is ergodic. Consequently, \(( Y_{i} )_{i \in {\mathbb {Z}}}\) with \(Y_{i} = \vartheta ((\ldots , \varepsilon _{i-1}, \varepsilon _{i} ), (\varepsilon _{i+1},\varepsilon _{i+2},\ldots )),\) for some Borel-measurable function \(\vartheta (\cdot )\), is also ergodic (see Proposition 2.10, page 54 in Bradley 2007 for more details).

To the best of our knowledge, the results presented here respond to a problem that has not been studied systematically up to now, which was the basic motivation of this paper. To prove our results, we base our methodology on a martingale approximation, which provides a unified setting for nonparametric time series analysis and enables systematic studies of dependent data that are quite different from existing procedures in the i.i.d. setting.

The remainder of the paper is organized as follows. General notation and the definitions of the kernel derivative estimators are given in Sect. 2. The assumptions and asymptotic properties of the kernel derivative estimators are given in Sect. 3, which includes the uniform strong convergence rates, the asymptotic normality and the AMISE of the family of nonparametric function derivative estimators. Section 4 is devoted to an application to the regression function derivatives. The performance of the proposed procedures is evaluated through simulations in the context of regression derivatives in Sect. 5. In Sect. 6, we illustrate the estimation methodology on real data. Some concluding remarks and future developments are given in Sect. 7. To avoid interrupting the flow of the presentation, all mathematical developments are relegated to Sect. 8.

2 Problem formulation and estimation

We start by giving some notation and definitions that are needed for the forthcoming sections. Let \(({\mathbf {X}},{\mathbf {Y}})\) be a random vector, where \({\mathbf {X}} = (X_{1},\ldots ,X_{p}) \in {\mathbb {R}}^{p}\) and \({\mathbf {Y}} = (Y_{1},\ldots ,Y_{q}) \in {\mathbb {R}}^{q}\). The joint distribution function [df] of \(({\mathbf {X}},{\mathbf {Y}})\) is defined as \(F({\mathbf {x}},{\mathbf {y}}) := {\mathbb {P}}({\mathbf {X}} \le {\mathbf {x}},{\mathbf {Y}}\le {\mathbf {y}}), ~\text{ for }~ {\mathbf {x}}\in {\mathbb {R}}^{p} ~\text{ and }~{\mathbf {y}}\in {\mathbb {R}}^{q}.\) In the sequel, for \({\mathbf {v}}^{\prime } =(v_{1}^{\prime },\ldots ,v_{r}^{\prime })\in {\mathbb {R}}^{r}\) and \({\mathbf {v}}^{\prime \prime } =(v_{1}^{\prime \prime },\ldots ,v_{r}^{\prime \prime }) \in {\mathbb {R}}^{r}\), we set \({\mathbf {v}}^{\prime } \le {\mathbf {v}}^{\prime \prime }\) whenever \(v^{\prime }_{j} \le v^{\prime \prime }_{j}\) for \(j = 1,\ldots ,r\). We denote by \({\mathbf {I}}\) and \({\mathbf {J}}\) two fixed subsets of \({\mathbb {R}}^{p}\) such that \( {\mathbf {I}}=\prod _{j=1}^{p}[a_{j},b_{j}]\subset {\mathbf {J}}=\prod _{j=1}^{p}[c_{j},d_{j}]\subset {\mathbb {R}}^{p}, \) where \( -\infty<c_{j}<a_{j}<b_{j}<d_{j}<\infty ~\text{ for }~j=1,\ldots ,p. \) We assume that \(({\mathbf {X}},{\mathbf {Y}})\) has a joint density function defined as

$$\begin{aligned} f_{{\mathbf {X}},{\mathbf {Y}}}(\mathbf { x},{\mathbf {y}}) := \frac{\partial ^{p+q}}{\partial x_{1}\ldots \partial x_{p}\partial y_{1}\ldots \partial y_{q}} F(\mathbf { x},{\mathbf {y}}) ~\text{ on }~{\mathbf {J}} \times {\mathbb {R}}^{q}, \end{aligned}$$

with respect to the Lebesgue measure \(\hbox {d}{\mathbf {x}} \times \hbox {d}\mathbf { y}\), and denote

$$\begin{aligned} f_{{\mathbf {X}}}({\mathbf {x}})=\int _{{\mathbb {R}}^{q}}f({\mathbf {x}},{\mathbf {y}})\hbox {d}{\mathbf {y}}, ~\text{ for }~{\mathbf {x}}\in {\mathbf {J}}, \end{aligned}$$

the marginal density of \({\mathbf {X}}\) (which is only assumed to exist on \({\mathbf {J}}\)). For a nonnegative integer vector \(\mathbf { s}= (s_{1}, \ldots , s_{p} ) \in (\{0\}\cup {\mathbb {N}})^p\), define \(|{\mathbf {s}}| := s_{1} +\cdots +s_{p}\) and

$$\begin{aligned} D^{|{\mathbf {s}}|}:=\frac{\partial ^{|{\mathbf {s}}|}}{\partial x_{1}^{s_{1}}\cdots \partial x_{p}^{s_{p}}}. \end{aligned}$$

The operator \(D^{|{\mathbf {s}}|}\) is assumed to be well defined and to commute with integration in our setting. Let \(\psi : {\mathbb {R}}^{q} \rightarrow {\mathbb {R}}\) be a measurable function. In this paper, we are primarily interested in the estimation of the following derivatives

$$\begin{aligned} D^{|{\mathbf {s}}|} r(\psi ; {\mathbf {x}}) :=D^{|{\mathbf {s}}|} {\mathbb {E}}(\psi ({\mathbf {Y}})\mid {\mathbf {X}}={\mathbf {x}})f_{{\mathbf {X}}}({\mathbf {x}})=\int _{{\mathbb {R}}^{q}}\psi ({\mathbf {y}})D^{|{\mathbf {s}}|}f({\mathbf {x}},{\mathbf {y}})\hbox {d}{\mathbf {y}}, ~ \text{ and } ~ D^{|{\mathbf {s}}|} r(1; {\mathbf {x}}) = D^{|{\mathbf {s}}|} f_{{\mathbf {X}}}({\mathbf {x}}). \end{aligned}$$

An extension to the derivative of the regression function \( m({\mathbf {x}},\psi )={\mathbb {E}}(\psi ({\mathbf {Y}})\mid \mathbf { X}={\mathbf {x}}), \) whenever it exists, will be considered.

2.1 Kernel-type estimation

Let \(\{{\mathbf {X}}_{i},{\mathbf {Y}}_{i}\}_{i\ge 1}\) be an \({\mathbb {R}}^{p}\times {\mathbb {R}}^{q}\)-valued strictly stationary ergodic process defined on a probability space \((\Omega , {\mathcal {A}}, {\mathbb {P}})\). We now introduce a kernel function \(\{K({\mathbf {x}}) : {\mathbf {x}}\in {\mathbb {R}}^{p}\}\) fulfilling the conditions below.

  1. (K.i)

    \(\int _{{\mathbb {R}}^{p}}K({\mathbf {t}})d{\mathbf {t}}=1.\)

  2. (K.ii)

    For given \({\mathbf {s}} \in (\{0\}\cup {\mathbb {N}})^{p}\), the partial derivative \(D^{|{\mathbf {s}}|}K : {\mathbb {R}}^{p} \rightarrow {\mathbb {R}}\) exists and

    $$\begin{aligned} \sup _{{\mathbf {t}}\in {\mathbb {R}}^{p}}|D^{|{\mathbf {s}}|}K({\mathbf {t}})|<\infty . \end{aligned}$$

The smoothness condition (K.ii) on the kernel function \(K(\cdot )\) is needed so that the operator \(D^{|{\mathbf {s}}|}\) is well defined and commutes with integration. The conditions (K.i) and (K.ii) will be assumed tacitly in the sequel. For each \(n \ge 1\), and for each choice of the bandwidth \(h_{n} > 0\), we define the kernel estimators

$$\begin{aligned} f_{{\mathbf {X}};n}({\mathbf {x}},h_{n})=\frac{1}{nh_{n}}\sum _{i=1}^{n}K\left( \frac{{\mathbf {x}}-{\mathbf {X}}_{i}}{h_{n}^{1/p}}\right) ,~~ r_{n}(\psi ;{\mathbf {x}},h_{n})=\frac{1}{nh_{n}}\sum _{i=1}^{n}\psi ({\mathbf {Y}}_{i})K\left( \frac{{\mathbf {x}}-{\mathbf {X}}_{i}}{h_{n}^{1/p}}\right) . \end{aligned}$$

Notice that \(h_{n}\) is a positive sequence of real numbers such that

$$\begin{aligned} (i)\ \lim _{n \rightarrow \infty } h_{n}=0,\quad (ii)\ \lim _{n \rightarrow \infty } n h_{n}=+\infty , \quad \text{ or }\quad (iii)\ \lim _{n \rightarrow \infty } \frac{n h_{n}}{\log n}=+\infty . \end{aligned}$$

The condition (i) is used to obtain the asymptotic unbiasedness of the kernel (density or regression) type estimators. A more restrictive assumption on \(h_{n}\) is needed for consistency; this is given by condition (ii), see Parzen (1962). In general, the strong consistency fails to hold when either (i) or (iii) is not satisfied.

Remark 1

For notational convenience, we have chosen the same bandwidth sequence for each margin. This assumption can easily be dropped if one wants to make use of vector bandwidths (see, in particular, Chapter 12 of Devroye and Lugosi 2001). With obvious changes of notation, our results and their proofs remain true when \(h_{n}\) is replaced by a vector bandwidth \({\mathbf {h}}_{n} = (h^{(1)}_{n}, \ldots , h^{(p)}_{n})\), where \(\min _i h^{(i)}_{n} > 0\). In this situation we set \(h_{n}=\prod _{i=1}^{p} h_{n}^{(i)}\) and, for any vector \({\mathbf {v}} = (v_{1} ,\ldots ,v_{p})\), we replace \({\mathbf {v}}/h_{n}\) by \((v_{1}/h_{n}^{(1)},\ldots ,v_{p}/h_{n}^{(p)})\). For ease of presentation, we chose to use real-valued bandwidths throughout.

Our aim is to provide estimators of the \(|{\mathbf {s}}|\)-th derivatives \(D^{|{\mathbf {s}}|} f_{{\mathbf {X}}}({\mathbf {x}})\) and \(D^{|{\mathbf {s}}|} r(\psi ;{\mathbf {x}})\), respectively, and to establish their asymptotic properties. The natural choices for these estimators are (for a suitable choice of \(h_{n}>0\)) the \(|\mathbf { s}|\)-th derivatives of \(f_{{\mathbf {X}};n}({\mathbf {x}},h_{n})\) and \(r_{n}(\psi ;{\mathbf {x}},h_{n})\), respectively defined as:

$$\begin{aligned} D^{|{\mathbf {s}}|}f_{{\mathbf {X}};n}({\mathbf {x}},h_{n})= & {} \frac{1}{nh_{n}^{1+|{\mathbf {s}}|/p}}\sum _{i=1}^{n}D^{|{\mathbf {s}}|}K\left( \frac{{\mathbf {x}}-{\mathbf {X}}_{i}}{h_{n}^{1/p}}\right) , \end{aligned}$$
(1)
$$\begin{aligned} D^{|{\mathbf {s}}|}r_{n}(\psi ;{\mathbf {x}},h_{n})= & {} \frac{1}{nh_{n}^{1+|{\mathbf {s}}|/p}}\sum _{i=1}^{n}\psi ({\mathbf {Y}}_{i})D^{|{\mathbf {s}}|}K\left( \frac{{\mathbf {x}}-{\mathbf {X}}_{i}}{h_{n}^{1/p}}\right) . \end{aligned}$$
(2)

Notice that the kernel density derivative estimator \(D^{|{\mathbf {s}}|}f_{{\mathbf {X}};n}({\mathbf {x}},h_{n})\) is a particular case of \(D^{|{\mathbf {s}}|}r_{n}(\psi ;{\mathbf {x}},h_{n})\), that is

$$\begin{aligned} D^{|{\mathbf {s}}|}f_{{\mathbf {X}};n}({\mathbf {x}},h_{n})=D^{|{\mathbf {s}}|}r_{n}(1;{\mathbf {x}},h_{n}). \end{aligned}$$
(3)
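For illustration, a minimal Python sketch of the estimators (1) and (2) in the case \(p=1\), \(|{\mathbf {s}}|=1\) with a Gaussian kernel may be written as follows; the function names, the choice of kernel and the default bandwidth value are ours and purely illustrative.

```python
import numpy as np

def gauss_kernel_deriv(u, s=1):
    """Gaussian kernel K and its derivatives D^s K for s in {0, 1, 2}."""
    k = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    if s == 0:
        return k
    if s == 1:
        return -u * k
    if s == 2:
        return (u**2 - 1.0) * k
    raise NotImplementedError("only s = 0, 1, 2 are coded in this sketch")

def deriv_estimators(x, X, Y=None, psi=None, h=0.2, s=1):
    """Estimators (1)-(2) for p = 1:
    D^s f_n(x)      = (n h^{1+s})^{-1} sum_i D^s K((x - X_i)/h),
    D^s r_n(psi; x) = (n h^{1+s})^{-1} sum_i psi(Y_i) D^s K((x - X_i)/h)."""
    x = np.atleast_1d(x)
    n = len(X)
    u = (x[:, None] - X[None, :]) / h                 # (x - X_i)/h
    w = gauss_kernel_deriv(u, s)                      # D^s K evaluated at u
    dsf = w.sum(axis=1) / (n * h**(1 + s))            # estimator (1)
    if Y is None:
        return dsf
    dsr = (psi(Y)[None, :] * w).sum(axis=1) / (n * h**(1 + s))   # estimator (2)
    return dsf, dsr
```

Taking \(\psi \equiv 1\) (or omitting the second sample) recovers the density derivative estimator, in agreement with (3).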

Remark 2

The general kernel-type estimator of \(m(\cdot ,\psi )={\mathbb {E}}(\psi (Y)\mid {\mathbf {X}}=\cdot )\) is given, for \({\mathbf {x}}\in {\mathbb {R}}^{p}\), by

$$\begin{aligned} {\widehat{m}}_{n;h_{n}}({\mathbf {x}}, \psi ):= \frac{\displaystyle \sum \nolimits _{i =1}^n \psi ({\mathbf {Y}}_i)K\left( \frac{{\mathbf {x}}-{\mathbf {X}}_{i}}{h_n^{1/p}}\right) }{\displaystyle \sum \nolimits ^n_{i =1} K\left( \frac{{\mathbf {x}}-{\mathbf {X}}_{i}}{h_n^{1/p}}\right) }. \end{aligned}$$
(4)

By setting, for \(q=1\), \(\psi (y)=y\) (or \(\psi (y)=y^{k}\)) in (4), we get the classical Nadaraya–Watson (Nadaraya 1964; Watson 1964) kernel regression function estimator of \(m({\mathbf {x}}) :={\mathbb {E}}(Y\mid {\mathbf {X}}={\mathbf {x}})\) given by

$$\begin{aligned} {\widehat{m}}_{n;h_{n}}({\mathbf {x}}):= \frac{\displaystyle \sum \nolimits _{i =1}^n Y_i K\left( \frac{{\mathbf {x}}-{\mathbf {X}}_{i}}{h_n^{1/p}}\right) }{\displaystyle \sum \nolimits ^n_{i =1} K\left( \frac{{\mathbf {x}}-{\mathbf {X}}_{i}}{h_n^{1/p}}\right) }, ~ \text{ or } ~ {\widehat{m}}_{n;h_{n}}({\mathbf {x}}):= \frac{\displaystyle \sum \nolimits _{i =1}^n Y_i^{k} K\left( \frac{{\mathbf {x}}-{\mathbf {X}}_{i}}{h_n^{1/p}}\right) }{\displaystyle \sum \nolimits ^n_{i =1} K\left( \frac{{\mathbf {x}}-{\mathbf {X}}_{i}}{h_n^{1/p}}\right) }. \end{aligned}$$

Nadaraya (1964) established similar results to those of Parzen (1962) for \({\widehat{m}}_{n;h_{n}}({\mathbf {x}})\) as an estimator for \({\mathbb {E}}(Y\mid {\mathbf {X}}={\mathbf {x}})\).

Remark 3

By setting \(\psi _{\mathbf {t}}( {\mathbf {y}})=\mathbbm {1}[\mathbf { y}\le {\mathbf {t}}]\), for \({\mathbf {t}} \in {\mathbb {R}}^{q}\), into (4) we obtain the kernel estimator of the conditional distribution function \(F({\mathbf {t}} |{\mathbf {x}}) :={\mathbb {P}}({\mathbf {Y}}\le {\mathbf {t}} |{\mathbf {X}}={\mathbf {x}}),\) given by

$$\begin{aligned} {\widehat{F}}_{n;h_{n}}({\mathbf {t}}|{\mathbf {x}}):= \frac{\displaystyle \sum \nolimits _{i =1}^n \mathbbm {1}({\mathbf {Y}}_i \le {\mathbf {t}}) K\left( \frac{{\mathbf {x}}-{\mathbf {X}}_{i}}{h_n^{1/p}} \right) }{\displaystyle \sum \nolimits ^n_{i =1} K\left( \frac{{\mathbf {x}}-{\mathbf {X}}_{i}}{h_n^{1/p}}\right) }. \end{aligned}$$

These examples motivate the introduction of the function \(\psi (\cdot )\) in our setting; refer to Deheuvels (2011) for more discussion.
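As a purely illustrative sketch (assuming \(p=q=1\) and a Gaussian kernel; the names are ours), the ratio estimator (4) and the special cases of Remarks 2 and 3 can be computed as follows.

```python
import numpy as np

def ratio_estimator(x, X, Y, psi, h):
    """Kernel estimator (4): sum_i psi(Y_i) K((x - X_i)/h) / sum_i K((x - X_i)/h)."""
    x = np.atleast_1d(x)
    u = (x[:, None] - X[None, :]) / h
    K = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)   # Gaussian kernel
    return (psi(Y)[None, :] * K).sum(axis=1) / K.sum(axis=1)

# Nadaraya-Watson regression estimator (Remark 2): psi(y) = y.
# m_hat = ratio_estimator(x_grid, X, Y, psi=lambda y: y, h=0.2)

# Conditional distribution function at a threshold t (Remark 3): psi_t(y) = 1{y <= t}.
# F_hat = ratio_estimator(x_grid, X, Y, psi=lambda y: (y <= 0.5).astype(float), h=0.2)
```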

Remark 4

Local polynomial regression has emerged as a dominant method for nonparametric estimation and inference. The local linear variant was proposed by Stone (1977) and Cleveland (1979); see Fan (1992) and Fan and Gijbels (1996) for an extensive treatment of the local polynomial estimator. However, Racine (2016) mentioned that one feature of local polynomial estimators that may not be widely appreciated is that the local polynomial derivative estimator does not, in general, coincide with the analytic derivative of the local polynomial regression estimator in finite-sample settings. This can cause problems, particularly in the context of shape-constrained estimation. The problem arises when the object of interest is the regression function itself and constraints are to be imposed on derivatives of the regression function; the regression estimate and the derivative estimate are then not internally consistent, i.e., the derivative of the local polynomial regression estimate does not coincide with the local polynomial derivative estimate.

3 Assumptions and asymptotic properties

We will denote by \({\mathcal {F}}_i\) the \(\sigma -\)field generated by \(({\mathbf {X}}_1,\ldots , {\mathbf {X}}_i)\). For any \(i=1,\ldots , n\) define \(f^{{{\mathcal {F}}}_{i-1}}_{{\mathbf {X}}_i}(\cdot )\) as the conditional density of \({\mathbf {X}}_i\) given the \(\sigma -\)field \({\mathcal {F}}_{i-1}\). Let \({\mathcal {G}}_n\) be the \(\sigma \)-field generated by \(\{({\mathbf {X}}_i,{\mathbf {Y}}_i): 1\le i\le n\}\), and let \(f_{{\mathbf {X}},{\mathbf {Y}}}^{{\mathcal {G}}_{i-1}}(\cdot )\) be the conditional density of \(({\mathbf {X}},{\mathbf {Y}})\) given the \(\sigma -\)field \({\mathcal {G}}_{i-1}\). Let us define the \(\sigma -\)field \({{\mathcal {S}}}_n = \sigma \left( ({\mathbf {X}}_k,{\mathbf {Y}}_k); ({\mathbf {X}}_{n+1}): 1\le k \le n \right) ,\) and the projection operator

$$\begin{aligned} {\mathcal {P}}_k\xi ={\mathbb {E}}\left( \xi \mid {\mathcal {G}}_k\right) -{\mathbb {E}}\left( \xi \mid {\mathcal {G}}_{k-1}\right) . \end{aligned}$$

Moreover, if \(\zeta (\cdot )\) is a real-valued random function which satisfies \(\zeta (u) / u \rightarrow 0\) a.s. as \(u \rightarrow 0,\) we write \(\zeta (u)=o_{\text{ a.s. } }(u)\). In the same way, we say that \(\zeta (u)\) is \(O_{\text{ a.s. } }(u)\) if \(\zeta (u) / u\) is a.s. bounded as \(u \rightarrow 0 .\)

The following assumptions will be needed throughout the paper.

  1. (K.1)

       

    1. (i)

      The kernel \(K(\cdot )\) is a symmetric compactly supported probability density function,

    2. (ii)

      The kernel derivatives \(D^{|{\mathbf {s}}|}K(\cdot )\), \(s=0,1,\ldots \), are assumed to be Lipschitz functions of order \(\gamma \) with constant \(C_{K,s}<\infty \), i.e., \(|D^{|{\mathbf {s}}|}K({\mathbf {x}})-D^{|{\mathbf {s}}|}K({\mathbf {x}}^{'}) | \le C_{K,s}\Vert {\mathbf {x}}-{\mathbf {x}}^{'}\Vert ^{\gamma }, \quad \text{ for }~~ ({\mathbf {x}},{\mathbf {x}}^{'})\in {\mathbb {R}}^{2p};\)

    3. (iii)

      \(\int _{{\mathbb {R}}^p} \Vert {\mathbf {x}}\Vert D^{|{\mathbf {s}}|}K({\mathbf {x}}) \hbox {d}{\mathbf {x}} <\infty ,~ \text{ for } ~s=0,1,\ldots \)

    4. (iv)

      \( \int _{{{\mathbb {R}}}^p} \left( D^{|{\mathbf {s}}|}K\left( {\mathbf {v}}\right) \right) ^{2}\hbox {d}{\mathbf {v}}<\infty .\)

  2. (C.1)
    1. (i)

      The conditional density \(f_{{\mathbf {X}}_i}^{{\mathcal {G}}_{i-1}}(\cdot )\) exists and belongs to the space \({\mathcal {C}}^{|{\mathbf {s}}|}({\mathbb {R}}^{p})\), where \({\mathcal {C}}^{|{\mathbf {s}}|}({\mathbb {R}}^{p})\) denotes the space of all real-valued functions that are \(|{\mathbf {s}}|\)-times continuously differentiable on \({\mathbb {R}}^{p}\);

    2. (ii)

      The partial derivative \(D^{|{\mathbf {s}}|} f_{{\mathbf {X}}_i}^{{\mathcal {G}}_{i-1}}(\cdot )\) is continuous and has bounded partial derivatives of order \({\mathbb {k}}\), that is, there exists a constant \(0<{\mathfrak {C}}_1<\infty \) such that

      $$\begin{aligned} \sup _{{\mathbf {x}} \in {\mathbf {J}}}\left| \frac{\partial ^{\mathbb {k}} D^{|{\mathbf {s}}|} f^{{\mathcal {G}}_{i-1}}_{{\mathbf {X}}_i}({\mathbf {x}})}{\partial x^{k_1}_1\ldots \partial x^{k_p}_p}\right| \le {\mathfrak {C}}_1,~~k_1,\ldots ,k_p\ge 0,~~0<k_1+\cdots +k_p={\mathbb {k}}; \end{aligned}$$
  3. (C.2)

    For any \(\mathbf{x} \in {\mathbb {R}}^p\),

    $$\begin{aligned} \underset{n \rightarrow \infty }{\lim } \frac{1}{n} \sum _{i=1}^n f_{{\mathbf {X}}_i}^{{{\mathcal {G}}}_{i-1}}({\mathbf {x}}) = f({\mathbf {x}}), \quad ~~\text{ in } \text{ the } ~~a.s.~~ \text{ and }~~ L^2~~\text{ sense }. \end{aligned}$$
  4. (C.3)
    1. (i)

      The density \(f_{{\mathbf {X}}}(\cdot )\) is continuous and has bounded partial derivatives of order \({\mathbf {r}}\), that is, there exists a constant \(0<{\mathfrak {C}}_2<\infty \) such that

      $$\begin{aligned} \sup _{{\mathbf {x}} \in {\mathbf {J}}}\left| \frac{\partial ^{\mathbf {r}} f_{{\mathbf {X}}}({\mathbf {x}})}{\partial x^{k_1}_1\ldots \partial x^{k_p}_p}\right| \le {\mathfrak {C}}_2,~~k_1,\ldots ,k_p\ge 0,~~0<k_1+\cdots +k_p={\mathbf {r}}; \end{aligned}$$
    2. (ii)

      The density \(f_{{\mathbf {X}},{\mathbf {Y}}}(\cdot ,\cdot )\) is continuous and has bounded partial derivatives of order \(\ell \), that is, there exists a constant \(0<{\mathfrak {C}}_3<\infty \) such that

      $$\begin{aligned} \sup _{{\mathbf {x}} \in {\mathbf {J}}}\left| \frac{\partial ^\ell f_{{\mathbf {X}},{\mathbf {Y}}}({\mathbf {x}},{\mathbf {y}})}{\partial x^{k_1}_1\ldots \partial x^{k_p}_p}\right| \le {\mathfrak {C}}_3,~~k_1,\ldots ,k_p\ge 0,~~0<k_1+\cdots +k_p=\ell ; \end{aligned}$$
  5. (C.4)

    There exists a positive constant \( f_\star <\infty \) such that

    $$\begin{aligned} \sup _{{\mathbf {x}}\in {{\mathbb {R}}}^p}D^{|{\mathbf {s}}|}f_{{\mathbf {X}}_1}^{{\mathcal {G}}_{0}}\left( {\mathbf {x}}\right) \le f_\star , \end{aligned}$$

    holds with probability 1.

  6. (C.5)

    \( \sup _{{\mathbf {x}}} \sum _{i=1}^\infty \left\| {{\mathcal {P}}}_1 D^{|{\mathbf {s}}|} f_{{\mathbf {X}}_i}^{{\mathcal {G}}_{i-1}}({\mathbf {x}}) \right\| ^2 < \infty . \)

  7. (R.1)
    1. (i)

      \({\mathbb {E}}(|\psi ({\mathbf {Y}}_i)|\vert {\mathcal {S}}_{i-1})={\mathbb {E}}(|\psi ({\mathbf {Y}}_i)|\mid {\mathbf {X}}_i)=m({\mathbf {X}}_i,|\psi | )\);

    2. (ii)

      there exist constants \(C_\psi >0\) and \(\beta >0\) such that, for any couple \(({\mathbf {x}},{\mathbf {x}}^\prime )\in {\mathbb {R}}^{2p}\),

      $$\begin{aligned} \left| m({\mathbf {x}},{\psi })-m({\mathbf {x}}^\prime ,{\psi })\right| \le C_\psi \left\| {\mathbf {x}}-{\mathbf {x}}^\prime \right\| ^\beta ; \end{aligned}$$
    3. (iii)

      For any \(k\ge 2\), \({\mathbb {E}}(|\psi ^k({\mathbf {Y}}_i)|\vert {\mathcal {S}}_{i-1})={\mathbb {E}}(|\psi ^k({\mathbf {Y}}_i)|\vert {\mathbf {X}}_i)\), and the function

      $$\begin{aligned} \Psi _k({\mathbf {x}},\psi )={\mathbb {E}}(|\psi ^k({\mathbf {Y}})|\vert {\mathbf {X}}={\mathbf {x}}), \end{aligned}$$

      is continuous in a neighborhood of \({\mathbf {x}}\).

  8. (H)
    1. (i)

      \(h_n \rightarrow 0\), \(nh_{n}^{1+2\left( \frac{|{\mathbf {s}}|}{p}\right) } \rightarrow \infty ;\)

    2. (ii)

      \(h_n \rightarrow 0\), \(\frac{nh_{n}^{1+|{\mathbf {s}}|/p}}{\log n}\rightarrow \infty \).

3.1 Comments on the conditions

Conditions (K.1) are very common in the nonparametric function estimation literature. They impose some regularity upon the kernels used in our estimators. In particular, under condition (K.1)(iii), the kernel function exploits the smoothness of the function \(D^{|{\mathbf {s}}|} r(\psi ; {\mathbf {x}})\). Notice that the transformation of the stationary ergodic process \(({\mathbf {X}}_i,{\mathbf {Y}}_i)_{i\ge 1}\) into the process \((\psi ^2({\mathbf {Y}}_i))_{i\ge 1}\) is measurable, so the latter is also stationary and ergodic. Therefore, making use of Proposition 4.3 of Krengel (1985) and then the ergodic theorem, we obtain \(\underset{n\rightarrow \infty }{\lim } n^{-1} \sum _{i=1}^n \psi ^2({\mathbf {Y}}_i) = {\mathbb {E}}\left[ \psi ^2({\mathbf {Y}}_1)\right] \) almost surely. Conditions (C.1) and (C.3) impose the regularity needed on the joint, marginal and conditional densities to reach the rates of convergence given below. Condition (C.2) involves the ergodic nature of the data; see, for instance, Proposition 4.3 and Theorem 4.4 of Krengel (1985) and Delecroix (1987) (Lemma 4 and Corollary 1, together with their proofs). Assumption (C.5) was used by Wu (2003) and is satisfied by various processes, including linear as well as many nonlinear ones. For more details and examples, see Wu (2003) and Wu et al. (2010). We also refer to Wu et al. (2010) for more details on condition (C.4). The conditions (R.1)(i) and (R.1)(iii) are usual in the literature dealing with ergodic processes. The condition (R.1)(ii) is a regularity condition upon the regression function.

Remark 5

Our results remain valid when the condition in (K.1)(i) that the kernel function \(K(\cdot )\) has compact support is replaced by the following condition (K.1)(i)':

  1. (K.1)(i)’

    There exists a sequence of positive real numbers \(a_{n}\) such that \(a_{n} h_n^d\) tends to zero as \(n\) tends to infinity, and

    $$\begin{aligned} \sqrt{n} \int _{\left\{ \Vert {\mathbf {v}}\Vert >a_{n}\right\} }|K({\mathbf {v}})| \mathrm {d} {\mathbf {v}} \rightarrow 0. \end{aligned}$$

3.2 Almost sure uniform consistency rates

In the following theorems, we give uniform convergence rates for \(D^{|{\mathbf {s}}|}r_{n}(\psi ;{\mathbf {x}},h_{n})\) defined in (2).

Theorem 1

Assume that the assumptions H(ii), K(i)–(ii), (K.1)(i)–(iii), (C.1), (C.4), (C.5) and (R.1) are fulfilled. We have, as  \(n\rightarrow \infty \),

$$\begin{aligned} \sup _{{\mathbf {x}}\in {\mathbf {J}}}\left| D^{|{\mathbf {s}}|}r_{n}(\psi ;{\mathbf {x}},h_n)-{\mathbb {E}}D^{|{\mathbf {s}}|}r_{n}(\psi ;{\mathbf {x}},h_n)\right| = O_{a.s.}\left( \sqrt{\frac{\log n}{nh_n^{1+|{\mathbf {s}}|/p}}}\right) . \end{aligned}$$

Theorem 2

Suppose that the assumptions  H(ii), K(i)–(ii), (K.1)(i)–(iii), (C.1), (C.3)(ii), (C.4), (C.5) and (R.1)  are satisfied. We have, as  \(n\rightarrow \infty \),

$$\begin{aligned} \sup _{{\mathbf {x}}\in {\mathbf {J}}}\left| D^{|{\mathbf {s}}|}r_{n}(\psi ;{\mathbf {x}},h_n)-D^{|{\mathbf {s}}|} r(\psi ; {\mathbf {x}})\right| = O_{a.s.}\left( \sqrt{\frac{\log n}{nh_n^{1+|{\mathbf {s}}|/p}}}\right) +O\left( h_n^{\ell /p}\right) . \end{aligned}$$

3.3 Asymptotic distribution

Let us now state the following theorem, which gives the weak convergence rate of the estimator \(D^{|{\mathbf {s}}|}r_{n}(\psi ;{\mathbf {x}},h_{n})\) defined in (2). Below, we write \(Z {\mathop {=}\limits ^{{\mathcal {D}}}} N(\mu , \sigma ^{2} )\) whenever the random variable Z follows a normal law with expectation \(\mu \) and variance \(\sigma ^{2}\).

Theorem 3

Assume that the conditions H(i), K(i)–(ii), (K.1), (C.1), (C.2), (C.3), (C.4), (C.5) and (R.1) hold. We have, as \(n \rightarrow \infty \),

$$\begin{aligned} \sqrt{nh_{n}^{1+2\left( \frac{|{\mathbf {s}}|}{p} \right) }}\left( D^{|{\mathbf {s}}|}r_{n}(\psi ;{\mathbf {x}},h_n)-{\mathbb {E}}D^{|{\mathbf {s}}|}r_{n}(\psi ;{\mathbf {x}},h_n)\right) \rightarrow N(0,\sigma _{\psi }^{2}({\mathbf {x}})), \end{aligned}$$

where

$$\begin{aligned} \sigma _{\psi }^{2}({\mathbf {x}})= \Psi _2({\mathbf {x}},\psi ) f({\mathbf {x}})\left( \int _{{{\mathbb {R}}}^p} \left( D^{|{\mathbf {s}}|}K\left( {\mathbf {v}}\right) \right) ^{2} \mathrm{d}{\mathbf {v}}\right) , ~ \text{ where } ~ \Psi _2({\mathbf {x}},\psi )={\mathbb {E}}(\psi ^2({\mathbf {Y}})\vert {\mathbf {X}}={\mathbf {x}}). \end{aligned}$$

Theorem 4

Assume that the conditions H(i), K(i)–(ii), (K.1), (C.1), (C.2), (C.3), (C.4), (C.5) and (R.1) hold. In addition, we assume

$$\begin{aligned} n^{1/2} h_n^{(|{\mathbf {s}}|+\ell )/p+1/2}\rightarrow 0\quad \text{ as } \quad n\rightarrow \infty . \end{aligned}$$

Then, we have, as  \(n \rightarrow \infty \)

$$\begin{aligned} \sqrt{nh_{n}^{1+2\left( \frac{|{\mathbf {s}}|}{p} \right) }}\left( D^{|{\mathbf {s}}|}r_{n}(\psi ;{\mathbf {x}},h_n)-D^{|{\mathbf {s}}|} r(\psi ; {\mathbf {x}})\right) \rightarrow N(0,\sigma _{\psi }^{2}({\mathbf {x}})). \end{aligned}$$

3.4 Asymptotic mean square error

In the following, we will give asymptotic mean integrated squared error (AMISE) of the estimator \(D^{|{\mathbf {s}}|}r_{n}(\psi ;{\mathbf {x}},h_n)\).

Theorem 5

Assume that the conditions K(i)–(ii), (K.1), (C.1), (C.2), (C.3)(ii), (C.4), (C.5) and (R.1)(i)–(ii) hold. We have, as \(n \rightarrow \infty \),

$$\begin{aligned} \mathrm{AMISE}\left( D^{|{\mathbf {s}}|}r_{n}(\psi ;{\mathbf {x}},h_n)\right)= & {} \int _{{\mathbb {R}}^{p}}\mathrm{Bias}\left\{ D^{|{\mathbf {s}}|}r_{n}(\psi ;{\mathbf {x}},h_n)\right\} ^{2} \mathrm{d}{\mathbf {x}}+ \int _{{\mathbb {R}}^{p}} \mathrm{Var}\left( D^{|{\mathbf {s}}|}r_{n}(\psi ;{\mathbf {x}},h_n)\right) \mathrm{d}{\mathbf {x}}\\ {}= & {} O\left( h^{2\ell /p}_{n}\right) +O\left( \frac{1}{nh_{n}^{1+2\frac{|{\mathbf {s}}|}{p}}}\right) . \end{aligned}$$

Remark 6

Keeping in mind the relation (3), one can easily deduce the following results concerning the density function derivative, that is,

$$\begin{aligned}&\sup _{{\mathbf {x}}\in {\mathbf {J}}}\left| D^{|{\mathbf {s}}|}f_{{\mathbf {X}};n}({\mathbf {x}},h_{n})-D^{|{\mathbf {s}}|} f_{{\mathbf {X}}}({\mathbf {x}})\right| = O_{a.s.}\left( \sqrt{\frac{\log n}{nh_n^{1+|{\mathbf {s}}|/p}}}\right) +O\left( h_n^{\ell /p}\right) ,\\&\quad \sqrt{nh_{n}^{1+2\left( \frac{|{\mathbf {s}}|}{p} \right) }}\left( D^{|{\mathbf {s}}|}f_{{\mathbf {X}};n}({\mathbf {x}},h_{n})-D^{|{\mathbf {s}}|} f_{{\mathbf {X}}}({\mathbf {x}})\right) \rightarrow N\left( 0,f({\mathbf {x}})\left( \int _{{{\mathbb {R}}}^p} \left( D^{|{\mathbf {s}}|}K\left( {\mathbf {v}}\right) \right) ^{2} \hbox {d}{\mathbf {v}}\right) \right) , \end{aligned}$$

and

$$\begin{aligned} \mathrm{AMISE}\left( D^{|{\mathbf {s}}|}f_{{\mathbf {X}};n}({\mathbf {x}},h_{n})\right)= & {} \int _{{\mathbb {R}}^{p}}\mathrm{Bias}\left\{ D^{|{\mathbf {s}}|}f_{{\mathbf {X}};n}({\mathbf {x}},h_{n})\right\} ^{2} \hbox {d}{\mathbf {x}}+ \int _{{\mathbb {R}}^{p}} \mathrm{Var}\left( D^{|{\mathbf {s}}|}f_{{\mathbf {X}};n}({\mathbf {x}},h_{n})\right) \hbox {d}{\mathbf {x}}\\ {}= & {} O\left( h^{2\ell /p}_{n}\right) +O\left( \frac{1}{nh_{n}^{1+2\frac{|{\mathbf {s}}|}{p}}}\right) . \end{aligned}$$

3.5 Confidence intervals

The asymptotic variance in the central limit theorem depends on unknown functions, which should be estimated in practice. Let us introduce \({\widehat{\Psi }}_{2,n}({\mathbf {x}},\psi )\), a kernel estimator of \(\Psi _2({\mathbf {x}},\psi )\), defined by

$$\begin{aligned} {\widehat{\Psi }}_{2,n}({\mathbf {x}},\psi ):= \frac{\displaystyle \sum \nolimits _{i =1}^n \psi ^{2}({\mathbf {Y}}_i)K\left( \frac{{\mathbf {x}}-{\mathbf {X}}_{i}}{h_n^{1/p}} \right) }{\displaystyle \sum \nolimits ^n_{i =1} K\left( \frac{{\mathbf {x}}-{\mathbf {X}}_{i}}{h_n^{1/p}}\right) }. \end{aligned}$$

This permits us to estimate the asymptotic variance \(\sigma _{\psi }^{2}({\mathbf {x}})\) by

$$\begin{aligned} {\widehat{\sigma }}_{\psi }^{2}({\mathbf {x}})= {\widehat{\Psi }}_{2,n}({\mathbf {x}},\psi ) f_{{\mathbf {X}};n}({\mathbf {x}},h_{n})\left( \int _{{{\mathbb {R}}}^p} \left( D^{|{\mathbf {s}}|}K\left( {\mathbf {v}}\right) \right) ^{2} \hbox {d}{\mathbf {v}}\right) . \end{aligned}$$

Furthermore, from Theorem 4, the approximate confidence interval of \(D^{|{\mathbf {s}}|} r(\psi ; {\mathbf {x}})\) can be obtained as

$$\begin{aligned} D^{|{\mathbf {s}}|} r(\psi ; {\mathbf {x}})\in \left[ D^{|{\mathbf {s}}|}r_{n}(\psi ;{\mathbf {x}},h_n) \pm c_\alpha \frac{{\widehat{\sigma }}_{\psi }({\mathbf {x}})}{\sqrt{nh_{n}^{1+2\left( \frac{|{\mathbf {s}}|}{p} \right) }}}\right] , \end{aligned}$$

where \(c_\alpha \) denotes the \((1-\alpha )\)-quantile of the standard normal distribution.
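A minimal sketch of the resulting plug-in interval, for \(p=1\), \(|{\mathbf {s}}|=1\) and a Gaussian kernel (for which \(\int (K')^2 = 1/(4\sqrt{\pi })\)), could read as follows; the code follows the \((1-\alpha )\)-quantile convention used above, and all names are ours.

```python
import numpy as np
from scipy.stats import norm

def ci_first_derivative(x0, X, Y, psi, h, alpha=0.05):
    """Plug-in confidence interval for D^1 r(psi; x0), p = 1, Gaussian kernel."""
    n = len(X)
    u = (x0 - X) / h
    K = norm.pdf(u)
    dK = -u * K                                      # first derivative of the Gaussian kernel
    r1_hat = np.sum(psi(Y) * dK) / (n * h**2)        # estimator (2) with |s| = 1
    f_hat = np.sum(K) / (n * h)                      # kernel density estimator
    psi2_hat = np.sum(psi(Y)**2 * K) / np.sum(K)     # hat{Psi}_{2,n}(x0, psi)
    sigma2_hat = psi2_hat * f_hat / (4.0 * np.sqrt(np.pi))
    half_width = norm.ppf(1 - alpha) * np.sqrt(sigma2_hat / (n * h**3))
    return r1_hat - half_width, r1_hat + half_width
```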

Remark 7

An alternative approach, based on resampling techniques, may be used to construct confidence intervals. In contrast to the asymptotic confidence intervals, the main advantage of such an approach is that it avoids the estimation of the variance of the estimators. Below, we give a brief description of the bootstrap-based confidence interval approach. Let \(\{Z_{i}\}\) be a sequence of random variables satisfying the following assumption:

  1. B.

    The \(\{Z_{i}\}\) are independent and identically distributed, with distribution function \(P_{Z}\), mean zero and variance 1.

We assume that the bootstrap weights \(Z_{i}\) are independent of the data \((X_i,Y_i)\), \(i=1,\ldots ,n\). Define

$$\begin{aligned} D^{|{\mathbf {s}}|}r_{n}^*(\psi ;{\mathbf {x}},h_{n})= & {} \frac{1}{nh_{n}^{1+|{\mathbf {s}}|/p}}\sum _{i=1}^{n}Z_i\psi ({\mathbf {Y}}_{i})D^{|{\mathbf {s}}|}K\left( \frac{{\mathbf {x}}-{\mathbf {X}}_{i}}{h_{n}^{1/p}}\right) . \end{aligned}$$

Let

$$\begin{aligned} \alpha _n^*=\sqrt{nh_{n}^{1+2\left( \frac{|{\mathbf {s}}|}{p} \right) }}\left( D^{|{\mathbf {s}}|}r_{n}^*(\psi ;{\mathbf {x}},h_{n})- D^{|{\mathbf {s}}|}r_{n}(\psi ;{\mathbf {x}},h_n)\right) . \end{aligned}$$

Let \({\mathfrak {N}}\) be a large integer and let \(Z_{1}^{k},\ldots ,Z_{n}^{k}\), \(k=1,\ldots ,{\mathfrak {N}}\), be independent copies of \(Z_{1},\ldots ,Z_{n}\). Let \(\alpha _n^{*(k)}\) be the corresponding bootstrapped copies of \(\alpha _n^*\). In order to approximate \(c_\alpha \), one can use the resampling estimator \({\widehat{c}}_\alpha \), defined as the smallest \(z\ge 0\) such that \( \frac{1}{{\mathfrak {N}}}\sum _{k=1}^{{\mathfrak {N}}}\mathbbm {1}_{\left\{ \alpha _n^{*(k)}\le z\right\} } \ge 1-\alpha . \)
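The resampling scheme just described can be sketched as follows (for \(p=1\), \(|{\mathbf {s}}|=1\), a Gaussian kernel and standard normal multipliers, which is one possible choice satisfying assumption B; the function names are ours).

```python
import numpy as np

rng = np.random.default_rng(0)

def deriv_r(x0, X, Y, psi, h, weights=None):
    """D^1 r_n(psi; x0, h) for p = 1, optionally with multiplier weights Z_i."""
    n = len(X)
    u = (x0 - X) / h
    dK = -u * np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    w = np.ones(n) if weights is None else weights
    return np.sum(w * psi(Y) * dK) / (n * h**2)

def bootstrap_c_alpha(x0, X, Y, psi, h, alpha=0.05, n_boot=1000):
    """Approximate c_alpha by the smallest z with mean(1{alpha_n^{*(k)} <= z}) >= 1 - alpha."""
    n = len(X)
    est = deriv_r(x0, X, Y, psi, h)
    scale = np.sqrt(n * h**3)                        # sqrt(n h^{1 + 2|s|/p}) with |s| = p = 1
    stats = np.empty(n_boot)
    for k in range(n_boot):
        Z = rng.standard_normal(n)                   # i.i.d. multipliers, mean 0, variance 1
        stats[k] = scale * (deriv_r(x0, X, Y, psi, h, weights=Z) - est)
    return np.quantile(stats, 1 - alpha)
```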

4 Application to the regression derivatives

In this section, we follow the same notation as in Deheuvels and Mason (2004). In particular, we consider the conditional expectation of \(\psi (Y)\) given \(X=x\), for \(p=q=1\). Recall that

$$\begin{aligned} m(x, \psi )={\mathbb {E}}(\psi (Y)\mid X=x)=\frac{1}{f_{X}(x)}\int _{{\mathbb {R}}} \psi (y)f_{X,Y}(x,y)\hbox {d}y=\frac{\displaystyle r(\psi , x)}{\displaystyle f_{X}(x)}. \end{aligned}$$

The kernel estimator is given by

$$\begin{aligned} m_{n}(x,\psi )=\left\{ \begin{array}{lcc} \frac{ \displaystyle r_{n}(\psi ;x,h_{n})}{\displaystyle f_{X;n}({ x},h_{n})}&{} \text{ if } &{} \displaystyle f_{X;n}({ x},h_{n})\ne 0,\\ &{}&{}\\ \displaystyle \frac{1}{n}\sum _{i=1}^{n}\psi (Y_{i}) &{} \text{ if } &{}\displaystyle f_{X;n}({ x},h_{n})=0. \end{array}\right. \end{aligned}$$

Recall the following derivatives

$$\begin{aligned} m_{\psi }^{{\prime }}(x)=\frac{\displaystyle r^{\prime }(\psi , x)}{\displaystyle f_{X}(x)}-\frac{\displaystyle r(\psi , x) f_{X}^{\prime }(x)}{\displaystyle f_{X}^{2}(x)}, \end{aligned}$$
(5)

and

$$\begin{aligned} m_{\psi }^{{\prime \prime }}(x)= & {} \frac{\displaystyle r^{\prime \prime }(\psi , x)}{\displaystyle f_{X}(x)}-\frac{\displaystyle 2 r^{\prime }(\psi , x)f_{X}^{\prime }(x)}{\displaystyle f_{X}^{2}(x)}+\frac{\displaystyle r(\psi , x)\{2(f^{\prime }_{X}(x))^2-f_{X}(x)f_{X}^{\prime \prime }(x)\}}{\displaystyle f_{X}^{3}(x)}. \end{aligned}$$

The derivatives \(m_{\psi }^{{\prime }}(x)\) in (5) and \(m_{\psi }^{{\prime \prime }}(x)\) above are estimated by replacing \(f_{X}(\cdot ),\) \(f_{X}^{\prime }(\cdot )\), \(f_{X}^{\prime \prime }(\cdot )\), \(r(\psi ;\cdot )\), \(r^{\prime }(\psi ;\cdot )\) and \(r^{\prime \prime }(\psi ;\cdot )\) by \(f_{X;n}(\cdot ,h_{n})\), \(f_{X;n}^{\prime }(\cdot ,h_{n})\), \(f_{X;n}^{\prime \prime }(\cdot ,h_{n})\), \(r_{n}(\psi ;\cdot ,h_{n})\), \(r_{n}^{\prime }(\psi ;\cdot ,h_{n})\) and \(r_{n}^{\prime \prime }(\psi ;\cdot ,h_{n})\), respectively. This defines \(m_{\psi ,n}^{{\prime }}(x;h_{n})\) and \(m_{\psi ,n}^{{\prime \prime }}(x;h_{n})\) when \(f_{X;n}(x,h_{n})\ne 0\). The definition of \(m_{\psi ,n}^{{\prime }}(x;h_{n})\) and \(m_{\psi ,n}^{{\prime \prime }}(x;h_{n})\) is completed by setting \(m_{\psi ,n}^{{\prime }}(x;h_{n})=m_{\psi ,n}^{{\prime \prime }}(x;h_{n})=0\) when \(f_{X;n}(x,h_{n})=0\).
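A compact sketch of this plug-in construction, for \(p=q=1\) with a Gaussian kernel and a common bandwidth \(h\) for all building blocks (our own simplification; the names are illustrative), is given below.

```python
import numpy as np

def kernel_blocks(x, X, Y, psi, h):
    """Building blocks f_n, f_n', f_n'', r_n, r_n', r_n'' for p = 1 (Gaussian kernel)."""
    n = len(X)
    u = (np.atleast_1d(x)[:, None] - X[None, :]) / h
    K0 = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    K1, K2 = -u * K0, (u**2 - 1.0) * K0
    pY = psi(Y)[None, :]
    f0, f1, f2 = (Ki.sum(1) / (n * h**(1 + s)) for s, Ki in enumerate((K0, K1, K2)))
    r0, r1, r2 = ((pY * Ki).sum(1) / (n * h**(1 + s)) for s, Ki in enumerate((K0, K1, K2)))
    return f0, f1, f2, r0, r1, r2

def m_derivatives(x, X, Y, psi, h):
    """Plug-in estimators of m'(x) in (5) and m''(x); set to 0 where f_n(x) = 0."""
    f0, f1, f2, r0, r1, r2 = kernel_blocks(x, X, Y, psi, h)
    m1, m2 = np.zeros_like(f0), np.zeros_like(f0)
    ok = f0 != 0
    m1[ok] = r1[ok] / f0[ok] - r0[ok] * f1[ok] / f0[ok]**2
    m2[ok] = (r2[ok] / f0[ok] - 2.0 * r1[ok] * f1[ok] / f0[ok]**2
              + r0[ok] * (2.0 * f1[ok]**2 - f0[ok] * f2[ok]) / f0[ok]**3)
    return m1, m2
```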

The following theorem is more or less a straightforward consequence of Theorem 2.

Corollary 1

Under the assumptions of Theorem 2, we have

$$\begin{aligned} \sup _{x\in J}\left| m_{\psi ,n}^{{\prime }}(x;h_{n})-m_{\psi }^{{\prime }}(x)\right| =o_{{\mathbb {P}}}(1),\\ \sup _{x\in J}\left| m_{\psi ,n}^{{\prime \prime }}(x;h_{n})-m_{\psi }^{{\prime \prime }}(x)\right| =o_{{\mathbb {P}}}(1). \end{aligned}$$

Remark 8

We note that, when \(|{\mathbf {s}}|\ge 2\), \(m^{(|{\mathbf {s}}|)}_{\psi ,n}({x}, h_{n})=D^{|{\mathbf {s}}|}(m_{\psi ,n}({{ x}}, h_{n}))\) may be obtained likewise through the usual Leibniz expansion of derivatives of products given by

$$\begin{aligned} m^{(|{\mathbf {s}}|)}_{\psi ,n}({{ x}}, h_{n})=\sum _{|{\mathbf {j}}|=0}^{|{\mathbf {s}}|}C_{|{\mathbf {s}}|}^{|{\mathbf {j}}|}r^{(|{\mathbf {j}}|)}_{n}(\psi ;{{ x}}, h_{n})\left\{ f_{{X};n}^{-1}({{ x}}, h_{n})\right\} ^{(|{\mathbf {s}}|-|{\mathbf {j}}|)},\;\;\;f_{{{ X}};n}({{ x}}, h_{n})\ne 0. \end{aligned}$$
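For instance, for \(|{\mathbf {s}}|=2\) and \(p=1\), the expansion reduces to the following display, which coincides with the formula for \(m_{\psi }^{\prime \prime }\) given before Corollary 1 once the derivatives of \(f_{X;n}^{-1}\) are expanded (arguments are suppressed for brevity):

$$\begin{aligned} m^{(2)}_{\psi ,n}= r_{n}\bigl \{f_{X;n}^{-1}\bigr \}^{\prime \prime }+2\,r_{n}^{\prime }\bigl \{f_{X;n}^{-1}\bigr \}^{\prime }+r_{n}^{\prime \prime }\,f_{X;n}^{-1}, \quad \text{ with }\quad \bigl \{f_{X;n}^{-1}\bigr \}^{\prime }=-\frac{f_{X;n}^{\prime }}{f_{X;n}^{2}},\qquad \bigl \{f_{X;n}^{-1}\bigr \}^{\prime \prime }=\frac{2(f_{X;n}^{\prime })^{2}-f_{X;n}f_{X;n}^{\prime \prime }}{f_{X;n}^{3}}. \end{aligned}$$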

5 Simulation study

The first part of this section investigates the estimation of the first derivative of the density function when X is, respectively, a unidimensional and a bidimensional stochastic process. Then, we focus on the estimation of the first derivative of the regression function when the data are generated according to a specific stochastic regression model. Motivated by the extension of the numerical results obtained in Blanke and Pumo (2003) and Chaouch and Laïb (2019), we suppose in the sequel that X is an Ornstein–Uhlenbeck (OU) process (it can be unidimensional or bidimensional, as discussed below), a solution of the following stochastic differential equation (SDE):

$$\begin{aligned} \hbox {d}X_t = -a\,X_t\, \hbox {d}t + b\,\hbox {d}W_t, \quad \text{ for }\quad a>0, b>0, \end{aligned}$$
(6)

where \((W_t)_{t\ge 0}\) is a standard Wiener process. Thus, for \(0\le t\le T\), the solution of the SDE given in (6) can be expressed as \(X_t = e^{-at}X_0 + b\int _0^t e^{-a(t-s)} \hbox {d}W_s\), where \(X_0\sim N(0,1)\) is independent of W. In the sequel, and following Blanke and Pumo (2003), we consider \(b=\sqrt{2}\) and \(a=1\), since then \(X_t\) has the N(0, 1) density. The OU process \(\{X_t; 0\le t\le T\}\) is a continuous-time process used to model the dynamics of several random quantities; for instance, OU processes are widely used in finance to model and predict asset prices. In practice, the process \(X_t\) cannot be observed at every time in [0, T]; it is rather observed on a specific time grid, say \(0=\tau _0< \tau _1< \dots <\tau _n=T\), representing a discretization of the interval [0, T]. Therefore, the simulation of an OU process can be achieved by considering, for instance, the iterative Euler–Maruyama scheme, see for instance Kloeden and Platen (1992), which allows one to build an approximate solution \(\{ {\widetilde{X}}_t; t=\tau _0, \tau _1, \dots , \tau _n\}\) of the original process \(\{X_t, 0\le t\le T \}.\) The discretized version of the SDE in (6) is given as follows:

$$\begin{aligned} {\widetilde{X}}_{\tau _{j+1}} = {\widetilde{X}}_{\tau _{j}} - {\widetilde{X}}_{\tau _{j}}\left( \tau _{j+1}-\tau _j \right) + \sqrt{2}\left( W_{\tau _{j+1}}-W_{\tau _j} \right) , \quad \quad j=0, \dots , n. \end{aligned}$$
(7)

In this simulation study, a deterministic equidistant discretization scheme, i.e., \(\tau _{j+1}-\tau _j=T/n=:\delta _n\), called the sampling mesh, is considered. Figure 1 displays an example of an OU sample path for \(n=105\) and \(\delta _n=0.4\). Notice that, as discussed in Blanke and Pumo (2003), the sampling mesh \(\delta _n\) plays an important role in the estimation of the density function of an OU process. Indeed, Blanke and Pumo (2003) discussed, theoretically as well as via simulations, the selection of the optimal mesh that minimizes the mean integrated square error (MISE). Chaouch and Laïb (2019) discussed the selection of the sampling mesh for the estimation of the regression function when the response variable is affected by a missing-at-random phenomenon. In this section, we are interested in extending the numerical results obtained by Blanke and Pumo (2003) and Chaouch and Laïb (2019) to the first derivative of the density and of the regression function, respectively. More precisely, we discuss the numerical selection of the optimal mesh, say \(\delta _n^\star \), which allows one to obtain a consistent (in the sense of minimizing the MISE) estimator of the first derivative of the density and of the regression function. This section contains two parts: in the first one we study the optimal selection of the sampling mesh for the first derivative of the density function of a one-dimensional OU process and then extend the study to bidimensional OU processes. The second part of the simulation deals with the first derivative of the univariate regression function.
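A minimal Python sketch of the Euler–Maruyama scheme (7), with the parameter values quoted above (\(a=1\), \(b=\sqrt{2}\), \(n=105\), \(\delta _n=0.4\)), is given below; it is our own illustration rather than the code used for the figures.

```python
import numpy as np

def simulate_ou_path(n=105, delta=0.4, seed=None):
    """Euler-Maruyama scheme (7) for dX_t = -X_t dt + sqrt(2) dW_t, with X_0 ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    X = np.empty(n + 1)
    X[0] = rng.standard_normal()                      # X_0 ~ N(0, 1), independent of W
    dW = np.sqrt(delta) * rng.standard_normal(n)      # Brownian increments over each mesh step
    for j in range(n):
        X[j + 1] = X[j] - X[j] * delta + np.sqrt(2.0) * dW[j]
    return X
```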

5.1 Estimation of the first derivative of the density function

5.2 Case of the one-dimensional discretized diffusion processes

Now, we consider that \(X_t\) is a one-dimensional OU process (see Fig. 1 for an example of a sample path obtained with the Euler–Maruyama discretization scheme). As discussed in Blanke and Pumo (2003), the OU process \((X_t)_{t\ge 0}\), as a solution of the SDE (6), has the Gaussian N(0, 1) density. Our purpose in this subsection is to find the optimal (in terms of minimizing the mean integrated square error, MISE) sampling mesh needed to accurately estimate the first derivative of the density function of \(X_t.\) For this, we consider a sequence of sampling meshes \(\delta \) and a grid of 50 values of x taken in \([-4,4]\) at which the density derivative is locally estimated. Based on \(N=1000\) independent replications, we define the MISE as follows:

$$\begin{aligned} \text {MISE}(\delta ) := \dfrac{1}{N}\sum _{k=1}^N \int \left( f_{X,n,k}^\prime (x, \delta ) - f_X^\prime (x) \right) ^2 \hbox {d}x, \end{aligned}$$

where \(\delta :=\delta _n =\tau _{j+1}-\tau _j\), \(n=105\) and \(f_{X,n,k}^\prime (x, \delta )\) is the estimator of the first derivative of the density function \(f^\prime _X(x)\) at the point x of the grid, obtained from the kth simulated sample path with a specific mesh \(\delta .\)
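The quantity \(\text {MISE}(\delta )\) can be approximated by Monte Carlo as in the sketch below (ours). It uses the true derivative \(f_X^\prime (x)=-x\,\varphi (x)\) of the N(0, 1) density and, for simplicity, a fixed bandwidth instead of the cross-validation selection discussed below.

```python
import numpy as np

def fprime_hat(x_grid, X, h):
    """Kernel estimator of f'_X on a grid (p = 1, |s| = 1, Gaussian kernel)."""
    u = (x_grid[:, None] - X[None, :]) / h
    dK = -u * np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    return dK.sum(axis=1) / (len(X) * h**2)

def mise(delta, n=105, N=1000, h=0.3, seed=0):
    """Monte Carlo approximation of MISE(delta) for the first density derivative."""
    rng = np.random.default_rng(seed)
    x_grid = np.linspace(-4.0, 4.0, 50)
    true_fp = -x_grid * np.exp(-0.5 * x_grid**2) / np.sqrt(2.0 * np.pi)
    ise = np.empty(N)
    for k in range(N):
        X = np.empty(n + 1)
        X[0] = rng.standard_normal()
        for j in range(n):                            # Euler-Maruyama step of scheme (7)
            X[j + 1] = X[j] * (1.0 - delta) + np.sqrt(2.0 * delta) * rng.standard_normal()
        err = fprime_hat(x_grid, X, h) - true_fp
        ise[k] = np.trapz(err**2, x_grid)             # integrated squared error on the grid
    return ise.mean()
```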

Fig. 1 An example of a univariate OU sample path where \(\delta _n=0.4\) and \(n=105\)

To estimate nonparametrically the first derivative of the density function of the OU process \(X_t\), we consider the Gaussian density function as the kernel, and the cross-validation technique is used to select the optimal bandwidth, that is:

$$\begin{aligned} h_{opt} = \arg \min _{h}\sum _{x\in {{{\mathcal {S}}}}}\left( f_{X,n,h}^\prime (x, \delta ) - f_X^\prime (x) \right) ^2, \end{aligned}$$
(8)

where \({{{\mathcal {S}}}}\) is a grid of randomly fixed values of x at which the first derivative of the density is estimated. Figure 2 displays the evolution of the MISE for different values of \(\delta \). One can observe that the optimal sampling mesh minimizing the MISE is \(\delta ^\star = 0.4.\) In other words, in practice, one should sample the underlying OU process with a frequency of 0.4 to obtain an estimate of the first derivative of the density with a minimum MISE. Moreover, Fig. 2 shows that sampling the OU process with a frequency less than 0.4 leads to an inaccurate estimate of \(f^\prime _X(x)\), because of the high correlation between the observations \({\widetilde{X}}_{\tau _{j+1}}\) and \({\widetilde{X}}_{\tau _j}\), for \(j=0, \dots , n\), whereas sampling the underlying process \(X_t\) with a frequency higher than 0.4 does not improve the quality of the estimate, since the MISE becomes stable at some level. This discussion of the interpretation of the MISE plot remains valid for the similar graphs in this paper.

Figure 3a displays the shape of the density function of the OU process X and Fig. 3b shows the true first derivative of the density function as well as its nonparametric estimate based on the optimal sampling mesh \(\delta ^\star = 0.4.\) It is worth noting that this first simulation study generalizes some of the results obtained by Blanke and Pumo (2003) to the first derivative of the univariate density function. The following simulation aims to extend the results to the bivariate case.

Fig. 2 MISE(\(\delta \)) for a one-dimensional OU process where \(n=105\)

Fig. 3 a The Gaussian density of an OU process. b The dark bold line displays the first derivative of the density function and the dotted line its estimate

5.3 Case of the two-dimensional discretized diffusion processes

In this simulation, we are interested in the estimation of the first derivative of the density function of a bidimensional OU process \(\mathbf{X}_t:= (X_{1,t}, X_{2,t}),\) where \(X_1\) and \(X_2\) are generated independently. Following the description above, \(X_{1,t}\) and \(X_{2,t}\) are solutions of the SDE (6) and are simulated numerically according to the discretization scheme given in (7). An example of a simulated sample path of the vector of OU processes \(\mathbf{X}\) is displayed in Fig. 4. Because of the independence between \(X_{1,t}\) and \(X_{2,t}\), the density function of \(\mathbf{X}\) is the product of the marginals, which are both Gaussian. Therefore

$$\begin{aligned} f_\mathbf{X}(x_1,x_2) = \dfrac{1}{2\pi } \exp (-(x_1^2+x^2_2)/2), ~\text{ for }~(x_{1},x_{2})\in {\mathbb {R}}^{2}. \end{aligned}$$

Figure 5 displays the joint distribution of the pair \((X_1, X_2).\) The true first derivative of the joint density function is given as follows:

$$\begin{aligned} f^{\prime \prime }_{X_1X_2}(x_1,x_2):=\dfrac{\partial ^2 f_\mathbf{X}}{\partial x_1 \partial x_2}(x_1,x_2) = \dfrac{x_1x_2}{2\pi } \exp (-(x_1^2+x^2_2)/2), ~\text{ for }~(x_{1},x_{2})\in {\mathbb {R}}^{2}. \end{aligned}$$

Figure 7a displays the shape of \(\partial ^2 f_\mathbf{X}/\partial x_1 \partial x_2.\) The kernel considered in the formula of the nonparametric estimator (2) is a product kernel, where

$$\begin{aligned} K\left( \dfrac{\mathbf{x} - \mathbf{X}}{h_n^{1/2}}\right) = K\left( \dfrac{x_1 - X_1}{h_n^{1/2}} \right) \times K\left( \dfrac{x_2 - X_2}{h_n^{1/2}} \right) . \end{aligned}$$

For simplicity and without loss of generality, the same kernel (Gaussian in this case) and the same bandwidth are considered for \(X_1\) and \(X_2.\) The bandwidth is selected according to the cross-validation criterion given in (8), adapted to the two-dimensional case, that is

$$\begin{aligned} h_{opt} = \arg \min _{h}\sum _{x_1\in {{{\mathcal {S}}}}}\sum _{x_2\in {{{\mathcal {S}}}}}\left( f_{X_1X_2,n,h}^{\prime \prime }(x_1,x_2, \delta ) - f_{X_1X_2}^{\prime \prime }(x_1,x_2) \right) ^2. \end{aligned}$$
(9)

Moreover, the selection of the optimal sampling mesh is based on the following definition of the MISE:

$$\begin{aligned} \text {MISE}(\delta ) := \dfrac{1}{N}\sum _{k=1}^N \int \int \left( f_{X_1X_2,n,k}^{\prime \prime }(x_1,x_2, \delta ) - f_{X_1X_2}^{\prime \prime }(x_1,x_2) \right) ^2 \hbox {d}x_1\hbox {d}x_2. \end{aligned}$$

Figure 6 displays the evolution of the MISE as a function of \(\delta \), and one can observe that the optimal sampling mesh is \(\delta ^\star = 0.074.\) Compared to the estimation of the first derivative of the density function for a one-dimensional OU process, higher-frequency sampling of the bidimensional OU process is required to obtain a consistent estimate of the first derivative of the joint density function \(f_{X_1X_2}^{\prime \prime }(x_1,x_2)\). Figure 7b displays the nonparametric estimate of \(f_{X_1X_2}^{\prime \prime }(x_1,x_2)\) based on the obtained optimal mesh \(\delta ^\star = 0.074.\) Moreover, Fig. 7c (resp. d) shows the estimate of the first derivative of the marginal density of \(X_1\) (resp. \(X_2\)).
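A sketch of the corresponding mixed-derivative estimator with the product Gaussian kernel (here \(p=2\), \({\mathbf {s}}=(1,1)\) and \(h\) denotes the product bandwidth of Sect. 2, so each coordinate is scaled by \(h^{1/2}\); the names are ours) reads as follows.

```python
import numpy as np

def mixed_density_deriv(x1, x2, X1, X2, h):
    """Estimator of d^2 f / dx1 dx2 with a product Gaussian kernel (p = 2, s = (1, 1))."""
    n = len(X1)
    u1 = (x1 - X1) / np.sqrt(h)
    u2 = (x2 - X2) / np.sqrt(h)
    phi = lambda u: np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    dK1, dK2 = -u1 * phi(u1), -u2 * phi(u2)          # K'((x1 - X1i)/h^{1/2}), K'((x2 - X2i)/h^{1/2})
    return np.sum(dK1 * dK2) / (n * h**2)            # scaling n h^{1 + |s|/p} = n h^2
```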

Fig. 4 An example of a bidimensional OU path

Fig. 5 The joint density function of a bidimensional OU process

Fig. 6 MISE(\(\delta \)) for bidimensional OU processes with \(n=105\)

Fig. 7 a The true first derivative of the joint density function \(f^\prime (x_1,x_2)\). b The estimate of the first derivative of the joint density function \(f_n^\prime (x_1,x_2)\). c The solid line displays \(x_1 \rightarrow f^\prime (x_1,x_2)\) versus the estimator \(x_1\rightarrow f_n^\prime (x_1,x_2)\) in red dotted line, for a fixed \(x_2\). d The solid line displays \(x_2 \rightarrow f^\prime (x_1,x_2)\) versus the estimator \(x_2\rightarrow f_n^\prime (x_1,x_2)\) in red dotted line, for a fixed \(x_1\) (color figure online)

5.4 Estimation of the first derivative of the regression function

In this subsection, we are interested in the estimation of the first derivative of the regression function. To this end, let \(X_t\) be an OU process, solution of equation (6) and numerically generated as per equation (7). We also suppose that \(\psi (Y_t) = Y_{t}\), where the responses \(Y_t\) are generated according to the regression model \(Y_{\tau _j} = m(X_{\tau _j}) + \epsilon _{\tau _j}, \; j=0,1,\dots , n,\) where the \(\epsilon \)'s are generated from a standard normal distribution and \(m(x):= \dfrac{1}{1+x^2}.\) As discussed in (5), the true first derivative of the regression function is defined as:

$$\begin{aligned} m^\prime (x) = \dfrac{r^\prime (x)}{f(x)} - \dfrac{r(x) f^\prime (x)}{f^2(x)}. \end{aligned}$$
(10)

A natural estimator of \(m^\prime (x)\), say \(m_n^\prime (x)\), can be defined by plugging into formula (10) the nonparametric estimators \(r^\prime _n(x), f_n^\prime (x), f_n(x)\) and \(r_n(x)\) of \(r^\prime (x), f^\prime (x), f(x)\) and r(x), respectively. The calculation of \(r^\prime _n(x), f_n^\prime (x), f_n(x)\) and \(r_n(x)\) can be carried out as described in Sect. 2. One can easily notice that estimating the first derivative of the regression function is more complicated than estimating the first derivative of a univariate or bivariate density function. Indeed, in the latter case we have to select only one bandwidth (or two in the bivariate case), whereas four different bandwidths should be selected for the estimation of the first derivative of the regression function, which makes the estimation task harder. In this simulation, a separate cross-validation technique is used to select the optimal bandwidth for each of \(r^\prime _n(x), f_n^\prime (x), f_n(x)\) and \(r_n(x)\).

Remark 9

Another approach to selecting a global bandwidth for \(m_n^\prime (x)\) is to consider the following cross-validation criterion:

$$\begin{aligned} h_{opt} = \arg \min _{h} \sum _{x\in {{{\mathcal {S}}}}} \left( m_{n,h}^\prime (x) - m^\prime (x) \right) ^2. \end{aligned}$$
(11)

In contrast to the estimation of the regression function, where the cross-validation criterion is expressed as a function of the observed values of the response variable \(Y_i\) and the estimator of the regression function \(m_{n,h}(X_i)\), the true value of the gradient is typically not observed. This makes the problem of bandwidth selection more difficult. Rice (1986) suggested the use of a differencing operator and a criterion which was shown to be a nearly unbiased estimator of the mean integrated square error (MISE) between the estimated derivative and the oracle. Müller et al. (1987) used Rice's noise-corrupted suggestion to select the bandwidth based on a natural extension of least squares cross-validation. More recently, Henderson et al. (2015) generalized the previous approaches to the multivariate setting, where a local polynomial estimator was used.

Figure 8 shows that the optimal (in the sense of minimizing the MISE) sampling mesh for the estimator of the first derivative of the regression function is \(\delta ^\star = 0.64\), where the corresponding \(\text {MISE}(\delta )\) for the first derivative of the regression is defined as follows:

$$\begin{aligned} \text {MISE}(\delta ) := \dfrac{1}{N}\sum _{k=1}^N \int \left( m_{n,k}^\prime (x, \delta ) - m^\prime (x) \right) ^2 \hbox {d}x. \end{aligned}$$

Figure 9a displays the shape of the regression function, whereas Fig. 9b shows the true first derivative as well as its estimate based on the obtained optimal sampling mesh.

Fig. 8 MISE(\(\delta \)) for the first derivative of the regression function with \(n=105\)

Fig. 9 a The true regression function m(x). b The dark bold line displays the first derivative of the regression function and the dotted line its estimate

6 Application to real data

In this section, we illustrate the estimation methodology on real data. To this end, we consider two asset prices: the oil price (WTI) and the gold price. Figure 10 displays the daily time series of these asset prices from 02/01/1986 to 28/02/2018. One can observe a high correlation between the price of oil and the price of gold, reflected by a correlation coefficient equal to 0.8. It is well known that in most financial market analyses one is interested in the log-return of the asset price rather than the price itself. For this reason we consider:

$$\begin{aligned} X_{1,t} = \ln (\text {oil}_t) - \ln (\text {oil}_{t-1})\quad \quad \text {and}\quad \quad X_{2,t} = \ln (\text {gold}_t) - \ln (\text {gold}_{t-1}), \quad \text {for}\quad t=2, \dots , n, \end{aligned}$$

where n is the number of days from 02/01/1986 to 28/02/2018. Observe that the log-return processes of oil and gold are stationary.
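The log-return transformation can be computed in a few lines, as in the following sketch (ours; the data frame and its column names oil and gold are placeholders, since the data files are not specified here).

```python
import numpy as np
import pandas as pd

def log_returns(prices: pd.DataFrame) -> pd.DataFrame:
    """X_{1,t} = ln(oil_t) - ln(oil_{t-1}) and X_{2,t} = ln(gold_t) - ln(gold_{t-1})."""
    return np.log(prices[["oil", "gold"]]).diff().dropna()
```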

We are interested in estimating the first derivative of the density function of the oil and gold log-returns separately. Then, we consider the estimation of their joint density function. In this real-data application, we consider a Gaussian kernel and select the bandwidth according to the cross-validation criterion. Figure 11a shows the kernel-type estimate of the density function of \(X_{1,t}\) and Fig. 11b displays the estimate of its first derivative. Similarly, Fig. 12a, b correspond to the nonparametric estimate of the density function of \(X_{2,t}\) and of its first derivative, respectively. Finally, we consider the pair of log-returns of oil and gold \((X_{1,t}, X_{2,t})\) and estimate nonparametrically the first derivative of their joint density function. Figure 13 displays the shape of the first derivative, with respect to \(x_1\) and \(x_2\), of the joint pdf of the log-returns of oil and gold prices. One can observe that the derivative of the joint density is positive for high values of the oil and gold log-returns, or whenever the log-return of oil is around zero and the log-return of gold is negative. On the contrary, the first derivative of the joint pdf is negative for negative log-returns of oil and gold, or when the oil log-return is null and the gold log-return is positive. Otherwise, the first derivative of the joint pdf is around zero.

Fig. 10: Oil and gold prices

Fig. 11: a The estimate of the density function of the log-return of the oil price. b The estimate of the first derivative of this density function. The red dotted line corresponds to the y-coordinate equal to zero (color figure online)

Fig. 12: a The estimate of the density function of the log-return of the gold price. b The estimate of the first derivative of this density function. The red dotted line corresponds to the y-coordinate equal to zero (color figure online)

Fig. 13: The kernel-type estimate of the first derivative of the joint density function \(f^{\prime \prime }_{x_1x_2}(x_1,x_2)\)

7 Concluding remarks

In the present paper, we have considered kernel-type derivative estimators. We have extended and completed the existing work by relaxing the dependence assumption, requiring only the ergodicity of the process. We have obtained an almost sure convergence rate that is close to that of the i.i.d. framework, and we have established the limiting distribution of the proposed estimators. An application concerning regression derivatives is discussed theoretically as well as numerically. It would be interesting to extend our work to the case of censored data; this requires nontrivial mathematics and would go well beyond the scope of the present paper. Another direction of research is to enrich our results by considering uniformity in the bandwidth, which is an important question arising in several practical applications.

8 Mathematical developments

This section is devoted to the proofs of our results. The notation introduced previously continues to be used in what follows. The next technical lemma will be instrumental in the proofs of our theorems.

Lemma 1

Let \((Z_n)_{n\ge 1}\) be a sequence of real martingale differences with respect to the sequence of \(\sigma \)-fields  \(({{{\mathcal {F}}}}_n=\sigma (Z_1,\ldots ,Z_n))_{n\ge 1}\), where  \(\sigma (Z_1,\ldots ,Z_n)\) is the \(\sigma \)-field generated by the random variables \(Z_1,\ldots ,Z_n\). Set  \(S_n=\sum _{i=1}^nZ_i\). For any  \(\nu \ge 2\) and any \(n\ge 1\), assume that there exist some nonnegative constants C and \(d_n\) such that \( {\mathbb {E}}\left( |Z_n|^\nu |{{{\mathcal {F}}}}_{n-1}\right) \le C^{\nu -2}\nu !d_n^2,\ \ \hbox {almost surely}\). Then, for any  \(\epsilon >0\), we have

$$\begin{aligned} {\mathbb {P}}\left( |S_n|>\epsilon \right) \le 2\exp \left\{ -\frac{\epsilon ^2}{2(D_n+C\epsilon )}\right\} , \end{aligned}$$

where  \(D_n=\sum _{i=1}^nd_i^2\).

The proof follows as a particular case of Theorem 8.2.2 due to de la Peña and Giné (1999).
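Although it is not part of the lemma as stated, it may help to record, as a remark of ours, the way this inequality will be exploited below. Choosing \(\epsilon =\epsilon _0\sqrt{D_n\log n}\) for some \(\epsilon _0>0\), and assuming \(\log n=o(D_n)\), we obtain

$$\begin{aligned} {\mathbb {P}}\left( |S_n|>\epsilon _0\sqrt{D_n\log n}\right) \le 2\exp \left\{ -\frac{\epsilon _0^2\log n}{2\left( 1+C\epsilon _0\sqrt{\log n/D_n}\right) }\right\} \le 2\, n^{-C_0\epsilon _0^2}, \end{aligned}$$

for some constant \(C_0>0\) and all n large enough. Combined with a union bound over a polynomial number of grid points and the Borel–Cantelli lemma, this yields \(S_n=O_{a.s}\left( \sqrt{D_n\log n}\right) \) once \(\epsilon _0\) is chosen large enough; this is precisely the mechanism used in the proof of Proposition 1, with \(D_n=O\left( nh_{n}^{1+|{\mathbf {s}}|/p}\right) \).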

Lemma 2

For any \({\mathbf {x}}\in {{\mathbb {R}}}^p\), we let

$$\begin{aligned} H_n({\mathbf {x}})=\sum _{i=1}^n D^{|{\mathbf {s}}|}f_{{\mathbf {X}}_i}^{ {\mathcal {G}}_{i-1}}({\mathbf {x}})-n D^{|{\mathbf {s}}|}f({\mathbf {x}}). \end{aligned}$$

Under assumptions (C.4) and (C.5), we have

$$\begin{aligned} \sup _{{\mathbf {x}}\in {{\mathbb {R}}}^p} \Vert H_n({\mathbf {x}})\Vert ^2= O(n). \end{aligned}$$

Proof

Following Wu (2003) and Wu et al. (2010), and making use of the Cauchy–Schwarz inequality, one obtains

$$\begin{aligned} \left\| H_n({\mathbf {x}}) \right\| ^2 \le n\left( \sum _{i=1}^n \left\| {{\mathcal {P}}}_1 D^{|{\mathbf {s}}|}f_{{\mathbf {X}}_i}^{ {\mathcal {G}}_{i-1}}({\mathbf {x}}) \right\| ^2\right) . \end{aligned}$$

Making use of assumption (C.5), one infers that

$$\begin{aligned} \underset{{\mathbf {x}}\in {\mathbb {R}}^{p}}{\sup } \left\| H_n({\mathbf {x}}) \right\| ^2\le n \left( \underset{{\mathbf {x}}\in {\mathbb {R}}^{p}}{\sup } \sum _{i=1}^\infty \left\| {{\mathcal {P}}}_k D^{|{\mathbf {s}}|}f_{{\mathbf {X}}_i}^{ {\mathcal {G}}_{i-1}}({\mathbf {x}}) \right\| \right) =O\left( n\right) . \end{aligned}$$

Hence the proof is complete. \(\square \)

Proposition 1

Under the assumptions  (K.1)(i)–(ii), (C.1), (R.1), (R.2), we have

$$\begin{aligned} \sup _{{\mathbf {x}}\in {\mathbf {J}}}\left| D^{|{\mathbf {s}}|}r_{n}(\psi ;{\mathbf {x}},h_n)-{\mathbb {E}}D^{|{\mathbf {s}}|}r_{n}(\psi ;{\mathbf {x}},h_n)\right| = O\left( \sqrt{\frac{\log n}{nh_{n}^{1+|{\mathbf {s}}|/p}}}\right) . \end{aligned}$$
(12)

8.1 Proof of Proposition 1

Let us introduce the following notation

$$\begin{aligned} {\widetilde{D}}^{|{\mathbf {s}}|}r_{n}(\psi ;{\mathbf {x}},h_n)= \frac{1}{nh_{n}^{1+|{\mathbf {s}}|/p}}\sum _{i=1}^{n}{\mathbb {E}}\left[ \psi ({\mathbf {Y}}_{i})D^{|{\mathbf {s}}|}K\left( \frac{{\mathbf {x}}-{\mathbf {X}}_{i}}{h_{n}^{1/p}}\right) \mid {\mathcal {G}}_{i-1}\right] . \end{aligned}$$

We next consider the following decomposition

$$\begin{aligned}&\sup _{{\mathbf {x}}\in {\mathbf {J}}}\left| D^{|{\mathbf {s}}|}r_{n}(\psi ;{\mathbf {x}},h_n)-{\mathbb {E}}\left( D^{|{\mathbf {s}}|}r_{n}(\psi ;{\mathbf {x}},h_n)\right) \right| \\&\le \sup _{{\mathbf {x}}\in {\mathbf {J}}}\left| D^{|{\mathbf {s}}|}r_{n}(\psi ;{\mathbf {x}},h_n)-{\widetilde{D}}^{|{\mathbf {s}}|}r_{n}(\psi ;{\mathbf {x}},h_n)\right| \\&\quad +\sup _{{\mathbf {x}}\in {\mathbf {J}}}\left| {\widetilde{D}}^{|{\mathbf {s}}|}r_{n}(\psi ;{\mathbf {x}},h_n)-{\mathbb {E}}D^{|{\mathbf {s}}|}r_{n}(\psi ;{\mathbf {x}},h_n)\right| = D_{n,1}+D_{n,2}. \end{aligned}$$

Let \(\{{\mathbf {x}}_k,k=1,\ldots ,l\} \subset {\mathbf {J}}\) and consider a covering \( \{{\mathcal {S}}_k\}_{1\le k\le l}\) of the compact set \({\mathbf {J}}\) by a finite number l of spheres \({\mathcal {S}}_k\) centered at \({\mathbf {x}}_k\), each with radius \(\mathbf{r} =a\left( \frac{h_n^{1/p}}{n}\right) ^{1/\gamma }\), for some positive constant a. We then have \(\displaystyle {\mathbf {J}}\subset \bigcup \nolimits _{k=1}^{l}{\mathcal {S}}_k.\) Since \({\mathbf {J}}\) is compact, the number of spheres can be chosen so that \(l=O\left( \mathbf{r} ^{-p}\right) =O\left( \left( n/h_n^{1/p}\right) ^{p/\gamma }\right) \), which grows at most polynomially in n. We readily infer that

$$\begin{aligned} D_{n,1}= & {} \sup _{{\mathbf {x}}\in {\mathbf {J}}}\left| D^{|{\mathbf {s}}|}r_{n}(\psi ;{\mathbf {x}},h_n)-{\widetilde{D}}^{|{\mathbf {s}}|}r_{n}(\psi ;{\mathbf {x}},h_n)\right| \nonumber \\\le & {} \underset{1\le k\le l}{\max }\underset{{\mathbf {x}} \in {\mathcal {S}}_k}{\sup } \left| D^{|{\mathbf {s}}|}r_{n}(\psi ;{\mathbf {x}},h_n)- D^{|{\mathbf {s}}|}r_{n}(\psi ;{\mathbf {x}}_k,h)\right| \nonumber \\&+ \underset{1\le k\le l}{\max }\left| D^{|{\mathbf {s}}|}r_{n}(\psi ;{\mathbf {x}}_k,h)-{\widetilde{D}}^{|{\mathbf {s}}|}r_{n}(\psi ;{\mathbf {x}}_k,h)\right| \nonumber \\&+ \underset{1\le k\le l}{\max } \underset{{\mathbf {x}} \in {\mathcal {S}}_k}{\sup } \left| {\widetilde{D}}^{|{\mathbf {s}}|}r_{n}(\psi ;{\mathbf {x}}_k,h)-{\widetilde{D}}^{|{\mathbf {s}}|}r_{n}(\psi ;{\mathbf {x}},h)\right| \nonumber \\= & {} D_{n,1,1}+ D_{n,1,2} + D_{n,1,3}. \end{aligned}$$
(13)

Consider the first term of (13). Making use of the Cauchy-Schwarz inequality we readily obtain

$$\begin{aligned}&\left| D^{|{\mathbf {s}}|}r_{n}(\psi ;{\mathbf {x}},h_n)- D^{|{\mathbf {s}}|}r_{n}(\psi ;{\mathbf {x}}_k,h)\right| \\&\quad = \left| \frac{1}{nh_{n}^{1+|{\mathbf {s}}|/p}}\sum _{i=1}^{n}\psi ({\mathbf {Y}}_{i})\left( D^{|{\mathbf {s}}|}K\left( \frac{{\mathbf {x}}-{\mathbf {X}}_{i}}{h_{n}^{1/p}}\right) -D^{|{\mathbf {s}}|}K\left( \frac{{\mathbf {x}}_k-{\mathbf {X}}_{i}}{h_{n}^{1/p}}\right) \right) \right| \\&\quad \le \frac{1}{\sqrt{n} h_{n}^{1+|{\mathbf {s}}|/p}} \left( \frac{1}{n} \sum _{i=1}^{n} \psi ^2({\mathbf {Y}}_i) \right) ^{1/2} \left( \sum _{i=1}^{n} \left( D^{|{\mathbf {s}}|} K\left( \frac{{\mathbf {x}}-{\mathbf {X}}_{i}}{h_{n}^{1/p}}\right) -D^{|{\mathbf {s}}|} K\left( \frac{{\mathbf {x}}_k-{\mathbf {X}}_{i}}{h_{n}^{1/p}}\right) \right) ^{2} \right) ^{1/2}. \end{aligned}$$

Keeping in mind the condition (K.1)(ii), we obtain that, almost surely,

$$\begin{aligned}&\left| D^{|{\mathbf {s}}|}r_{n}(\psi ;{\mathbf {x}},h_n)- D^{|{\mathbf {s}}|}r_{n}(\psi ;{\mathbf {x}}_k,h)\right| \\&\quad \le \frac{1}{\sqrt{n} h_{n}^{1+|{\mathbf {s}}|/p}} \left( {\mathbb {E}}\left[ \psi ^2({\mathbf {Y}}_1)\right] \right) ^{1/2} \left( C_{K,s}^2\sum _{i=1}^n \frac{\Vert {\mathbf {x}}-{\mathbf {x}}_k \Vert ^{2\gamma }}{h_{n}^{2/p}}\right) ^{1/2}\\&\quad \le \frac{C_{K,s}}{\sqrt{n} h_{n}^{1+|{\mathbf {s}}|/p}} \left( {\mathbb {E}}\left[ \psi ^2({\mathbf {Y}}_1)\right] \right) ^{1/2} \left( \frac{n\left( a\left( h_n^{1/p}/n\right) ^{1/\gamma }\right) ^{2\gamma }}{h_{n}^{2/p}}\right) ^{1/2}\\&\quad \le \frac{a^\gamma C_{K,s}}{nh_{n}^{1+|{\mathbf {s}}|/p}} \left( {\mathbb {E}}\left[ \psi ^2({\mathbf {Y}}_1)\right] \right) ^{1/2}. \end{aligned}$$

Therefore, by considering the following choice \(\epsilon _n=\left( \dfrac{\log n}{nh_{n}^{1+|{\mathbf {s}}|/p}}\right) ^{1/2}\), we have

$$\begin{aligned} \epsilon _n^{-1} D_{n,1,1}= O_{a.s}\left( \frac{1}{nh_{n}^{1+|{\mathbf {s}}|/p}\log n} \right) ^{1/2}. \end{aligned}$$
(14)

In view of the condition (K.1)(ii), we infer that, almost surely,

$$\begin{aligned}&\left| {\widetilde{D}}^{|{\mathbf {s}}|}r_{n}(\psi ;{\mathbf {x}},h_n)- {\widetilde{D}}^{|{\mathbf {s}}|}r_{n}(\psi ;{\mathbf {x}}_k,h)\right| \\&\quad \le \frac{1}{h_{n}^{1+|{\mathbf {s}}|/p}} \left( \frac{1}{n}\sum _{i=1}^n{\mathbb {E}}\left[ \psi ({\mathbf {Y}}_i)\mid {\mathcal {G}}_{i-1}\right] \right) \left( C_{K,s} \frac{\Vert {\mathbf {x}}-{\mathbf {x}}_k \Vert ^{\gamma }}{h_{n}^{1/p}}\right) \\&\quad \le \frac{1}{ h_{n}^{1+|{\mathbf {s}}|/p}} \left( {\mathbb {E}}\left[ \psi ({\mathbf {Y}}_1)\right] \right) \left( C_{K,s} \frac{\left( a\left( h_n^{1/p}/n\right) ^{1/\gamma }\right) ^{\gamma }}{h_{n}^{1/p}}\right) \le \frac{a^\gamma C_{K,s}}{nh_{n}^{1+|{\mathbf {s}}|/p}} \left( {\mathbb {E}}\left[ \psi ({\mathbf {Y}}_1)\right] \right) . \end{aligned}$$

We have then

$$\begin{aligned} \epsilon _n^{-1} D_{n,1,3}= O_{a.s}\left( \frac{1}{nh_{n}^{1+|{\mathbf {s}}|/p}\log n} \right) ^{1/2}. \end{aligned}$$
(15)

We now deal with the term \(D_{n,1,2}\) of the decomposition given in equation (13). We first observe that

$$\begin{aligned} {D_{n,1,2}}= & {} \frac{1}{nh_{n}^{1+|{\mathbf {s}}|/p}}\sum _{i=1}^{n}\left( \psi ({\mathbf {Y}}_{i})D^{|{\mathbf {s}}|}K\left( \frac{{\mathbf {x}}_k-{\mathbf {X}}_{i}}{h_{n}^{1/p}}\right) -{\mathbb {E}}\left[ \psi ({\mathbf {Y}}_{i})D^{|{\mathbf {s}}|}K\left( \frac{{\mathbf {x}}_k-{\mathbf {X}}_{i}}{h_{n}^{1/p}}\right) \mid {\mathcal {G}}_{i-1}\right] \right) \\= & {} \frac{1}{nh_{n}^{1+|{\mathbf {s}}|/p}}\sum _{i=1}^{n}R_{i}({\mathbf {x}}_k) , \end{aligned}$$

where

$$\begin{aligned} R_{i}({\mathbf {x}}_k)= & {} \psi ({\mathbf {Y}}_{i}) D^{|{\mathbf {s}}|} K\left( \frac{{\mathbf {x}}_k-{\mathbf {X}}_i}{h_{n}^{1/p}}\right) - {\mathbb {E}}\left[ \psi ({\mathbf {Y}}_{i})D^{|{\mathbf {s}}|} K\left( \frac{{\mathbf {x}}_k-{\mathbf {X}}_i}{h_{n}^{1/p}}\right) \mid {\mathcal {G}}_{i-1} \right] . \end{aligned}$$

We observe that \(\big \{ R_{i}({\mathbf {x}}_k)\big \}_{1\le i \le n}\) is a sequence of martingale differences. For \(\nu \ge 2\), we have

$$\begin{aligned} R_i^\nu ({\mathbf {x}}_k)= & {} \sum _{j=0}^\nu C_\nu ^j \left( \psi ({\mathbf {Y}}_{i})D^{|{\mathbf {s}}|} K\left( \frac{{\mathbf {x}}_k-{\mathbf {X}}_i}{h_{n}^{1/p}}\right) \right) ^{j}(-1)^{\nu -j} \left( {\mathbb {E}}\left[ \psi ({\mathbf {Y}}_{i})D^{|{\mathbf {s}}|} K\left( \frac{{\mathbf {x}}_k-{\mathbf {X}}_i}{h_{n}^{1/p}}\right) \mid {\mathcal {G}}_{i-1} \right] \right) ^{\nu -j}, \end{aligned}$$

thus, by the triangle inequality, we have

$$\begin{aligned}&\left| {\mathbb {E}}\left[ R_{i}^\nu ({\mathbf {x}}_k)\mid {\mathcal {G}}_{i-1}\right] \right| \\&\quad \le \sum _{j=0}^\nu C_\nu ^j {\mathbb {E}}\left[ \left| \psi ({\mathbf {Y}}_{i})D^{|{\mathbf {s}}|} K\left( \frac{{\mathbf {x}}_k-{\mathbf {X}}_i}{h_{n}^{1/p}}\right) \right| ^{j}\mid {\mathcal {G}}_{i-1} \right] \left( {\mathbb {E}}\left[ \left| \psi ({\mathbf {Y}}_{i})D^{|{\mathbf {s}}|} K\left( \frac{{\mathbf {x}}_k-{\mathbf {X}}_i}{h_{n}^{1/p}}\right) \right| \mid {\mathcal {G}}_{i-1} \right] \right) ^{\nu -j}. \end{aligned}$$

Making use of Jensen’s inequality, we have

$$\begin{aligned}&{\mathbb {E}}\left[ \left| \psi ({\mathbf {Y}}_{i})D^{|{\mathbf {s}}|} K\left( \frac{{\mathbf {x}}_k-{\mathbf {X}}_i}{h_{n}^{1/p}}\right) \right| ^{j}\mid {\mathcal {G}}_{i-1} \right] \left( {\mathbb {E}}\left[ \left| \psi ({\mathbf {Y}}_{i})D^{|{\mathbf {s}}|} K\left( \frac{{\mathbf {x}}_k-{\mathbf {X}}_i}{h_{n}^{1/p}}\right) \right| \mid {\mathcal {G}}_{i-1} \right] \right) ^{\nu -j}\\&\quad \le {\mathbb {E}}\left[ \left| \psi ({\mathbf {Y}}_{i})D^{|{\mathbf {s}}|} K\left( \frac{{\mathbf {x}}_k-{\mathbf {X}}_i}{h_{n}^{1/p}}\right) \right| ^{j}\mid {\mathcal {G}}_{i-1} \right] {\mathbb {E}}\left[ \left| \psi ({\mathbf {Y}}_{i})D^{|{\mathbf {s}}|} K\left( \frac{{\mathbf {x}}_k-{\mathbf {X}}_i}{h_{n}^{1/p}}\right) \right| ^{\nu -j} \mid {\mathcal {G}}_{i-1} \right] . \end{aligned}$$

Observe that for any \(m\ge 1\), under assumption (R.1)(iii), we have

$$\begin{aligned}&{\mathbb {E}}\left[ \left| \psi ({\mathbf {Y}}_{i})D^{|{\mathbf {s}}|} K\left( \frac{{\mathbf {x}}_k-{\mathbf {X}}_i}{h_{n}^{1/p}}\right) \right| ^{m}\mid {\mathcal {G}}_{i-1} \right] \\&\quad = {\mathbb {E}}\left[ {\mathbb {E}}\left[ \left| \psi ({\mathbf {Y}}_{i})\right| ^{m}\left( D^{|{\mathbf {s}}|} K\left( \frac{{\mathbf {x}}_k-{\mathbf {X}}_i}{h_{n}^{1/p}}\right) \right) ^{m}\big | {\mathcal {S}}_{i} \right] \mid {\mathcal {G}}_{i-1} \right] \\&\quad = {\mathbb {E}}\left[ {\mathbb {E}}\left[ \left| \psi ({\mathbf {Y}}_{i})\right| ^{m}\big | {\mathcal {S}}_{i} \right] \left( D^{|{\mathbf {s}}|} K\left( \frac{{\mathbf {x}}_k-{\mathbf {X}}_i}{h_{n}^{1/p}}\right) \right) ^{m}\mid {\mathcal {G}}_{i-1} \right] \\&\quad = {\mathbb {E}}\left[ \Psi _m({\mathbf {X}}_{i})\left( D^{|{\mathbf {s}}|} K\left( \frac{{\mathbf {x}}_k-{\mathbf {X}}_i}{h_{n}^{1/p}}\right) \right) ^{m}\mid {\mathcal {G}}_{i-1} \right] \\&\quad \le {\mathbb {E}}\left[ \left| \Psi _m({\mathbf {X}}_{i})-\Psi _m({\mathbf {x}})\right| \left( D^{|{\mathbf {s}}|} K\left( \frac{{\mathbf {x}}_k-{\mathbf {X}}_i}{h_{n}^{1/p}}\right) \right) ^{m}\mid {\mathcal {G}}_{i-1} \right] \\&\qquad + {\mathbb {E}}\left[ \Psi _m({\mathbf {x}})\left( D^{|{\mathbf {s}}|} K\left( \frac{{\mathbf {x}}_k-{\mathbf {X}}_i}{h_{n}^{1/p}}\right) \right) ^{m}\mid {\mathcal {G}}_{i-1} \right] \\&\quad \le {\mathbb {E}}\left[ \left( D^{|{\mathbf {s}}|} K\left( \frac{{\mathbf {x}}_k-{\mathbf {X}}_i}{h_{n}^{1/p}}\right) \right) ^{m}\mid {\mathcal {G}}_{i-1} \right] \left( \Psi _m({\mathbf {x}})+\underset{{\mathbf {u}}\in B({\mathbf {x}},h_n^{1/p})}{\sup } \left| \Psi _m({\mathbf {u}})-\Psi _m({\mathbf {x}})\right| \right) \\&\quad = C_{0,\psi } {\mathbb {E}}\left[ \left( D^{|{\mathbf {s}}|} K\left( \frac{{\mathbf {x}}_k-{\mathbf {X}}_i}{h_{n}^{1/p}}\right) \right) ^{m}\mid {\mathcal {G}}_{i-1} \right] , \end{aligned}$$

where \(C_{0,\psi }\) is a positive constant. By a simple change of variable, we obtain that

$$\begin{aligned} {\mathbb {E}}\left[ \left( D^{|{\mathbf {s}}|} K\left( \frac{{\mathbf {x}}_k-{\mathbf {X}}_i}{h_{n}^{1/p}}\right) \right) ^{m}\mid {\mathcal {G}}_{i-1} \right]= & {} \int _{{\mathbb {R}}^p} \left( D^{|{\mathbf {s}}|} K\right) ^{m}\left( \frac{{\mathbf {x}}_k-{\mathbf {u}}}{h_{n}^{1/p}}\right) f^{{\mathcal {G}}_{i-1}}_{{\mathbf {X}}_i}({\mathbf {u}}) \hbox {d}{\mathbf {u}}\\= & {} h_{n}\int _{{\mathbb {R}}^p} \left( D^{|{\mathbf {s}}|} K\right) ^{m}\left( {\mathbf {v}}\right) f^{{\mathcal {G}}_{i-1}}_{{\mathbf {X}}_i}({\mathbf {x}}_k-h_{n}^{1/p}{\mathbf {v}}) \hbox {d}{\mathbf {v}}. \end{aligned}$$

In the light of assumption (K.1)(i), the kernel \(K(\cdot )\) is compactly supported, which implies that \( D^{|{\mathbf {s}}|} K({\mathbf {x}})\le \Gamma _K. \) Making use of assumption (C.1), in combination with an integration by parts repeated \(|\mathbf{s} |\) times and a Taylor expansion of order 1, we obtain

$$\begin{aligned}&{\mathbb {E}}\left[ \left( D^{|{\mathbf {s}}|} K\left( \frac{{\mathbf {x}}_k-{\mathbf {X}}_i}{h_{n}^{1/p}}\right) \right) ^{m}\mid {\mathcal {G}}_{i-1} \right] \\&\quad \le h_{n}\Gamma _K^{m-1} \int _{{\mathbb {R}}^p} D^{|{\mathbf {s}}|} K \left( {\mathbf {v}}\right) f^{{\mathcal {G}}_{i-1}}_{{\mathbf {X}}_i}({\mathbf {x}}_k-h_{n}^{1/p}{\mathbf {v}}) \hbox {d}{\mathbf {v}}\\&=\quad \Gamma _K^{m-1} h_{n}^{1+|{\mathbf {s}}|/p} \int _{{\mathbb {R}}^p} K \left( {\mathbf {v}}\right) D^{|{\mathbf {s}}|}f^{{\mathcal {G}}_{i-1}}_{{\mathbf {X}}_i}({\mathbf {x}}_k-h_{n}^{1/p}{\mathbf {v}}) \hbox {d}{\mathbf {v}}\\&\quad = \Gamma _K^{m-1} h_{n}^{1+|{\mathbf {s}}|/p} \int _{{\mathbb {R}}^p} K \left( {\mathbf {v}}\right) \left( D^{|{\mathbf {s}}|} f^{{\mathcal {G}}_{i-1}}_{{\mathbf {X}}_i}({\mathbf {x}}_k)+O\left( h_{n}^{{\mathbb {k}}/p}\right) \right) \hbox {d}{\mathbf {v}}\\&\quad = \Gamma _K^{m-1} h_{n}^{1+|{\mathbf {s}}|/p} \left( D^{|{\mathbf {s}}|} f^{{\mathcal {G}}_{i-1}}_{{\mathbf {X}}_i}({\mathbf {x}}_k)+O\left( h_{n}^{{\mathbb {k}}/p}\right) \right) . \end{aligned}$$

Since \(h_{n}^{2(1+|{\mathbf {s}}|/p)} \le h_{n}^{1+|{\mathbf {s}}|/p} \), we readily obtain that

$$\begin{aligned} \left| {\mathbb {E}}\left[ R_{i}^\nu ({\mathbf {x}}_k)\mid {\mathcal {G}}_{i-1}\right] \right|= & {} C_{0,\psi }^2 \Gamma _K^{\nu -2} h_{n}^{2(1+|{\mathbf {s}}|/p)} \left( \left( D^{|{\mathbf {s}}|} f^{{\mathcal {G}}_{i-1}}_{{\mathbf {X}}_i} \right) ^2({\mathbf {x}}_k) +O\left( h_{n}^{{\mathbb {k}}/p}\right) \right) \sum _{j=0}^\nu C_\nu ^j \\= & {} 2^\nu C_{0,\psi }^2 \Gamma _K^{\nu -2} h_{n}^{1+|{\mathbf {s}}|/p} \left( \left( D^{|{\mathbf {s}}|} f^{{\mathcal {G}}_{i-1}} _{{\mathbf {X}}_i}\right) ^2({\mathbf {x}}_k) +O\left( h_{n}^{{\mathbb {k}}/p}\right) \right) \\= & {} \nu ! \Gamma _K^{\nu -2}h_{n}^{1+|{\mathbf {s}}|/p} \left[ \left( D^{|{\mathbf {s}}|} f^{{\mathcal {G}}_{i-1}}_{{\mathbf {X}}_i} \right) ^2({\mathbf {x}}_k) +O\left( h_{n}^{{\mathbb {k}}/p}\right) \right] \\= & {} \nu ! C^{\nu -2}h_{n}^{1+|{\mathbf {s}}|/p} \left[ \left( D^{|{\mathbf {s}}|} f^{{\mathcal {G}}_{i-1}}_{{\mathbf {X}}_i} \right) ^2({\mathbf {x}}_k) +O\left( h_{n}^{{\mathbb {k}}/p}\right) \right] , \end{aligned}$$

where \(C=\Gamma _K\). By choosing

$$\begin{aligned} d_i^2=h_{n}^{1+|{\mathbf {s}}|/p} \left[ \left( D^{|{\mathbf {s}}|} f^{{\mathcal {G}}_{i-1}}_{{\mathbf {X}}_i} \right) ^2({\mathbf {x}}_k) +O\left( h_{n}^{{\mathbb {k}}/p}\right) \right] , \end{aligned}$$

we have

$$\begin{aligned} D_n= & {} \sum ^{n}_{i=1} d_i^2= n h_{n}^{1+|{\mathbf {s}}|/p} \left[ \dfrac{1}{n}\sum ^{n}_{i=1} \left( D^{|{\mathbf {s}}|} f^{{\mathcal {G}}_{i-1}}_{{\mathbf {X}}_i} \right) ^2({\mathbf {x}}_k) +O\left( h_{n}^{{\mathbb {k}}/p}\right) \right] , \end{aligned}$$

where \(\left( \left( D^{|{\mathbf {s}}|} f^{{\mathcal {G}}_{i-1}}_{{\mathbf {X}}_i} \right) ^2({\mathbf {x}}_k) \right) _{1\le i \le n}\) is a stationary and ergodic sequence. Hence we obtain that

$$\begin{aligned} \lim _{n\rightarrow \infty }\dfrac{1}{n}\sum ^{n}_{i=1} \left( D^{|{\mathbf {s}}|} f^{{\mathcal {G}}_{i-1}}_{{\mathbf {X}}_i} \right) ^2({\mathbf {x}})= \left( D^{|{\mathbf {s}}|} f \right) ^2({\mathbf {x}}). \end{aligned}$$

The use of the assumption (C.3) implies that

$$\begin{aligned} D_n= & {} n h_{n}^{1+|{\mathbf {s}}|/p} \left[ \left( D^{|{\mathbf {s}}|} f \right) ^2({\mathbf {x}}_k) +O\left( h_{n}^{{\mathbb {k}}/p}\right) \right] =O\left( n h_{n}^{1+|{\mathbf {s}}|/p} \right) . \end{aligned}$$

Now, taking \(\epsilon _n=\epsilon _0\left( \dfrac{\log n}{nh_{n}^{1+|{\mathbf {s}}|/p}}\right) ^{1/2}\), an application of Lemma 1, combined with a union bound over the grid points \({\mathbf {x}}_k\), gives

$$\begin{aligned} {\mathbb {P}} \left\{ \left| D_{n,1,2} \right|> \epsilon _n \right\}= & {} {\mathbb {P}} \left\{ \underset{1\le k\le l}{\max } \left| \sum _{i=1}^n R_{i}({\mathbf {x}}_k) \right| > \epsilon _n (nh_{n}^{1+|{\mathbf {s}}|/p}) \right\} \\\le & {} \sum _{k=1}^{l} 2\exp \left\{ - \frac{ \epsilon _n^{2} \left( nh_{n}^{1+|{\mathbf {s}}|/p}\right) ^{2} }{2\left( D_n+C \left( nh_{n}^{1+|{\mathbf {s}}|/p}\right) \epsilon _0\left( \dfrac{\log n}{nh_{n}^{1+|{\mathbf {s}}|/p}}\right) ^{1/2}\right) }\right\} \\\le & {} 2 l \exp \left\{ - \frac{\epsilon _0^2 \left( nh_{n}^{1+|{\mathbf {s}}|/p}\right) \log n}{O\left( nh_{n}^{1+|{\mathbf {s}}|/p}\right) \left( 1+ C \epsilon _0\left( \dfrac{\log n}{nh_{n}^{1+|{\mathbf {s}}|/p}}\right) ^{1/2}\right) } \right\} \\\le & {} 2 l \exp \left\{ \log {n^{-\epsilon _0^2 C_1}} \right\} = 2 l n^{-\epsilon _0^2 C_1 }, \end{aligned}$$

where \(C_1\) is a positive constant. Since l grows at most polynomially in n, choosing \(\epsilon _0\) large enough yields

$$\begin{aligned} \sum _{n=1}^\infty {\mathbb {P}} \left\{ \underset{1\le k\le l}{\max } \left| \sum _{i=1}^n R_{i}({\mathbf {x}}_k) \right| >\epsilon _n (nh_{n}^{1+|{\mathbf {s}}|/p}) \right\} < \infty . \end{aligned}$$

A routine application of the Borel–Cantelli lemma then implies that

$$\begin{aligned} \frac{1}{nh_{n}^{1+|{\mathbf {s}}|/p}}\underset{1\le k\le l}{\max } \left| \sum _{i=1}^n R_{i}({\mathbf {x}}_k) \right| =O\left( \left( \dfrac{\log n}{nh_{n}^{1+|{\mathbf {s}}|/p}}\right) ^{1/2}\right) , \quad \text{ a.s }. \end{aligned}$$

Hence, we have

$$\begin{aligned} D_{n,1,2}= O_{a.s}\left( \dfrac{\log n}{nh_{n}^{1+|{\mathbf {s}}|/p}}\right) ^{1/2}. \end{aligned}$$
(16)

By combining the statements (14), (15) and (16), we obtain

$$\begin{aligned} D_{n,1}= O_{a.s}\left( \dfrac{\log n}{nh_{n}^{1+|{\mathbf {s}}|/p}}\right) ^{1/2}. \end{aligned}$$
(17)

Consider now the second term, \(D_{n,2}\), of the decomposition introduced at the beginning of the proof. Under assumption (R.1)(i), we infer that

$$\begin{aligned} D_{n,2}= & {} \sup _{{\mathbf {x}}\in {\mathbf {J}}}\left| {\widetilde{D}}^{|{\mathbf {s}}|}r_{n}(\psi ;{\mathbf {x}},h_n)-{\mathbb {E}}D^{|{\mathbf {s}}|}r_{n}(\psi ;{\mathbf {x}},h_n)\right| \\= & {} \frac{1}{nh_{n}^{1+|{\mathbf {s}}|/p}} \sup _{{\mathbf {x}}\in {\mathbf {J}}}\left| \sum _{i=1}^{n} \left( {\mathbb {E}}\left[ {\mathbb {E}}\left[ \psi ({\mathbf {Y}}_{i})D^{|{\mathbf {s}}|}K\left( \frac{{\mathbf {x}}-{\mathbf {X}}_{i}}{h_{n}^{1/p}}\right) \mid {\mathcal {S}}_{i-1}\right] \mid {\mathcal {G}}_{i-1}\right] \right. \right. \\&\left. \left. -{\mathbb {E}}\left[ {\mathbb {E}}\left[ \psi ({\mathbf {Y}}_{i})D^{|{\mathbf {s}}|}K\left( \frac{{\mathbf {x}}-{\mathbf {X}}_{i}}{h_{n}^{1/p}}\right) \mid {\mathcal {S}}_{i-1}\right] \right] \right) \right| \\= & {} \frac{1}{nh_{n}^{1+|{\mathbf {s}}|/p}} \sup _{{\mathbf {x}}\in {\mathbf {J}}}\left| \sum _{i=1}^{n} \left( {\mathbb {E}}\left[ {\mathbb {E}}\left[ \psi ({\mathbf {Y}}_{i})\mid {\mathcal {S}}_{i-1}\right] D^{|{\mathbf {s}}|}K\left( \frac{{\mathbf {x}}-{\mathbf {X}}_{i}}{h_{n}^{1/p}}\right) \mid {\mathcal {G}}_{i-1}\right] \right. \right. \\&\left. \left. -{\mathbb {E}}\left[ {\mathbb {E}}\left[ \psi ({\mathbf {Y}}_{i})\mid {\mathcal {S}}_{i-1}\right] D^{|{\mathbf {s}}|}K\left( \frac{{\mathbf {x}}-{\mathbf {X}}_{i}}{h_{n}^{1/p}}\right) \right] \right) \right| \\= & {} \frac{1}{nh_{n}^{1+|{\mathbf {s}}|/p}} \sup _{{\mathbf {x}}\in {\mathbf {J}}}\left| \sum _{i=1}^{n} \left( {\mathbb {E}}\left[ m({\mathbf {X}}_i) D^{|{\mathbf {s}}|}K\left( \frac{{\mathbf {x}}-{\mathbf {X}}_{i}}{h_{n}^{1/p}}\right) \mid {\mathcal {G}}_{i-1}\right] \right. \right. \\&-\left. \left. {\mathbb {E}}\left[ m({\mathbf {X}}_i)D^{|{\mathbf {s}}|}K\left( \frac{{\mathbf {x}}-{\mathbf {X}}_{i}}{h_{n}^{1/p}}\right) \right] \right) \right| . \end{aligned}$$

By a simple change of variables, and making use of assumption (R.1)(ii), we obtain

$$\begin{aligned} D_{n,2}= & {} \frac{1}{nh_{n}^{1+|{\mathbf {s}}|/p}} \sup _{{\mathbf {x}}\in {\mathbf {J}}}\left| \sum _{i=1}^{n}\int _{{{\mathbb {R}}}^p}m({\mathbf {u}},\psi ) D^{|{\mathbf {s}}|}K\left( \frac{{\mathbf {x}}-{\mathbf {u}}}{h_{n}^{1/p}}\right) \left( f^{ {\mathcal {G}}_{i-1}}_{{\mathbf {X}}_i}({\mathbf {u}}) -f({\mathbf {u}})\right) \hbox {d}{\mathbf {u}} \right| \\\le & {} \frac{1}{nh_{n}} \sup _{{\mathbf {x}}\in {\mathbf {J}}}\left| \sup _{\Vert {\mathbf {u}}-{\mathbf {x}}\Vert \le \lambda h_n} \left| m({\mathbf {u}},\psi )- m({\mathbf {x}},\psi )\right| +m({\mathbf {x}},\psi )\right| \\&\sup _{{\mathbf {x}}\in {\mathbf {J}}}\left| \sum _{i=1}^{n}\int _{{{\mathbb {R}}}^p} K\left( \frac{{\mathbf {x}}-{\mathbf {u}}}{h_{n}^{1/p}}\right) \left( D^{|{\mathbf {s}}|}f^{ {\mathcal {G}}_{i-1}}_{{\mathbf {X}}_i}({\mathbf {u}}) -D^{|{\mathbf {s}}|}f({\mathbf {u}})\right) \hbox {d}{\mathbf {u}} \right| \\= & {} \frac{1}{nh_{n}} \left( \sup _{{\mathbf {x}}\in {\mathbf {J}}}\left| m({\mathbf {x}},\psi )\right| +O(h_n)\right) \\&\times \sup _{{\mathbf {x}}\in {\mathbf {J}}}\left| \sum _{i=1}^{n}\int _{{{\mathbb {R}}}^p} K\left( \frac{{\mathbf {x}}-{\mathbf {u}}}{h_{n}^{1/p}}\right) \left( D^{|{\mathbf {s}}|}f^{ {\mathcal {G}}_{i-1}}_{{\mathbf {X}}_i}({\mathbf {u}}) -D^{|{\mathbf {s}}|}f({\mathbf {u}})\right) \hbox {d}{\mathbf {u}} \right| \\= & {} \frac{1}{nh_{n}} {\mathbf {C}}_{{\mathbf {m}}} \sup _{{\mathbf {x}}\in {\mathbf {J}}}\left| \int _{{{\mathbb {R}}}^p} K\left( \frac{{\mathbf {x}}-{\mathbf {u}}}{h_{n}^{1/p}}\right) \left( \sum _{i=1}^{n} \left( D^{|{\mathbf {s}}|}f^{ {\mathcal {G}}_{i-1}}_{{\mathbf {X}}_i}({\mathbf {u}}) -D^{|{\mathbf {s}}|}f({\mathbf {u}})\right) \right) \hbox {d}{\mathbf {u}} \right| \\\le & {} \frac{1}{nh_{n}} C_{{\mathbf {m}}} \sup _{{\mathbf {x}}\in {\mathbf {J}}}\left| \int _{{{\mathbb {R}}}^p} K\left( \frac{{\mathbf {x}}-{\mathbf {u}}}{h_{n}^{1/p}}\right) \Vert H_n({\mathbf {x}})\Vert \hbox {d}{\mathbf {u}} \right| \\\le & {} O(n^{-1/2}) C_{{\mathbf {m}}} \sup _{{\mathbf {x}}\in {\mathbf {J}}}\left| \frac{1}{h_{n}} \int _{{{\mathbb {R}}}^p} K\left( \frac{{\mathbf {x}}-{\mathbf {u}}}{h_{n}^{1/p}}\right) \hbox {d}{\mathbf {u}} \right| , \end{aligned}$$

where \(C_{{\mathbf {m}}}=\left( \underset{{\mathbf {x}}\in {\mathbf {J}}}{\sup }\left| m({\mathbf {x}},\psi )\right| +O(h_n)\right) .\) Hence we have

$$\begin{aligned} D_{n,2}=O(n^{-1/2}). \end{aligned}$$
(18)

Combining the statements (17) and (18), we obtain the desired result given in (12). \(\square \)