1 Introduction

Let \((X, Z)\) be an \(E\times F\)-valued random element, where E and F are abstract semi-metric spaces. Denote by \(d_E\) and \(d_F\) the semi-metrics associated with the spaces E and F, respectively. Let \(\mathcal{C}\) be a class of real-valued functions defined on F. Obviously, for any \(\varphi \in {\mathcal {C}}\), \(\varphi (Z)\) is a real random variable. Suppose now that we observe a sequence \((X_i,Z_i)_{i\ge 1}\) of copies of \((X, Z)\) that we assume to be stationary and ergodic. For any \(x\in E\) and any \(\varphi \in { \mathcal {C}}\), let \(g_{\varphi }(.|x)\) be the conditional density of \(\varphi (Z)\) given \(X=x\). We assume that \(g_{\varphi }(.|x)\) is unimodal on some compact set \(S_\varphi \subset \mathbb {R}\). The conditional mode is defined, for any fixed \(x\in E\), by

$$\begin{aligned} \varTheta _{\varphi }(x)=\text{ arg }\sup _{y \in S_\varphi } g_{\varphi }(y|x). \end{aligned}$$

Note that, if there exists \(\xi >0\) such that for any \(\varphi \in \mathcal {C}\)

$$\begin{aligned} g_{\varphi }(.|x)\uparrow \ \text{ on }\ (\varTheta _{\varphi }(x)-\xi ,\ \varTheta _{\varphi }(x))\ \text{ and }\ g_{\varphi }(.|x)\downarrow \ \text{ on }\ (\varTheta _{\varphi }(x),\ \varTheta _{\varphi }(x)+\xi ), \end{aligned}$$
(1)

and if we choose \( S_\varphi = [\varTheta _{\varphi }(x)-\xi ,\ \varTheta _{\varphi }(x)+\xi ]\), then the mode \(\varTheta _{\varphi }(x)\) is uniquely defined for any \(\varphi \). The kernel estimator, say \(\hat{\varTheta }_{\varphi ,n}(x)\), of \(\varTheta _{\varphi }(x)\) may be defined as the value maximizing the kernel estimator \(g_{\varphi ,n}(y|x)\) of \(g_{\varphi }(y|x)\), that is,

$$\begin{aligned} g_{\varphi ,n}\left( \hat{\varTheta }_{\varphi ,n}(x)|x\right) =\sup _{y\in S_\varphi }g_{\varphi ,n}(y|x). \end{aligned}$$
(2)

Here,

$$\begin{aligned} g_{\varphi ,n}(y|x)=\frac{{f}_{\varphi ,n}(x,y)}{l_n(x)}, \end{aligned}$$

where

$$\begin{aligned} f_{\varphi ,n}(x,y)= & {} \frac{1}{nh_H\mathbb {E}[\varDelta _1(x)]}\sum _{i=1}^{n}\left[\varDelta _i(x)H\left( \frac{y-\varphi (Z_i)}{h_H}\right) \right], \\ l_n(x)= & {} \frac{1}{n\mathbb {E}[\varDelta _1(x)]}\sum _{i=1}^{n} \varDelta _i(x) \end{aligned}$$

and \(\displaystyle {\varDelta _i(x)=K\left( \frac{d(x,X_i)}{h_K}\right) }\), where K and H are two real-valued kernels and \((h_K, h_H):=(h_{K,n}, h_{H,n})\) is a sequence of pairs of positive real numbers tending to zero as \(n\rightarrow \infty \).
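To fix ideas, the estimator above can be sketched numerically. The following is a minimal illustration, not the authors' implementation; the kernel choices and all names (`conditional_mode`, `d_x`, `phi_z`) are hypothetical. Since the normalising factor \(nh_H\mathbb {E}[\varDelta _1(x)]\) cancels in the ratio \(f_{\varphi ,n}/l_n\), up to a constant that does not move the argmax over y, the sketch maximises the unnormalised ratio over a grid approximating \(S_\varphi \).

```python
import numpy as np

def conditional_mode(d_x, phi_z, h_K, h_H, y_grid):
    """Sketch of the conditional-mode estimator (2).

    d_x    : distances d(x, X_i) from the target point x, shape (n,)
    phi_z  : transformed responses phi(Z_i), shape (n,)
    y_grid : grid of candidate modes approximating the compact S_phi
    """
    # Quadratic kernels, one possible (hypothetical) choice for K and H
    K = lambda u: 1.5 * (1.0 - u**2) * ((u >= 0) & (u <= 1))
    H = lambda u: 0.75 * (1.0 - u**2) * (np.abs(u) <= 1)

    Delta = K(d_x / h_K)  # Delta_i(x)
    # g_{phi,n}(y|x) = f_{phi,n}(x,y) / l_n(x); the factor n*h_H*E[Delta_1(x)]
    # cancels up to a constant that does not affect the maximiser in y.
    num = (Delta[None, :] * H((y_grid[:, None] - phi_z[None, :]) / h_H)).sum(axis=1)
    g_hat = num / max(Delta.sum(), 1e-12)
    return y_grid[np.argmax(g_hat)]
```

On synthetic data where curves close to x have responses concentrated near a value, the returned mode lands near that value.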

The aim of this paper is to establish the uniform consistency, with respect to the function parameter \(\varphi \in {\mathcal {C}}\), of the conditional mode estimator \(\hat{\varTheta }_{\varphi ,n}(x)\) when the data are assumed to be sampled from a stationary and ergodic process. More precisely, under suitable conditions on the entropy of the class \({\mathcal {C}}\) and the rates of convergence of the smoothing parameters \(h_K\) and \(h_H\), together with some regularity conditions on the distribution of the random element \((X, Z)\), we obtain results of the type

$$\begin{aligned} \sup _{\varphi \in {\mathcal {C}}}|\hat{\varTheta }_{\varphi ,n}(x)-\varTheta _{\varphi }(x)|=O(\alpha _n), \ \text{ a.s. } \end{aligned}$$

where \(\alpha _n\) is a quantity to be specified later on. Notice that, besides the infinite-dimensional character of the data, the ergodic framework avoids the widely used strong mixing condition and its variants, employed to measure dependency, together with the very involved probabilistic calculations they imply [see, for instance, Masry (2005)]. Further motivations to consider ergodic data are discussed in Laïb (2005) and Laïb and Louani (2010), where details defining the ergodic property of processes, together with examples of such processes, are also given.

Indexing by a function \(\varphi \) makes it possible to consider simultaneously various situations related to model fitting and time series forecasting. Whenever \(Z:=\{Z(t) : t\in T\}\) denotes a process defined on some real set T, one may consider the functionals \(\varphi _1(Z)=\sup _{t\in T}Z(t)\) and \(\varphi _2(Z)=\inf _{t\in T}Z(t)\), giving the extremes of the process Z, which are of interest in various domains such as finance, hydraulics and weather forecasting. For some weight function W defined on T and some \(p>0\), one may consider the functional \(\varphi _{p,W}\) defined by \(\varphi _{p,W}(Z)=\int _TW(t)Z^p(t)dt\). A further situation arises when considering, for some subset A of T, the functional \(Z\rightarrow \varphi _\rho (Z) =\inf \{t\in A : Z(t)\ge \rho \}\) for some threshold \(\rho \). Such a case is very useful in threshold and barrier crossing problems encountered in various domains such as finance, physical chemistry and hydraulics. Moreover, indexing by a class of functions \(\mathcal {C}\) is a step towards modelling a functional response random variable. Indeed, the quantity \(Z(\varphi ):=\{\varphi (Z) : \varphi \in \mathcal {C}\}\) may be viewed as a functional random variable offering, in this respect, a device for such investigations.
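As a concrete illustration, the functionals above can be computed on a discretized sample path. This is a toy sketch: the grid, the path, and the choices \(W\equiv 1\), \(p=2\) and \(\rho =0.5\) are all illustrative assumptions.

```python
import numpy as np

# Hypothetical discretization: a sample path Z observed on a grid of [0, 1]
t = np.linspace(0.0, 1.0, 101)
dt = t[1] - t[0]
Z = np.sin(2.0 * np.pi * t)            # toy sample path

phi_1 = Z.max()                        # phi_1(Z) = sup_t Z(t)
phi_2 = Z.min()                        # phi_2(Z) = inf_t Z(t)

# phi_{p,W}(Z) = int_T W(t) Z^p(t) dt, here with W == 1 and p = 2,
# approximated by a Riemann sum
W = np.ones_like(t)
phi_pW = np.sum(W * Z**2) * dt

# phi_rho(Z) = inf{t in A : Z(t) >= rho}, the first threshold crossing
rho = 0.5
crossing = t[np.argmax(Z >= rho)] if np.any(Z >= rho) else np.nan
```

For this path, \(\varphi _1(Z)=1\), \(\varphi _2(Z)=-1\), \(\varphi _{2,1}(Z)=\int _0^1\sin ^2(2\pi t)dt=1/2\), and the first crossing of \(\rho =0.5\) occurs near \(t=1/12\).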

Modelling functional variables has become more and more popular since the publication of the monograph by Ramsay and Silverman (1997) on functional data analysis. Note, however, that the first results dealing with nonparametric models (mainly the regression function) were obtained by Ferraty and Vieu (2000). Since then, an increasing number of papers on this topic has been published. One may refer to the monograph by Ferraty and Vieu (2006), and the references therein, for an overview of the subject. Extensions to other regression issues, such as time series prediction, have been carried out in a number of publications; see, for instance, Delsol (2009). The general framework of ergodic functional data has been considered by Laïb and Louani (2010, 2011), who stated consistencies with rates together with the asymptotic normality of the regression function estimate.

Asymptotic properties of the conditional mode estimator have been investigated in various situations throughout the literature. Ferraty et al. (2006) studied asymptotic properties of kernel-type estimators of some characteristics of the conditional cumulative distribution, with particular applications to the conditional mode and conditional quantiles. Ezzahrioui and Ould-Saïd (2008, 2010) established the asymptotic normality of the kernel conditional mode estimator in both the i.i.d. and strong mixing cases. Dabo-Niang and Laksaci (2007) provided a convergence rate in the \(L^p\)-norm sense for the kernel conditional mode estimator when functional \(\alpha \)-mixing observations are considered. Demongeot et al. (2010) established the pointwise and uniform almost complete convergence, with rates, of the local linear estimator of the conditional density. They used their results to deduce some asymptotic properties of the local linear estimator of the conditional mode. Attaoui et al. (2011) established the pointwise and uniform almost complete convergence, with rates, of the kernel estimate of the conditional density when the observations are linked through a single-index structure. They applied their results to the prediction problem via the conditional mode estimate. Notice also that, considering a scalar response variable Y with a covariate X taking values in a semi-metric space, Ferraty et al. (2010) studied, in the i.i.d. case, the nonparametric estimation of some functionals of the conditional distribution, including the regression function, the conditional cumulative distribution and the conditional density, together with the conditional mode. They established the uniform almost complete convergence, with rates, of kernel estimators of these quantities.

It is well known that the conditional mode provides an alternative prediction method to the classical approach based on the usual regression function. There exist cases where the conditional density is such that the regression function vanishes everywhere, so that it makes no sense to use the regression approach in prediction problems. An example in a finite-dimensional space is given in Ould Saïd (1997) to illustrate this situation. Moreover, a simulation study in infinite-dimensional spaces, carried out by Ferraty et al. (2006), shows that the conditional mode approach gives slightly better results than the usual regression approach.

In this paper, two applications to energy data are provided to illustrate the proposed approach in a time series forecasting framework. The first real case consists in forecasting the daily peak of electricity demand in France (measured in gigawatts). Let us denote by \((Z_i(t))_{t\in [0,T]}\) the curve of electricity demand (also called the load curve) measured over an interval [0, T]. If we have hourly (resp. half-hourly) measurements, then \(T=24\) (resp. \(T=48\)). The peak demand observed on any day i is defined as \(\mathcal {P}_i = \sup _{t\in [0,T]} Z_i(t)\). In this case, \(\varphi (\cdot )\) is fixed to be the supremum, over [0, T], of the function Z(t). Accurate prediction of the daily peak load demand is very important for decision-making in the energy sector. In fact, short-term load forecasts enable effective load shifting between transmission substations, scheduling of startup times of peak stations, load flow analysis and power system security studies. Figure 1 provides a sample of seven daily load curves (from 07/01/2002 to 13/01/2002). Vertical dotted lines separate days and the star points correspond to the peak demand for each day.

It is well known that, in addition to the peak demand, some other characteristics of the load curve may be of interest from an operational point of view. In fact, predicting the electrical energy (measured in gigawatt-hours) consumed over an interval of 3 h around the peak demand may help in the determination of consistent and reliable supply schedules during the peak period. Therefore, the second application in this paper deals with the short-term forecasting of the electrical energy consumed between 6 p.m. and 9 p.m. in winter and between 12 p.m. and 3 p.m. in summer. These time intervals cover the peak demand, which occurs around 7 p.m. in winter and 2 p.m. in summer. Formally, if we denote by \(Z_i(t)\) the load curve of some day i, then the electrical energy consumed between two instants \(t_1\) and \(t_2\) is defined as \(\mathcal {E}_i = \int _{t_1}^{t_2} Z_i(t) dt\). In this case, \(\varphi (\cdot )\) is an integral functional. An example of a half-hourly daily load curve is plotted in Fig. 2: the solid line is the daily load curve and the grey surface corresponds to the electrical energy consumed over an interval of 3 h around the peak.

Fig. 1

Half-hourly electricity consumption in France from 07/01/2002 to 13/01/2002 (7 days). The vertical dotted lines separate days and the star points correspond to the peak demand of each day (in gigawatts)

Fig. 2

Half-hourly daily load curve (solid line); the grey surface corresponds to the electrical energy (in gigawatt-hours) consumed over an interval of 3 h around the peak

2 Results

In order to state our results, we introduce some notation. Let \({\mathcal F}_i\) be the \(\sigma \)-field generated by \(((X_1,Z_1), \ldots , (X_i, Z_i))\) and \({\mathcal G}_i\) the one generated by \(((X_1,Z_1), \ldots , (X_i, Z_i), X_{i+1})\). Let \(B(x, u)\) be the ball centered at \(x\in E\) with radius u. Let \(D_i(x):=d(x, X_i)\), so that \(D_i(x)\) is a nonnegative real-valued random variable. Working on the probability space \((\varOmega , \mathcal{A}, \mathbb {P})\), let \(F_x(u)=\mathbb {P}(D_i(x) \le u) :=\mathbb {P}(X_i\in B(x, u))\) and \(F_x^{\mathcal{F}_{i-1}}(u)=\mathbb {P}(D_i(x)\le u \ | \mathcal{F}_{i-1})=\mathbb {P}(X_i\in B(x, u)\; | \mathcal{F}_{i-1})\) be the distribution function and the conditional distribution function, given the \(\sigma \)-field \(\mathcal{F}_{i-1}\), of \((D_i(x))_{i\ge 1}\), respectively. Denote by \(o_{\text{ a.s. }}(u)\) a real random function l such that \(l(u)/u\) converges to zero almost surely as \(u\rightarrow 0\). Similarly, define \(O_{\text{ a.s. }}(u)\) as a real random function l such that \(l(u)/u\) is almost surely bounded.

Our results are stated under some assumptions we gather hereafter for easy reference.

A1 :

For \(x \in E\), there exist a sequence of nonnegative random functionals \((f_{i,1})_{i\ge 1}\), almost surely bounded by a sequence of deterministic quantities \((b_i(x))_{i\ge 1}\) accordingly, a sequence of random functions \((\psi _{i,x})_{i\ge 1}\), a deterministic nonnegative bounded functional \(f_1\) and a nonnegative nondecreasing real function \(\phi \) tending to zero as its argument goes to zero, such that

(i):

\(\displaystyle {F_x(u)=\phi (u) f_1(x)+o(\phi (u))}\), as \(u\rightarrow 0\),

(ii):

For any \(i\in \mathbb {N}\), \(\displaystyle {F_x^{\mathcal {F}_{i-1}}(u)=\phi (u) f_{i,1}(x)+\psi _{i,x}}(u)\) with \(\psi _{i,x}(u)=o_{a.s.}(\phi (u))\) as \(u\rightarrow 0\), \(\displaystyle {\frac{\psi _{i,x}(u)}{\phi (u)}}\) is almost surely bounded and \(\displaystyle {\frac{1}{n}\sum _{i=1}^{n}\psi _{i,x}(u)=o_{a.s.}(\phi (u))}\) as \(n\rightarrow \infty \), \(u\rightarrow 0\).

(iii):

\(\displaystyle {n^{-1}\sum _{i=1}^{n}f_{i,1}(x)}\rightarrow f_1(x)\) almost surely as \(n\rightarrow \infty \).

(iv):

There exists a nondecreasing bounded function \(\tau _0\) such that, uniformly in \(u\in [0,\ 1]\), \(\displaystyle {\frac{\phi (h_Ku)}{\phi (h_K)}=\tau _0(u)+o(1)}\), as \(h_K\downarrow 0\) and, for \(1\le j\le 2\), \(\displaystyle {\int _{0}^{1}(K^j(u))^{\prime }\tau _0(u)du<\infty }\).

(v):

\(n^{-1}\sum ^n_{i=1}b_i(x)\rightarrow D(x) < \infty \) as \(n\rightarrow \infty \).

A2 :

K is a nonnegative bounded kernel of class \(\mathcal {C}^1\) over its support \([0,1]\), with \(K(1)>0\) and the derivative \(K^{\prime }\) is such that \(K^{\prime }(t)<0\), for any \(t\in [0,1]\).

A3 :
(i):

For any \(\epsilon > 0\), there exists \(\eta > 0\) such that for any \((\varphi _1,\ \varphi _2)\in \mathcal {C}\times \mathcal {C}\), \(d_{\mathcal {C}}(\varphi _1,\ \varphi _2)<\eta \) implies that \(|\varTheta _{\varphi _1}(x)- \varTheta _{\varphi _2}(x)|< \epsilon \).

(ii):

Uniformly in \(\varphi \in \mathcal {C}\), \(g_\varphi (.|x)\) is uniformly continuous on \(S_\varphi \).

(iii):

\(g_{\varphi }(.|x)\) is differentiable up to order 2 and \(\lim _{n\rightarrow \infty }\displaystyle \sup _{\varphi \in \mathcal {C}}|g_\varphi ^{(2)}(\hat{\varTheta }_{\varphi ,n}(x)|x)|:=\varPhi (x)\ne 0\).

(iv):

For any \(x\in E\), there exist V(x) a neighborhood of x, some constants \(C_x>0\), \(\beta >0\) and \(\nu \in (0,\ 1]\), independent of \(\varphi \), such that for any \(\varphi \in \mathcal {C}\), we have \(\forall \ (y_1,\ y_2) \in S_\varphi \times S_\varphi \), \(\forall (x_1,\ x_2) \in V(x)\times V(x)\),

\(|g^{(j)}_{\varphi }(y_1|x_1)-g^{(j)}_{\varphi }(y_2|x_2)|\le C_x (|y_1- y_2|^{\nu }+d(x_1, \ x_2)^{\beta }),\ j=0,2. \)

A4 :

Consider the space of functions \(\mathcal{D}=\{\psi =t-\varphi : t\in \mathbb {R},\ \varphi \in \mathcal {C}\}\) on which we define the distance \(\rho \) given, for any \((t_1-\varphi _1,t_2-\varphi _2)\in \mathcal{D}^2\), by \(\rho (t_1-\varphi _1,t_2-\varphi _2)=|t_2-t_1|+d_\mathcal{{C}}(\varphi _1,\varphi _2)\). The kernel H is such that

(i):

\(\displaystyle {\int _{\mathbb {R}}|t|^{\nu }H(t)dt<\infty }\) and \(\displaystyle {\int _{\mathbb {R}}tH(t)dt=0}\),

(ii):

For all \((t_1-\varphi _1,t_2-\varphi _2)\in \mathcal{D}^2 \;\text{ and }\; \forall z\in F,\)

$$\begin{aligned} |H(t_1-\varphi _1(z))-H(t_2-\varphi _2(z))|\le C_H \rho (t_1-\varphi _1,t_2-\varphi _2), \end{aligned}$$

where \(C_H\) is a positive constant.

A5 :

For \(j=0,1,2\) and any \(\varphi \in \mathcal{C}\),

$$\begin{aligned} \mathbb {E}\left[H^{(j)}\left( \frac{y-\varphi (Z_i)}{h_H}\right) \mid \mathcal {G}_{i-1}\right]=\mathbb {E}\left[H^{(j)}\left( \frac{y-\varphi (Z_i)}{h_H}\right) \mid \ X_i\right]. \end{aligned}$$

Comments on the hypotheses Concerning conditions A1, it is worth noticing that the fundamental hypothesis A1(ii) involves the functional nature of the data together with their dependency. As usual in such a framework, small-ball techniques are used to handle probabilities on infinite-dimensional spaces, where the Lebesgue measure does not exist. Several examples of processes fulfilling this condition are given in Laïb and Louani (2011). Note that the hypothesis A1(i) stands as a particular case of A1(ii) obtained by conditioning on the trivial \(\sigma \)-field. A number of processes satisfying this condition are given throughout the literature; see, for instance, Ferraty and Vieu (2006). Conditions A1(iii) and A1(v) are set basically to invoke the ergodic theorem, which may be expressed as the classical law of large numbers. Conditions A2 and A4 impose some regularity upon the kernels used in our estimates. When indexing by a class of functions \(\mathcal {C}\), it is natural to consider regularity conditions such as the continuity of the mode with respect to the index function \(\varphi \), assumed in A3(i). Defined as an argmax and, furthermore, indexed by the class \(\mathcal {C}\), the conditional mode is sensitive to fluctuations. The differentiability of the conditional density \(g_\varphi \), with some smoothness of its derivatives, is needed to reach the rates of convergence obtained in our results. All these conditions are summarised in assumption A3. Hypothesis A5 is of a Markovian nature.

Remark 1

In order to check the condition A4(ii), let T be an index set and d a distance on T. Suppose that \(\mathcal{C}=\{ \varphi _u: u\in T\}\) is a class of functions defined on F that are Lipschitz with respect to the index parameter u, in the sense that, for any \(z\in F\),

$$\begin{aligned} |\varphi _u(z)-\varphi _v(z)|\le d(u,v) \kappa (z), \end{aligned}$$

where \(\kappa \) is a function defined on F such that \(\int _F \kappa ^2(z) d\mathbb {P}(z)<\infty \). Let \(d_\mathcal{C}\) be the \(L_2\)-distance defined on \(\mathcal{C}\) by

$$\begin{aligned} d_\mathcal{C}(\varphi _u, \varphi _v) :=\left[ \int _F (\varphi _u(z)-\varphi _v(z))^2d\mathbb {P}(z) \right] ^{1/2}. \end{aligned}$$

Then, taking \(d(\cdot , \cdot )\) as the absolute distance, we have

$$\begin{aligned} d_\mathcal{C}(\varphi _u, \varphi _v) \le d(u,v) \left[ \int _F \kappa ^2 (z) d\mathbb {P}(z)\right] ^{1/2}:= c_0 |u-v|. \end{aligned}$$

Therefore, taking H to be the Epanechnikov kernel given by \(H(u)=\frac{3}{4}(1-u^2)\mathbbm {1}_{[-1,1]}(u)\), the condition A4(ii) holds since

$$\begin{aligned} |H(t_1-\varphi _u(z))-H(t_2-\varphi _v(z))|\le & {} \frac{3}{2}(|t_1-t_2|+ c_0|u-v|). \end{aligned}$$
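The constant \(\frac{3}{2}\) appearing in this bound is \(\sup _u|H^{\prime }(u)|\) for the Epanechnikov kernel. As a quick numerical sanity check (illustrative only), the chord slopes of H on a fine grid never exceed it:

```python
import numpy as np

# Epanechnikov kernel as in Remark 1
H = lambda u: 0.75 * (1.0 - u**2) * (np.abs(u) <= 1)

# Slopes of all chords on a fine grid; the Lipschitz property says
# they should never exceed sup|H'| = 3/2
u = np.linspace(-2.0, 2.0, 4001)
slopes = np.abs(np.diff(H(u))) / np.diff(u)
max_slope = slopes.max()
```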

Before establishing the uniform convergence with rate, with respect to the class of functions \(\mathcal{C}\), of the conditional mode estimator, we introduce the following notation. For any \(\epsilon >0\), set

$$\begin{aligned} \mathcal{N}(\epsilon ,\mathcal{C},d_{\mathcal {C}})= & {} \min \{n:\ \text{ there } \text{ exist }\ c_1, \ldots ,c_n \ \text{ in }\ \mathcal{C}\ \text{ such } \text{ that }\ \forall \ \varphi \in \mathcal{C} \\&\quad \quad \text{ there } \text{ exists }\ 1\le k\le n \ \text{ such } \text{ that }\ d_{\mathcal {C}}(\varphi ,c_k)<\epsilon \}. \end{aligned}$$

This number measures the richness of the class \(\mathcal{C}\). Obviously, conditions upon the number \(\mathcal{N}(\epsilon ,\mathcal{C},d_{\mathcal {C}})\) have to be set in order to state results that are uniform over the class \(\mathcal{C}\).
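For a finite collection of discretized functions, an upper bound on \(\mathcal{N}(\epsilon ,\mathcal{C},d_{\mathcal {C}})\) can be computed by a greedy covering. The following sketch, with the empirical \(L_2\) distance standing in for \(d_{\mathcal {C}}\), is purely illustrative.

```python
import numpy as np

def covering_number(funcs, eps):
    """Greedy upper bound on N(eps, C, d_C) for a finite collection
    of curves discretized on a common grid (one curve per row of
    `funcs`), with d_C taken as the empirical L2 (RMS) distance."""
    remaining = list(range(len(funcs)))
    centers = []
    while remaining:
        c = remaining[0]               # pick an uncovered function as a center
        centers.append(c)
        # discard every function within eps of the new center (including c)
        remaining = [j for j in remaining
                     if np.sqrt(np.mean((funcs[j] - funcs[c])**2)) >= eps]
    return len(centers)
```

As expected, the bound decreases as \(\epsilon \) grows.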

The following proposition establishes the uniform asymptotic behavior (with rate) of the conditional density estimator \(g_{\varphi ,n}(y|x)\) with respect to y and the function \(\varphi \in \mathcal{C}\). This proposition, which is of interest in itself, may be used as an intermediate result to state uniform results over the class \(\mathcal{C}\).

Proposition 1

Assume that the conditions A1–A5 hold true and that

$$\begin{aligned} \lim _{n\rightarrow \infty } \frac{\log (n)}{n\phi (h_K)}=0. \end{aligned}$$
(3)

Furthermore, for a sequence of positive real numbers \(\lambda _n\) tending to zero as \(n\rightarrow \infty \), and for \(\eta :=\eta _n\), suppose that

$$\begin{aligned} (i) \ \ \lim _{n\rightarrow \infty }\frac{\log \mathcal{N}(\eta ,\mathcal{C},d_{\mathcal {C}})}{\eta \lambda _n^2n h_H\phi (h_K)}=0 \ \ \text{ and }\ \ \ (ii) \ \ \sum _{n\ge 1}\exp \{-\lambda _n^2O(n h_H\phi (h_K))\}<\infty , \end{aligned}$$
(4)

Then we have

$$\begin{aligned} \sup _{\varphi \in \mathcal {C}}\sup _{y\in S_\varphi }|g_{\varphi ,n}(y|x)-g_{\varphi }(y|x)|= & {} O_{a.s}(h^\beta _K+{h_H}^\nu )+O_{a.s.}\left( \eta {h_H}^{-2}\right) +O_{a.s.}\left( \lambda _n\right) \\&\ \ +\,O_{a.s}\left( \left( \frac{\log n}{n\phi (h_K)}\right) ^{1/2}\right) . \end{aligned}$$

Comment In the statement (7), the deviation between \(g_{\varphi ,n}(\cdot |\cdot )\) and \(g_{\varphi }(\cdot |\cdot )\) is decomposed so as to introduce the conditional bias \(B_{\varphi , n}(\cdot , \cdot )\) and the pseudo-variances \(R_{\varphi , n}(\cdot , \cdot )\) and \(Q_{\varphi , n}(\cdot , \cdot )\). The first term on the right-hand side of the result of Proposition 1 corresponds to the rate of convergence of \(B_{\varphi , n}(\cdot , \cdot )\), while the three following ones give the convergence rate of \(Q_{\varphi , n}(\cdot , \cdot )\). Notice that the term \(R_{\varphi , n}(\cdot , \cdot )\) is negligible.

Our principal result concerns the convergence, pointwise in \(x\in E\) and uniform over the class \(\mathcal {C}\), of the kernel estimate \(\hat{\varTheta }_{\varphi ,n}(x)\) of the conditional mode \(\varTheta _\varphi (x)\).

Theorem 1

Under the conditions of Proposition 1, we have

$$\begin{aligned} \sup _{\varphi \in \mathcal {C}}|\hat{\varTheta }_{\varphi ,n}(x)-\varTheta _\varphi (x)|= & {} O_{a.s}(h^{\beta /2}_K+{h_H}^{\nu /2})+O_{a.s}\left( \lambda _n^\frac{1}{2}\right) + O_{a.s}\left( \eta ^{1/2} h_H^{-1}\right) \\&\quad +\,O_{a.s}\left( \left( \frac{\log n}{n\phi (h_K)}\right) ^{1/4}\right) . \end{aligned}$$

Remark 2

Replacing the condition (4)(i) by \(\displaystyle {\lim _{n\rightarrow \infty }\frac{\log \mathcal{N}(\eta ,\mathcal{C},d_{\mathcal {C}})}{\eta \lambda _n^2n h_H\phi (h_K)}=\delta }\), for some sufficiently small \(\delta >0\), with \(\displaystyle {\lambda _n=O\left( \left( \frac{\log n}{n h_H\phi (h_K)}\right) ^{\frac{1}{2}}\right) }\), the condition (4)(ii) is clearly satisfied, yielding the uniform consistency, with respect to \(\varphi \in \mathcal {C}\), of \(\hat{\varTheta }_{\varphi ,n}(x)\) at the rate

$$\begin{aligned} \displaystyle {O_{\text{ a.s. }} \left( h^{\beta /2}_K+{h_H}^{\nu /2}\right) +O_{a.s}\left( \eta ^{1/2} h_H^{-1}\right) +O_{\text{ a.s. }}\left( \left( \frac{\log n}{n h_H \phi (h_K)}\right) ^{\frac{1}{4}}\right) }. \end{aligned}$$

Remark 3

Note that, whenever \(\eta =O({h_H}^{2+\nu })\) and \(\lambda _n^{-1}=O((n{h_H}^{5+2\nu }\phi (h_K))^{\frac{1}{2}})\), the condition (4)(i) takes the form

$$\begin{aligned} \lim _{\eta \rightarrow 0}\eta \log \mathcal{N}(\eta ,\mathcal{C},d_{\mathcal {C}})=0. \end{aligned}$$
(5)

The condition (5) is very usual in defining Vapnik–Chervonenkis classes. Examples of classes fulfilling the condition (5) are given throughout the literature; see, for instance, Laïb and Louani (2011) and van der Vaart and Wellner (1996).

The main application of Theorem 1 is the prediction of time series by means of conditional mode estimates.

For \(n\in \mathbb {N}^\star \), let \(Z_i(t)\) and \(X_i(t)\), \(i=1, \dots , n\), be two functional random variables with \(t\in [0,T)\). For each curve \(X_i(t)\) (the covariate), we have a real response \(Y_{\varphi ,i} = \varphi (Z_i(t))\), a transformation of some functional variable \(Z_i(t)\). Given a new curve \(X_{n+1}=x_{\text{ new }}\), our purpose is to predict the corresponding response \(y_{\varphi ,\text{ new }} := \varTheta _\varphi (x_{\text{ new }})\) using the conditional mode estimator \(\widehat{y}_{\varphi ,\text{ new }} := \widehat{\varTheta }_{\varphi , n}(x_{\text{ new }})\) as a predictor. The following corollary, based on Theorem 1, gives the asymptotic behavior, with rate, of the prediction error.

Corollary 1

Assume that conditions of Theorem 1 hold. Then we have

$$\begin{aligned} \Big | \widehat{y}_{\varphi ,\text{ new }} - y_{\varphi ,\text{ new }}\Big | = O_{a.s}\left( h_K^{\beta /2}+h_H^{\nu /2}\right) +O_{a.s}\left( \lambda _n^\frac{1}{2}\right) +O_{a.s}\left( \left( \frac{\log n}{n\phi (h_K)}\right) ^{1/4}\right) . \end{aligned}$$

Proof

The proof of Corollary 1 is a direct consequence of Theorem 1.

Fig. 3

Half-hourly electricity consumption in France from 07/01/2002 to 31/12/2005, corresponding to the observed path of the process \(\xi (t)\)

3 Application to real data

The data set analyzed in this paper contains half-hourly observations of a stochastic process \(\xi (t)\), \(t\in \mathbb {R}^+\), where \(\xi (t)\) represents the electricity demand in France at time t. This process has been observed every half hour from 1 January 2002 to 31 December 2005 (which corresponds to a total of 1461 days). Figure 3 shows the evolution of the process \(\xi (t)\) over time. One can easily see a strong seasonality, since the variation of the electricity consumption is driven by the climatic conditions in France. In fact, winter and autumn are rather cold, whereas the climate in summer and spring is relatively warm. This remark is confirmed by Fig. 4, which displays the half-hourly electricity consumption in France for four selected weeks. We can clearly mark out the intra-daily periodical pattern and also note the difference in the level of consumption from one season to another. The repetitiveness of the daily shape is due to some inertia in the demand that reflects the aggregated behavior of consumers. In order now to construct our functional data Z(t) and to obtain its transformation \(\varphi (Z(t))\), we slice the original process \(\xi (t)\) into segments of equal length. Since our target is day-ahead short-term forecasting, we divide the observed time series \(\xi (t)\) of half-hourly electricity consumption into \(n=1461\) segments Z(t) of length 48, which correspond to the functional observations. Each segment coincides with a specific daily load curve. Formally, let [0, T] be the time interval on which the process \(\xi (t)\) is observed. We divide this interval into subintervals of length 48, say \([\ell \times 48, (\ell +1)\times 48]\), \(\ell = 0,1, \dots , n-1\), with \(n=T/48 = 1461\), and denote by \(Z_i(t)\) the functional-valued discrete-time stochastic process defined by

$$\begin{aligned} Z_i(t) = \xi (t+(i-1)\times 48); \quad \quad i=1,\dots , n, \quad \forall t\in [0,48). \end{aligned}$$
(6)
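The slicing in (6) amounts to reshaping the raw half-hourly series into an \(n\times 48\) array. The sketch below, with a simulated series standing in for the real demand data, also computes the daily peaks \(\mathcal {P}_i\) and a 3-hour energy \(\mathcal {E}_i\) around each peak via a Riemann sum; all names and the random data are illustrative.

```python
import numpy as np

# Hypothetical raw series xi of half-hourly demand over n_days days
n_days, per_day = 1461, 48
rng = np.random.default_rng(0)
xi = rng.random(n_days * per_day)

# Slice xi into the daily functional observations Z_i(t), as in (6):
# row i-1 holds Z_i(t) for t = 0, ..., 47
Z = xi.reshape(n_days, per_day)

# Daily peak P_i = sup_t Z_i(t)
peaks = Z.max(axis=1)
peak_idx = Z.argmax(axis=1)

# Energy over a 3-hour window centered on the peak: 3 half-hours on each
# side, each sample weighted by the 0.5 h step (Riemann sum of E_i)
lo = np.clip(peak_idx - 3, 0, per_day)
hi = np.clip(peak_idx + 3, 0, per_day)
energy = np.array([Z[i, lo[i]:hi[i]].sum() * 0.5 for i in range(n_days)])
```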

Figure 5 shows a randomly chosen sample of 20 realizations of the functional random variable \((Z(t))_{t\in [0,48)}\), which correspond to daily load curves.

Fig. 4

Half-hourly electricity demand in four selected weeks (the panel contains one week of data from each of January, April, August and October 2004)

Fig. 5

A sample of 20 randomly chosen daily load curves drawn from the process Z(t)

Once the original time series has been transformed into functional data, as explained in the introduction, we can deal with the short-term forecasting of the daily peak demand and of the electrical energy consumed over an interval, using the conditional mode as a predictor.

3.1 Short-term daily peak load forecasting

Let us now consider the observed daily peak of the electricity consumption defined, for any day \(i=1, \dots , n\), as

$$\begin{aligned} \mathcal {P}_i = \sup _{t\in [0,48)} Z_i(t). \end{aligned}$$

The goal of this subsection is to forecast the peak \(\mathcal {P}_i\) on the basis of the load curve of the previous day, \(Z_{i-1}(t)\). Forecasting peak load demand is one of the most relevant issues for electricity companies. In fact, the electricity market is more and more open to competition and companies take care of the quality of their services in order to increase the number of their customers. On the other hand, because of the electrification of appliances (e.g. electric heating, air conditioning, ...) and mobility applications (e.g. electric vehicles, ...), the peak demand is increasing, which can lead to serious issues in the electric network. It is important, therefore, to produce very accurate short-term peak demand forecasts for the day-to-day operation, scheduling and load-shedding plans of power utilities. Forecasting the peak load has received a lot of interest in the statistical literature. For instance, Goia et al. (2010) used a functional linear regression model and Sigauke and Chikobvu (2010) a multivariate adaptive regression splines model. These methods are based on the regression function as a predictor. We propose here the mode regression as an alternative.

In this paper, we compare two predictors based on different choices of the covariate X(t). Since the peak electricity demand is highly correlated with the electricity consumption of the previous day and also with temperature measurements, we have two possibilities to choose the covariate X(t):

  1. (a)

    \(X_i(t) = Z_{i-1}(t)\), the curve of electricity consumption of the previous day (Prev.Day) and

  2. (b)

    \(X_i(t) = \widehat{T}_i(t)\), the predicted half hourly temperature curve of the target day (Pred.Temp.).

To evaluate the proposed approach, we split the sample of \(n=1461\) days into:

  • learning sample, say \(\mathcal {L}=\{(Z_{i-1}(t), \mathcal {P}_{i}) \}_{i=2, \ldots , 1096}\), containing the first 1096 days (corresponding to the period from 01/01/2002 to 31/12/2004) and,

  • test sample, say \(\mathcal {T}= \{(Z_{i-1}(t), \mathcal {P}_{i}) \}_{i=1097, \ldots , 1461}\), containing the last 365 days (corresponding to the period between 01/01/2005 and 31/12/2005).

Remark 4

When the functional covariate is chosen to be the predicted temperature curve, say \(\widehat{T}(t)\), the notations for the learning and test samples change as follows: \(\mathcal {L}=\{(\widehat{T}_{i}(t), \mathcal {P}_{i}) \}_{i=1, \dots , 1096}\) and \(\mathcal {T}= \{(\widehat{T}_{i}(t), \mathcal {P}_{i}) \}_{i=1097, \dots , 1461}\).

The learning sample is used to build the proposed estimator given by (2) and to select the “optimal” smoothing parameters. To estimate the conditional mode, some tuning parameters have to be fixed. For both the covariate and the response variable, the quadratic kernel defined by \(K(u) = H(u):= 1.5(1-u^2)\mathbbm {1}_{[0,1]}(u)\) is considered. It is well documented in the nonparametric estimation literature that the choice of the kernel does not significantly affect the accuracy of the estimation.

Another important tuning parameter, which ensures a good behavior of the functional nonparametric estimator, is the semi-metric \(d(\cdot , \cdot )\). Several possible choices of semi-metrics are discussed in Ferraty and Vieu (2006), p. 28. Usually, the choice of the semi-metric is motivated by the shape of the curves. Here, the load curves of the previous day as well as the temperature curves are smooth. Consequently, the \(L_2\)-distance between the second derivatives of the curves seems to be an appropriate choice for the semi-metric \(d(\cdot , \cdot )\).
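For discretized curves, this semi-metric can be sketched by replacing second derivatives with second-order finite differences. The implementation below is an illustrative approximation, not the routine used in the paper.

```python
import numpy as np

def semi_metric_deriv2(x1, x2, dt=1.0):
    """Sketch of the semi-metric based on the L2 distance between
    second derivatives, approximated by second-order finite
    differences of the discretized curves x1, x2 (step dt)."""
    d2_1 = np.diff(x1, n=2) / dt**2
    d2_2 = np.diff(x2, n=2) / dt**2
    # Riemann approximation of ( int (x1'' - x2'')^2 dt )^{1/2}
    return np.sqrt(np.sum((d2_1 - d2_2)**2) * dt)
```

Note that, as with any semi-metric of this type, two curves differing by an affine function are at distance zero.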

In contrast to the kernel, an optimal choice of the smoothing parameters \(h_K\) and \(h_H\) is crucial. Here, we adopt the local cross-validation method based on the \(\kappa \)-nearest neighbors introduced in Ferraty and Vieu (2006), p. 116. Explicitly, for each curve \(X_i\) in the test sample,

$$\begin{aligned} \widehat{\varTheta }_{n}(h_K, h_H, X_i) = \arg \max _{y} g_{n,h_K^{i,\star }, h_H^{i,\star }}(y|X_i^\star ), \end{aligned}$$

where \(X_i^\star = \arg \min _{X_j\in \hbox {learning sample}} d(X_i, X_j)\) and

$$\begin{aligned} \left( h_K^{i,\star }, h_H^{i,\star }\right) = \arg \min _{(h_K, h_H)\in H_n(X_i^\star ,y)}|Y_i^\star - \widehat{\varTheta }_{n}(h_K, h_H, X_i^\star )|, \end{aligned}$$

where \(H_n(x,y)\) is the set of pairs \((h_K(x), h_H(y))\) such that the ball centred at x with radius \(h_K(x)\) (respectively, the interval centred at y with radius \(h_H(y)\)) contains exactly \(\kappa \) neighbors of x (respectively, of y). Here, Ferraty and Vieu’s R routine funopare.mode.lcv is used to compute the conditional mode.
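The \(\kappa \)-nearest-neighbor bandwidth underlying this selection can be sketched as follows (an illustrative function with hypothetical names, assuming the distances to the learning sample are precomputed; the actual selection is performed inside funopare.mode.lcv):

```python
import numpy as np

def knn_bandwidth(dists, kappa):
    """Radius h of the ball centred at the target point that contains exactly
    kappa of the given distances (assuming distinct distances)."""
    d = np.sort(np.asarray(dists, dtype=float))
    if kappa < d.size:
        # midpoint between the kappa-th and (kappa+1)-th neighbour distances
        return 0.5 * (d[kappa - 1] + d[kappa])
    return d[-1]  # kappa equals the sample size: take the largest distance

dists = [0.3, 1.2, 0.7, 2.5, 0.9]
print(knn_bandwidth(dists, 3))  # 1.05, between the 3rd (0.9) and 4th (1.2) distances
```

The pair \((h_K^{i,\star }, h_H^{i,\star })\) is then chosen by scanning \(\kappa \) over a grid and keeping the value minimizing the local cross-validation criterion above.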

Fig. 6
Results for one year of day-ahead forecasting of the daily peak electricity demand using the mode regression as predictor and as functional covariate: (a) the electricity consumption curve of the previous day (Prev.Day); (b) the predicted half-hourly temperature curve of the target day (Pred.Temp.)

Table 1 Distribution (by month) of the RAE of the peak load obtained by using as predictor the conditional mode and as covariate the previous day Prev.Day and predicted temperature Pred.Temp
Fig. 7
Examples of daily peak load forecasts using the conditional mode, the conditional median and the regression function for eight consecutive days

The test sample is used to compare our forecasts to the observed daily peak electricity demand for the year 2005. Figure 6a (resp. b) displays the observed and the predicted values of the daily peak electricity demand using as covariate the load curve of the previous day (resp. the predicted temperature curve). Since the cross-points \((\widehat{\mathcal {P}}_i,\mathcal {P}_i)_{i=1, \dots , 365}\) in Fig. 6a are more concentrated around the diagonal line than those in Fig. 6b, one can deduce that the first approach provides better results than the second. Moreover, Table 1 gives a numerical summary of the RAE obtained by using as covariate the predicted temperature curve or the last observed daily load curve. The monthly errors obtained with the last observed daily load curve are usually smaller than those obtained with the predicted temperature curve. Therefore, one can conclude that the peak electricity demand is better modeled by the previous daily load curve than by the predicted temperature curve.

Following the above analysis, the last observed daily load curve is taken as the covariate for forecasting the peak demand in the remainder of this section. Our goal now is to compare the conditional mode predictor with the conditional median and the regression function (conditional mean) [see Ferraty and Vieu (2006) for more details about the properties of these two predictors]. For the nonparametric estimation of the regression operator (respectively, the conditional median), the same semi-metric and the optimal bandwidth \(h_K\) (respectively, bandwidths \(h_K\) and \(h_H\)) are chosen using the same arguments as for the conditional mode estimation (see details above). Moreover, to estimate the regression operator, only the quadratic kernel function \(K(\cdot )\) is used. For the conditional median model, the quadratic kernel smooths the covariate X, while the distribution function \(\int _{-\infty }^x \frac{3}{4}(1-t^2) \mathbbm {1}_{[-1,1]}(t) dt\) smooths the response Y. The computations for the regression operator (respectively, the conditional median) are performed with the R routine funopare.knn.lcv (respectively, funopare.quantile.lcv).

For a deeper evaluation of the accuracy of the proposed approach, we use the Relative Absolute Error (RAE) and the monthly Mean Absolute Prediction Error (\(\texttt {MAPE}_m\)) as validation criteria. They are defined on the test sample, for any day \(i = 1, \dots , 365\), respectively by

$$\begin{aligned} \texttt {RAE}_i = \frac{|\mathcal {P}_i - \widehat{\mathcal {P}}_i|}{\mathcal {P}_i}\quad \text{ and } \quad \texttt {MAPE}_m = \frac{1}{N_m}\sum _{i=1}^{N_m} \texttt {RAE}_i, \end{aligned}$$

where \(N_m\) is the number of days in month \(m\in \{1, \dots , 12\}\) and \(\widehat{\mathcal {P}}_i\) is the predicted value of the daily peak obtained by the conditional mode, the conditional median or the regression function.
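Both criteria are straightforward to compute; the sketch below (hypothetical helper names, toy data) evaluates the daily RAE and the monthly \(\texttt {MAPE}_m\):

```python
import numpy as np

def rae(observed, predicted):
    """Relative absolute error RAE_i = |P_i - P_hat_i| / P_i for each day."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.abs(observed - predicted) / observed

def monthly_mape(observed, predicted, months):
    """MAPE_m: average of the daily RAE over the N_m days of each month m."""
    errors = rae(observed, predicted)
    months = np.asarray(months)
    return {int(m): float(errors[months == m].mean()) for m in np.unique(months)}

obs  = [64.0, 32.0, 128.0, 16.0]   # observed daily peaks (toy values)
pred = [60.0, 30.0, 120.0, 15.0]   # predicted daily peaks
print(monthly_mape(obs, pred, [1, 1, 2, 2]))  # {1: 0.0625, 2: 0.0625}
```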

Figure 8 shows examples of peak load forecasts for eight consecutive days. One can see that the conditional mode provides more accurate predictions than the other two methods. In Fig. 7, the 365 forecasted daily peak loads are plotted against the observed ones. Clearly, the conditional mode forecasts well, while the conditional median and the regression function under-predict the peaks in the cold season and over-predict them in the hot season.

Figure 9 shows the distribution of the daily RAE for each month of 2005 for the three prediction methods. One can observe that the conditional mode-based approach is much more efficient, in winter as well as in summer, than the other methods. Accurate forecasts in winter are of particular interest since the electricity demand might exceed the supply capacity in this period; efficient energy management of the electrical grid is therefore essential.

A numerical summary of Fig. 9 is given in Table 2, where the monthly \(\texttt {MAPE}_m\) and the first quartile \(Q_{0.25}\), the median \(Q_{0.5}\) and the third quartile \(Q_{0.75}\) of the RAE are provided for the three methods. One can see that the conditional mode approach yields the best forecasts over almost the whole year.

Fig. 8
Observed daily peak load versus the predicted one obtained by the three forecasting methods

Fig. 9
Distribution (by month) of the daily RAE of the peak load

Table 2 Distribution (by month) of the RAE of the peak load obtained by using as covariate the previous day Prev.Day and as prediction methods the conditional mode, conditional median and regression function respectively

3.2 Electrical energy consumption forecasting for battery storage management

The electrical grid in most developed countries is expected to come under a large amount of strain in the future due to changes in demand behavior, the electrification of transport and heating, and an increased penetration of distributed generation. The current grid infrastructure may not be able to cope with these changes, nor with the storage of the energy produced by solar and wind power generation. One of the most widely used approaches to this technical issue consists in storing, in batteries, the energy coming from traditional power plants (e.g. nuclear, hydraulic, ...) and from renewable resources (e.g. solar and wind) during the day, and then using it in the evening, especially over the 3 h around the peak (around 7 p.m. in winter and 2 p.m. in summer). Therefore, an accurate forecast of the energy that will be consumed in the evening makes it possible to optimize the storage capacity and consequently to extend battery life.

In this subsection, we propose to solve this forecasting issue by using the mode regression (Fig. 10). In light of the discussion in the previous subsection, we use as covariate the load curve of the previous day. Formally, if \(Z_i(t)\) denotes the load curve of day i, then the electrical energy consumed between \(t_1=17{:}30\) and \(t_2=20{:}30\) (respectively, \(t_1=12{:}30\) and \(t_2=15{:}30\)) in winter (respectively, in summer) is defined as \(\mathcal {E}_i = \int _{t_1}^{t_2} Z_i(t) dt\) and measured in gigawatt hours (GWh). Therefore, here \(\varphi (\cdot )\) is the integration functional. The same data sets and the same evaluation procedures as in the previous subsection are used here, together with the same choices of the tuning parameters (K, H, \(d(\cdot , \cdot )\), \(h_K\) and \(h_H\)) as for the peak load forecasting (see details in Sect. 3.1). As mentioned before, the functional covariate is taken to be the last observed daily load curve. Figures 11 and 12 show that the conditional mode approach yields accurate energy forecasts, while the conditional median and the regression function under-predict the energy on cold days and over-predict it on hot ones. Figure 13 shows the distribution by month of the daily RAE; accurate results are obtained with the conditional mode predictor. Numerical details, namely the monthly MAPE and the first quartile \(Q_{0.25}\), the median \(Q_{0.5}\) and the third quartile \(Q_{0.75}\) of the obtained errors, are given in Table 3. We can see again that the conditional mode provides the best forecasts of the consumed energy.
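Since the load curve is recorded half-hourly, \(\mathcal {E}_i\) can be approximated by the trapezoid rule over the three-hour window; a minimal sketch with illustrative values (not the actual data):

```python
import numpy as np

def consumed_energy(load_gw, step_hours=0.5):
    """Energy E = integral of the load curve (in GW) over the window, in GWh,
    computed by the trapezoid rule on half-hourly samples."""
    load_gw = np.asarray(load_gw, dtype=float)
    return step_hours * (load_gw[0] / 2 + load_gw[1:-1].sum() + load_gw[-1] / 2)

# a constant 2 GW load over the 3-hour window 17:30-20:30
# (7 half-hourly samples) yields 2 GW x 3 h = 6 GWh
print(consumed_energy([2.0] * 7))  # 6.0
```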

Fig. 10
Observed daily consumed energy in 2005 versus its predicted values

Fig. 11
Examples of energy demand forecasts using the conditional mode, the conditional median and the regression function for eight consecutive days

Fig. 12
Observed daily consumed energy in 2005 versus its predicted values

Fig. 13
Distribution (by month) of the daily RAE of the consumed energy

Table 3 Distribution (by month) of the RAE of the consumed energy obtained by using as covariate the previous day Prev.Day and as prediction methods the conditional mode, the conditional median and the regression function, respectively

4 Proofs

In order to prove our results, we introduce some further notation. Let

$$\begin{aligned} \bar{f}_{\varphi ,n}(x,\ y)= & {} \frac{1}{nh_H\mathbb {E}[\varDelta _1(x)]}\sum _{i=1}^{n} \mathbb {E}\left[\varDelta _i(x)H\left( \frac{y-\varphi (Z_i)}{h_H}\right) \mid \mathcal {F}_{i-1}\right]. \end{aligned}$$

and

$$\begin{aligned} \bar{l}_{n}(x)=\frac{1}{n\mathbb {E}(\varDelta _1(x))}\sum _{i=1}^n\mathbb {E}\left( \varDelta _i(x) | \mathcal{F}_{i-1}\right) . \end{aligned}$$

Define the conditional bias of the conditional density estimate of \(\varphi (Z_i)\) given \(X=x\) as

$$\begin{aligned} B_{\varphi ,n}(x,y)=\frac{\bar{f}_{\varphi ,n}(x,y)}{\bar{l}_{n}(x)}-g_{\varphi }(y|x). \end{aligned}$$

Consider now the following quantities

$$\begin{aligned} R_{\varphi ,n}(x,y)=-B_{\varphi ,n}(x,y)({l}_{n}(x)-\bar{l}_{n}(x)), \end{aligned}$$

and

$$\begin{aligned} Q_{\varphi ,n}(x,y)=({f}_{\varphi ,n}(x,y)-\bar{f}_{\varphi ,n}(x,y))-g_{\varphi }(y|x)({l}_{n}(x)-\bar{l}_{n}(x)). \end{aligned}$$

It is then clear that the following decomposition holds

$$\begin{aligned} g_{\varphi ,n}(y|x)-g_{\varphi }(y|x)=B_{\varphi ,n}(x,y)+\frac{R_{\varphi ,n}(x,y)+Q_{\varphi ,n}(x,y)}{{l}_{n}(x)}. \end{aligned}$$
(7)
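Indeed, substituting the definitions of \(B_{\varphi ,n}\), \(R_{\varphi ,n}\) and \(Q_{\varphi ,n}\) (arguments omitted) and using \(B_{\varphi ,n}+g_{\varphi }=\bar{f}_{\varphi ,n}/\bar{l}_{n}\), the right-hand side telescopes:

$$\begin{aligned} B_{\varphi ,n}+\frac{R_{\varphi ,n}+Q_{\varphi ,n}}{l_{n}}&=\frac{\bar{f}_{\varphi ,n}}{\bar{l}_{n}}-g_{\varphi }+\frac{1}{l_{n}}\left[ f_{\varphi ,n}-\bar{f}_{\varphi ,n}-\frac{\bar{f}_{\varphi ,n}}{\bar{l}_{n}}(l_{n}-\bar{l}_{n})\right] \\&=\frac{\bar{f}_{\varphi ,n}}{\bar{l}_{n}}-g_{\varphi }+\frac{f_{\varphi ,n}}{l_{n}}-\frac{\bar{f}_{\varphi ,n}}{\bar{l}_{n}}=g_{\varphi ,n}-g_{\varphi }. \end{aligned}$$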

The proofs of our results rely on the following lemmas, whose proofs may be found in Laïb and Louani (2011).

Lemma 1

Let \((X_n)_{n\ge 1}\) be a sequence of martingale differences with respect to the sequence of \(\sigma \)-fields \((\mathcal{F}_n=\sigma (X_1,\cdots ,X_n))_{n\ge 1}\), where \(\sigma (X_1,\cdots ,X_n)\) is the \(\sigma \)-field generated by the random variables \(X_1,\cdots ,X_n\). Set \(S_n=\sum _{i=1}^nX_i\). Suppose that the random variables \((X_i)_{i\ge 1}\) are bounded by a constant \(M>0\), i.e., for any \(i\ge 1,\ \ |X_i|\le M\) almost surely, and \(\mathbb {E}(X_i^2|\mathcal{F}_{i-1})\le d_i^2\) almost surely. Then we have, for any \(\lambda >0\), that

$$\begin{aligned} \mathbb {P}(|S_n|>\lambda )\le 2\exp \left\{ -\frac{\lambda ^2}{4D_n+2M\lambda }\right\} , \end{aligned}$$

where \(D_n=\sum _{i=1}^nd_i^2.\)
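A quick Monte Carlo sanity check of this inequality (a Python sketch, not part of the proof): Rademacher increments form a bounded martingale difference sequence with \(M=1\) and \(d_i^2=1\), hence \(D_n=n\).

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 200, 20000

# iid Rademacher variables are martingale differences w.r.t. their
# natural filtration: bounded by M = 1 with E[X_i^2 | F_{i-1}] = 1
S = rng.choice([-1.0, 1.0], size=(trials, n)).sum(axis=1)

M, D_n = 1.0, float(n)
for lam in (30.0, 45.0, 60.0):
    empirical = float(np.mean(np.abs(S) > lam))
    bound = 2.0 * np.exp(-lam**2 / (4.0 * D_n + 2.0 * M * lam))
    assert empirical <= bound  # the exponential bound dominates the tail
    print(f"lambda={lam:.0f}: empirical {empirical:.5f} <= bound {bound:.3f}")
```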

Lemma 2

Assume that conditions A1 ((i), (ii), (iv)) and A2 hold true. For \(1\le j\le 2+\delta \) for some \(\delta >0\), we have

$$\begin{aligned}&(i)\,\,\,\, \frac{1}{\phi (h_K)}\mathbb {E}[\varDelta ^{j}_{i}(x)|\mathcal {F}_{i-1}]=M_{j}f_{i,1}(x)+O_{a.s}\left( \frac{\psi _{i,x}(h_K)}{\phi (h_K)}\right) ,\\&(ii)\,\,\,\, \frac{1}{\phi (h_K)}\mathbb {E}[\varDelta ^{j}_{1}(x)]=M_{j}f_{1}(x)+o(1),\\ \end{aligned}$$

where \(\displaystyle {M_j=K^j(1)-\int _{0}^{1}(K^j)^{\prime }(u)\tau _0(u)du}\).

Proof of Proposition 1

Considering the decomposition (7), the proof follows from Lemmas 3, 4, 5 and 6 below, which establish, respectively, the convergence of \(l_{n}(x)\) to 1 together with the rate of convergence of \(l_n(x)-\bar{l}_n(x)\) to zero, and the orders of the terms \(B_{\varphi ,n}(x,y)\), \(R_{\varphi ,n}(x,y)\) and \(Q_{\varphi ,n}(x,y)\). Note that, due to condition (3), the term \(R_{\varphi ,n}(x,y)\) is negligible compared with the term \(B_{\varphi ,n}(x,y)\). \(\square \)

Lemma 3

Under assumptions A1 and A2, we have

(i) \(l_n(x)-\bar{l}_n(x)=O_{a.s}\left( \sqrt{\frac{\log (n)}{n\phi (h_K)}}\right) ,\)

(ii) \(\lim _{n\rightarrow \infty }l_n(x)=\lim _{n\rightarrow \infty }\bar{l}_n(x)=1, \quad a.s.\)

Proof of Lemma 3

The results follow by making use of Lemma 1 and Lemma 2 in Laïb and Louani (2011). Details of the proof may be found in Laïb and Louani (2010). \(\square \)

Lemma 4

Under assumptions A1, A2, A3(iv), A4(i) and A5, we have, as \( n\rightarrow \infty \),

$$\begin{aligned} \sup _{\varphi \in \mathcal {C}}\sup _{y\in S_{\varphi }}(\bar{f}_{\varphi ,n}(x,\ y)-\bar{l}_n(x)g_{\varphi }(y|x))=O_{a.s.}(h_K^{\beta }+{h_H}^{\nu }). \end{aligned}$$

Proof of Lemma 4

By condition A5 with \(j=0\), we have

$$\begin{aligned} \bar{f}_{\varphi ,n}(x,\ y)= & {} \frac{1}{nh_H\mathbb {E}[\varDelta _1(x)]}\sum _{i=1}^{n} \mathbb {E}\left[\varDelta _i(x)\mathbb {E}\left( H\left( \frac{y-\varphi (Z_i)}{h_H}\right) \mid X_i\right) |\mathcal {F}_{i-1}\right]. \end{aligned}$$

A change of variables and the fact that \(\displaystyle {\int _{\mathbb {R}}H(t)dt}=1\) allow us to write

$$\begin{aligned} \mathbb {E}\left[H\left( \frac{y-\varphi (Z_i)}{h_H}\right) \mid \ X_i\right]= & {} h_H\int _{\mathbb {R}}H(t)[g_{\varphi }(y-t h_H|X_i)-g_{\varphi }(y|x)]dt+h_H g_{\varphi }(y|x) \\=: & {} J_{i,\varphi }+h_H g_{\varphi }(y | x). \end{aligned}$$

Thus,

$$\begin{aligned} \bar{f}_{\varphi ,n}(x,y)= & {} \frac{g_{\varphi }(y|x)}{n\mathbb {E}[\varDelta _1(x)]}\sum _{i=1}^{n} \mathbb {E}\left[\varDelta _i(x)|\mathcal {F}_{i-1}\right]\\&+\frac{1}{nh_H \mathbb {E}[\varDelta _1(x)]}\sum _{i=1}^{n} \mathbb {E}\left[\varDelta _i(x)J_{i,\varphi }|\mathcal {F}_{i-1}\right]\\=: & {} \bar{l}_{n}(x)g_{\varphi }(y|x)+S_{2}. \end{aligned}$$

Using condition A3(iv), one may write

$$\begin{aligned} |S_{2}|\le & {} \frac{1}{n h_H\mathbb {E}[\varDelta _1(x)]}\sum _{i=1}^{n} \mathbb {E}\left[\varDelta _i(x)|J_{i,\varphi }||\mathcal {F}_{i-1}\right]\\\le & {} \frac{1}{n\mathbb {E}[\varDelta _1(x)]}\sum _{i=1}^{n} \mathbb {E}\left[\varDelta _i(x)\left\{ \int _{\mathbb {R}} H(t) \left( |t|^{\nu }{h_H}^{\nu }+(d(x,\ X_i))^{\beta }\right) dt\right\} |\mathcal {F}_{i-1}\right]\\\le & {} C_x\left\{ {h_H}^{\nu } \int _{\mathbb {R}}|t|^{\nu } H(t)dt+h_K^{\beta }\right\} \frac{1}{n\mathbb {E}[\varDelta _1(x)]}\sum _{i=1}^{n} \mathbb {E}\left[\varDelta _i(x)|\mathcal {F}_{i-1}\right]. \end{aligned}$$

Moreover, Lemma 2 in Laïb and Louani (2011) combined with condition A4(i) implies that

$$\begin{aligned} \bar{f}_{\varphi ,n}(x,\ y)-\bar{l}_n(x)g_{\varphi }(y|x)=O_{a.s.}\left( h_K^{\beta }+{h_H}^{\nu }\right) , \end{aligned}$$

where \(O_{a.s.}\) does not depend on \(\varphi \in \mathcal {C}\). \(\square \)

The following lemma describes the asymptotic behavior of the conditional bias term \(B_{\varphi ,n}(x,y)\) as well as that of \(R_{\varphi ,n}(x,y)\) and \(Q_{\varphi ,n}(x,y)\).

Lemma 5

Under conditions A1, A2, A3(ii), A4(i) and A5, we have

$$\begin{aligned}&\sup _{\varphi \in \mathcal {C}}\sup _{y\in S_\varphi }|B_{\varphi ,n}(x,y)|=O_{\text{ a.s. }}(h_K^\beta + {h_H}^\nu ), \end{aligned}$$
(8)
$$\begin{aligned}&\sup _{\varphi \in \mathcal {C}}\sup _{y\in S_\varphi }|R_{\varphi ,n}(x,y)|=O_{\text{ a.s. }}\left( (h_K^\beta + {h_H}^\nu )\left( \sqrt{\frac{\log (n)}{n\phi (h_K)}}\right) \right) . \end{aligned}$$
(9)

Moreover, when hypotheses (3)–(4) are satisfied, we have

$$\begin{aligned} \sup _{\varphi \in \mathcal {C}}\sup _{y\in S_\varphi }|Q_{\varphi ,n}(x,y)|=O_{a.s.}\left( \eta {h_H}^{-2}\right) +O_{a.s.}\left( \lambda _n\right) +O_{a.s}\left( \left( \frac{\log n}{n\phi (h_K)}\right) ^{1/2}\right) . \end{aligned}$$
(10)

Proof of Lemma 5

Observe that

$$\begin{aligned} B_{\varphi ,n}(x,y)=\frac{\overline{f}_{\varphi ,n}(x,y)-g_{\varphi }(y|x)\overline{l}_{n}(x)}{\overline{l}_{n}(x)}:=\frac{\tilde{B}_{\varphi ,n}(x,y)}{\overline{l}_{n}(x)}. \end{aligned}$$

Making use of Lemma 4, we obtain \(\sup _{\varphi \in \mathcal {C}}\sup _{y\in S_\varphi }|\tilde{B}_{\varphi ,n}(x,y)|=O_{\text{ a.s. }}(h_K^{\beta }+{h_H}^\nu ).\) The statement (8) follows then from the second part of Lemma 3.

To deal now with the quantity \(R_{\varphi ,n}(x,y)\), write it as

$$\begin{aligned} R_{\varphi ,n}(x,y)=-\frac{\tilde{B}_{\varphi ,n}(x,y)}{\overline{l}_{n}(x)}(l_n(x)-\bar{l}_n(x)). \end{aligned}$$

Therefore, the statement (9) follows from the statement (8) combined with Lemma 3 (i).

In order to check the result (10), recall that

$$\begin{aligned} Q_{\varphi ,n}(x,y)=({f}_{\varphi ,n}(x,y)-\bar{f}_{\varphi ,n}(x,y))-g_{\varphi }(y|x)({l}_{n}(x)-\bar{l}_{n}(x)). \end{aligned}$$

Therefore the statement (10) results from Lemma 3 and the use of Lemma 6 established hereafter. This completes the proof of Lemma 5. \(\square \)

The following lemma is needed as a step in proving Theorem 1.

Lemma 6

Under assumptions A1, A2, A3, A4(ii), A5 together with hypotheses (3)–(4), for n large enough, we have

$$\begin{aligned} \sup _{\varphi \in \mathcal {C}}\sup _{y\in S_\varphi }|f_{\varphi ,n}(x,\ y)-\bar{f}_{\varphi ,n}(x,\ y)|=O_{a.s.}\left( \eta {h_H}^{-2}\right) +O_{a.s.}\left( \lambda _n\right) . \end{aligned}$$

Proof of Lemma 6

Recall that, for any \(\varphi \in \mathcal{C}\),

$$\begin{aligned} S_{\varphi }=[ \varTheta _{\varphi }(x)-\xi , \ \ \varTheta _{\varphi }(x)+\xi ]. \end{aligned}$$

Let \(\varphi _1, \varphi _2\in \mathcal {C}\) and, for any \(\epsilon >0\), define the set

$$\begin{aligned} S_{\varphi _2}^\epsilon =\{ y\in \mathbb {R}: |y-\varTheta _{\varphi _2}(x)|\le \xi +\epsilon \}. \end{aligned}$$

It is easily seen, by condition A3(i), that, for any \(\epsilon >0\), there exists \(\eta >0\) such that \(\varphi _1\in B(\varphi _2, \eta )\) implies \(S_{\varphi _1}\subset S_{\varphi _2}^\epsilon \). Consider a grid \((\varphi _j)_{1\le j\le \mathcal{N}(\eta ,\mathcal{C},d_\mathcal{C})}\) in the space \(\mathcal {C}\) such that \(\mathcal {C}\subset \cup _{j=1}^{\mathcal {N}(\eta ,\mathcal {C},d_{\mathcal {C}})}B(\varphi _j,\eta )\). Therefore, we have

$$\begin{aligned}&\sup _{\varphi \in \mathcal {C}}\sup _{y\in S_\varphi }|f_{\varphi ,n}(x,\ y)-\bar{f}_{\varphi ,n}(x,\ y)|\nonumber \\&\quad \le \max _{1\le j\le \mathcal {N}(\eta ,\mathcal {C},d_{\mathcal {C}})}\sup _{\varphi \in B(\varphi _j,\eta )}\sup _{y\in S_{\varphi }}|f_{\varphi ,n}(x,\ y)-\bar{f}_{\varphi ,n}(x,\ y)| \nonumber \\&\quad \le \max _{1\le j\le \mathcal {N}(\eta ,\mathcal {C},d_{\mathcal {C}})}\sup _{\varphi \in B(\varphi _j,\eta )}\sup _{y\in S^\epsilon _{\varphi _j}}|f_{\varphi ,n}(x,\ y)-\bar{f}_{\varphi ,n}(x,\ y)|. \end{aligned}$$
(11)

Using now the compactness of \(S^\epsilon _{\varphi _j}\) and the fact that its length is \(2(\xi +\epsilon )\) for any \(\varphi _j\), we can write \(S^\epsilon _{\varphi _j}\subset \cup _{k=1}^{d_{\epsilon ,n}}S^\epsilon _{\varphi _j,k}\) where \(S^\epsilon _{\varphi _j,k}=(t^\epsilon _{\varphi _j,k}-m_n;\ t^\epsilon _{\varphi _j,k}+m_n)\) and \(m_n\) and \(d_{\epsilon , n}\) are such that \( d_{\epsilon ,n}=C_{\epsilon }m_n^{-1}\) for some positive constant \(C_{\epsilon }\). Moreover, we have

$$\begin{aligned} \sup _{y\in S^\epsilon _{\varphi _j}}|f_{\varphi ,n}(x,\ y)-\bar{f}_{\varphi ,n}(x,\ y)|\le & {} \max _{1\le k\le d_{\epsilon , n}}\sup _{y\in S^\epsilon _{\varphi _j,k}}| f_{\varphi ,n}(x,y)-f_{\varphi ,n}(x,t^\epsilon _{\varphi _j,k} )|\nonumber \\&+\max _{1\le k\le d_{\epsilon , n}}| f_{\varphi ,n}(x,t^\epsilon _{\varphi _j,k})-\bar{f}_{\varphi ,n}(x,t^\epsilon _{\varphi _j,k})| \nonumber \\&+ \max _{1\le k\le d_{\epsilon , n}}\sup _{y\in S^\epsilon _{\varphi _j,k}}| \bar{f}_{\varphi ,n}(x,t^\epsilon _{\varphi _j,k})-\bar{f}_{\varphi ,n}(x,y)|\nonumber \\=: & {} J_{\varphi ,n,1}+J_{\varphi ,n,2}+J_{\varphi ,n,3}. \end{aligned}$$
(12)

Making use of A4(ii), we obtain

$$\begin{aligned} J_{\varphi ,n,1}\le & {} \frac{1}{n h_H\mathbb {E}[\varDelta _1(x)]}\max _{1\le k\le d_{\epsilon , n}}\nonumber \\&\sup _{y\in S^\epsilon _{\varphi _j,k}}\sum _{i=1}^{n} \varDelta _i(x)\left| H\left( \frac{y-\varphi (Z_i)}{h_H}\right) -H\left( \frac{t^\epsilon _{\varphi _j,k}- \varphi (Z_i)}{h_H}\right) \right| \nonumber \\\le & {} C_Hm_n{h_H}^{-2}l_n(x). \end{aligned}$$
(13)

Similarly, we have also

$$\begin{aligned} J_{\varphi ,n,3}\le & {} C_Hm_n {h_H}^{-2}\bar{l}_n(x). \end{aligned}$$
(14)

Therefore,

$$\begin{aligned} \max _{1\le j\le \mathcal {N}(\eta ,\mathcal {C},d_{\mathcal {C}})}\sup _{\varphi \in B_{(\varphi _j,\eta )}}\left( J_{\varphi ,n,1}+J_{\varphi ,n,3}\right) \le C_Hm_n{h_H}^{-2}\left( l_n(x)+\bar{l}_n(x)\right) . \end{aligned}$$
(15)

Using Lemma 3(ii), it follows that

$$\begin{aligned} \max _{1\le j\le \mathcal {N}(\eta ,\mathcal {C},d_{\mathcal {C}})}\sup _{\varphi \in B(\varphi _j,\eta )}\left( J_{\varphi ,n,1}+J_{\varphi ,n,3}\right)= & {} O_{a.s.}\left( m_n{h_H}^{-2}\right) . \end{aligned}$$
(16)

To identify the convergence rate to zero of the term \(\displaystyle {\max _{1\le j\le \mathcal {N}(\eta ,\mathcal {C},d_{\mathcal {C}})}\sup _{\varphi \in B_{(\varphi _j,\eta )}}J_{\varphi ,n,2}}\), observe that

$$\begin{aligned}&\max _{1\le j\le \mathcal {N}(\eta ,\mathcal {C},d_{\mathcal {C}})}\sup _{\varphi \in B(\varphi _j,\eta )}J_{\varphi ,n,2}\nonumber \\&\quad \le \max _{1\le j\le \mathcal {N}(\eta ,\mathcal {C},d_{\mathcal {C}})}\sup _{\varphi \in B(\varphi _j,\eta )} \max _{1\le k\le d_{\epsilon , n}}| f_{\varphi ,n}(x,t^\epsilon _{\varphi _j,k})-f_{\varphi _j,n}(x,t^\epsilon _{\varphi _j,k})|\nonumber \\&\qquad +\,\max _{1\le j\le \mathcal {N}(\eta ,\mathcal {C},d_{\mathcal {C}})} \max _{1\le k\le d_{\epsilon , n}}| f_{\varphi _j,n}(x,t^\epsilon _{\varphi _j,k})-\bar{f}_{\varphi _j,n}(x,t^\epsilon _{\varphi _j,k})| \nonumber \\&\qquad + \,\max _{1\le j\le \mathcal {N}(\eta ,\mathcal {C},d_{\mathcal {C}})}\sup _{\varphi \in B(\varphi _j,\eta )} \max _{1\le k\le d_{\epsilon , n}}| \bar{f}_{\varphi _j,n}(x,t^\epsilon _{\varphi _j,k})-\bar{f}_{\varphi ,n}(x,t^\epsilon _{\varphi _j,k})|\nonumber \\&\quad =: J_{n,1}+J_{n,2}+J_{n,3}. \end{aligned}$$
(17)

By the same arguments as in the statement (15), we can show, under Condition A4(ii), that

$$\begin{aligned} \left( J_{n,1}+J_{n,3}\right) \le C_H\eta {h_H}^{-2}\left( l_n(x)+\bar{l}_n(x)\right) =O_{a.s.}\left( \eta {h_H}^{-2}\right) . \end{aligned}$$
(18)

We have to deal now with the middle term \(J_{n,2}\). Observe that

$$\begin{aligned} \mathbb {P}\left( J_{n,2}>\lambda \right)= & {} \mathbb {P}\left( \max _{1\le j\le \mathcal {N}(\eta ,\mathcal {C},d_{\mathcal {C}})} \max _{1\le k\le d_{\epsilon , n}}| f_{\varphi _j,n}(x,t^\epsilon _{\varphi _j,k})-\bar{f}_{\varphi _j,n}(x,t^\epsilon _{\varphi _j,k})|>\lambda \right) \nonumber \\\le & {} \sum _{j=1}^{\mathcal {N}(\eta ,\mathcal {C},d_{\mathcal {C}})} \sum _{k=1}^{d_{\epsilon , n}}\mathbb {P}\left( \frac{1}{nh_H}\left| \sum _{i=1}^nL_{i,\varphi _j}(x, t^\epsilon _{\varphi _j,k})\right| \ge \lambda \right) , \end{aligned}$$

where \(L_{i,\varphi _j}(x,y)=\frac{1}{\mathbb {E}[\varDelta _1(x)]}\left[\varDelta _i(x)H(\frac{y-\varphi _j(Z_i)}{h_H})-\mathbb {E}\left[ \varDelta _i(x)H(\frac{y-\varphi _j(Z_i)}{h_H}) |\ \mathcal{F}_{i-1}\right] \right]\). Notice that \(L_{i,\varphi _j}(x,y)\) is a martingale difference bounded by the quantity \(\displaystyle M:=\frac{2\bar{K}\overline{H}}{\phi (h_K)[M_1f_1(x)+o(1)]}.\) In fact, since the kernel K and the function H are bounded, it follows easily in view of Lemma 2 (ii) in Laïb and Louani (2011) that

$$\begin{aligned} |L_{i,\varphi _j}(x, y)|\le \frac{2\bar{K}\overline{H}}{\mathbb {E}[\varDelta _1(x)]}=\frac{2\bar{K}\overline{H}}{\phi (h_K)[M_1f_1(x)+o(1)]}, \end{aligned}$$

where \({\overline{H}}:=\sup _{y\in \mathbb {R}}H(y)\) and \({\overline{K}}:=\sup _{y\in \mathbb {R}}K(y)\). Observe now that

$$\begin{aligned}&\mathbb {E}[(L_{i,\varphi _j}(x,t^\epsilon _{\varphi _j,k}))^{2}|\mathcal {F}_{i-1}]\\&\quad \le \frac{1}{(\mathbb {E}[\varDelta _1(x)])^2}\mathbb {E}\left[\left( \varDelta _i(x)H\left( \frac{t^\epsilon _{\varphi _j,k}-\varphi _j(Z_i)}{h_H}\right) \right) ^{2} \mid \mathcal {F}_{i-1}\right]. \end{aligned}$$

Therefore, by condition A5, we have

$$\begin{aligned}&\mathbb {E}\left[\left( \varDelta _i(x)H\left( \frac{t^\epsilon _{\varphi _j,k}-\varphi _j(Z_i)}{h_H}\right) \right) ^{2} \mid \mathcal {F}_{i-1}\right]\\&\quad =\mathbb {E}\left[\left( \varDelta _i(x)\right) ^2 \mathbb {E}\left[\left( H\left( \frac{t^\epsilon _{\varphi _j,k}-\varphi _j(Z_i)}{h_H}\right) \right) ^{2}|\mathcal {G}_{i-1}\right]\mid \mathcal {F}_{i-1}\right]\\&\quad =\mathbb {E}\left[\left( \varDelta _i(x)\right) ^2 \mathbb {E}\left[\left( H\left( \frac{t^\epsilon _{\varphi _j,k}-\varphi _j(Z_i)}{h_H}\right) \right) ^{2}|X_i\right]\mid \mathcal {F}_{i-1}\right]\\&\quad =\mathbb {E}\left[\left( \varDelta _i(x)\right) ^2 \int _{\mathbb {R}}\left( H\left( \frac{u}{h_H}\right) \right) ^2 g_{\varphi _j}(t^\epsilon _{\varphi _j,k}-u|X_i)du\mid \mathcal {F}_{i-1}\right]\\&\quad :=\mathbb {E}\left[\left( \varDelta _i(x)\right) ^2 \mathcal {T}_{i,1}\mid \mathcal {F}_{i-1}\right]+\mathcal {T}_2\mathbb {E}\left[\left( \varDelta _i(x)\right) ^2 \mid \mathcal {F}_{i-1}\right], \end{aligned}$$

where we have set \(\mathcal{T}_{i,1}=\int _{\mathbb {R}}\left( H\left( \frac{u}{h_H}\right) \right) ^2 \left( g_{\varphi _j}(t^\epsilon _{\varphi _j,k}-u|X_i)-g_{\varphi _j}(t^\epsilon _{\varphi _j,k}|x)\right) du\) and \(\mathcal{T}_{2}=\int _{\mathbb {R}}\left( H\left( \frac{u}{h_H}\right) \right) ^2 g_{\varphi _j}(t^\epsilon _{\varphi _j,k}|x)du. \) Subsequently, for \(\eta >0\), we have

$$\begin{aligned} \mathcal {T}_{i,1}\le & {} \int _{|u|\le \eta }\left( H\left( \frac{u}{h_H}\right) \right) ^2 \left| g_{\varphi _j}(t^\epsilon _{\varphi _j,k}-u|X_i)-g_{\varphi _j}(t^\epsilon _{\varphi _j,k}|x)\right| du\\&+\,\int _{|u|>\eta }\left( H\left( \frac{u}{h_H}\right) \right) ^2 \left| g_{\varphi _j}(t^\epsilon _{\varphi _j,k}-u|X_i)-g_{\varphi _j} (t^\epsilon _{\varphi _j,k}|x)\right| du\\\le & {} h_H\sup _{|u|\le \eta }|g_{\varphi _j}(t^\epsilon _{\varphi _j,k}-u|X_i)-g_{\varphi _j}( t^\epsilon _{\varphi _j,k}|x)|\int _{|u|\le \eta / h_H}\left( H(u)\right) ^2 du\\&\quad +\,h_H \sup _{|u|>\eta /h_H}(H(u))^2+ h_H g_{\varphi _j}(t^\epsilon _{\varphi _j,k}|x)\int _{|u|>\eta /h_H}\left( H(u)\right) ^2du. \end{aligned}$$

Condition A3(iv) allows us, for any \(\eta >0\), to write

$$\begin{aligned} \mathcal {T}_{i,1}\le & {} h_H C_x(|\eta |^{\nu }+ d(x, \ X_i)^{\beta })\int _{|u|\le \eta /h_H}\left( H(u)\right) ^2 du\\&+\,h_H \sup _{|u|>\eta /h_H}(H(u))^2+ h_H g_{\varphi _j}(t^\epsilon _{\varphi _j,k}|x)\int _{|u|>\eta / h_H}\left( H(u)\right) ^2du. \end{aligned}$$

Thus,

$$\begin{aligned}&\mathbb {E}\left[\left( \varDelta _i(x)\right) ^2 \mathcal {T}_{i,1}\mid \mathcal {F}_{i-1}\right]\\&\quad \le h_H\mathbb {E}\bigg [\left( \varDelta _i(x)\right) ^2\bigg (C_x(|\eta |^{\nu }+ d(x, \ X_i)^{\beta })\int _{|u|\le \frac{\eta }{h_H}}H^2(u)du\\&\quad + \sup _{|u|>\frac{\eta }{h_H}}H^2(u)+g_{\varphi _j}(t^\epsilon _{\varphi _j,k}|x)\int _{|u|>\frac{\eta }{h_H}} H^2(u)du\bigg )\mid \mathcal {F}_{i-1}\bigg ]\\&\quad \le h_H\bigg (C_x(|\eta |^{\nu }+ h_K^{\beta })\int _{|u|\le \eta /h_H}\left( H(u)\right) ^2 du+ \sup _{|u|>\frac{\eta }{ h_H}}H^2(u)\\&\quad +\,g_{\varphi _j}(t^\epsilon _{\varphi _j,k}|x)\int _{|u|>\frac{\eta }{ h_H}}H^2(u)du \bigg )\mathbb {E}\bigg [\left( \varDelta _i(x)\right) ^2\mid \mathcal {F}_{i-1}\bigg ]. \end{aligned}$$

On another hand, we can see easily, for some positive constant \(C_0\), that

$$\begin{aligned} \mathcal {T}_2\mathbb {E}\left[\left( \varDelta _i(x)\right) ^2 \mid \mathcal {F}_{i-1}\right]\le C_0 h_H\mathbb {E}\left[\left( \varDelta _i(x)\right) ^2 \mid \mathcal {F}_{i-1}\right]. \end{aligned}$$

Therefore, since H is bounded and \(h_H\rightarrow 0\), it follows then that there exists a constant \(C_1>0\) such that

$$\begin{aligned} \mathbb {E}\left[\left( \varDelta _i(x)H\left( \frac{t^\epsilon _{\varphi _j,k}-\varphi (Z_i)}{h_H}\right) \right) ^{2} \mid \mathcal {F}_{i-1}\right]\le C_1 h_H \mathbb {E}\bigg [\left( \varDelta _i(x)\right) ^2\mid \mathcal {F}_{i-1}\bigg ]\ \text{ as }\ n\rightarrow \infty . \end{aligned}$$

Furthermore, using Condition (A1), which supposes almost surely that \(f_{i,1}\) is bounded by a deterministic function b(x) and that \(\psi _{i,x}(h_K)\le \phi (h_K)\) as \(h_K\rightarrow 0\), together with Lemma 2 in Laïb and Louani (2011), we have for n large enough

$$\begin{aligned} \mathbb {E}[(L_{i,\varphi _j}(x,t^\epsilon _{\varphi _j,k}))^{2}|\mathcal {F}_{i-1}]\le & {} \frac{C_1 h_H}{\left( \mathbb {E}[\varDelta _1(x)]\right) ^{2}}\mathbb {E}\left[\left( \varDelta _i(x)\right) ^{2} \mid \mathcal {F}_{i-1}\right]\nonumber \\\le & {} \frac{C_1 h_H}{\phi (h_K)[M_1^2f_1^2(x)+o(1)]}[M_2b_i(x)+1]=:d_i^2. \end{aligned}$$

Moreover, using Conditions A1(iii), (v), one may write

$$\begin{aligned} \frac{4D_n}{n}+2M h_H\lambda= & {} \frac{h_H}{\phi (h_K)}\left[ \frac{4C_1[M_2D(x)+1]}{M_1^2f_1^2(x)+o(1)}+ \frac{4\lambda \bar{K}\bar{H}}{M_1f_1(x)+o(1)} \right] \end{aligned}$$

and \(\displaystyle {\frac{n{h_H}^2\lambda ^2}{4D_n/n+2M h_H\lambda } = n h_H\phi (h_K)\lambda ^2C_{\epsilon }(x)}\), where

$$\begin{aligned} C_{\epsilon }(x)=\frac{M_1f_1(x)}{4}\cdot \frac{1}{ \frac{C_1(M_2D(x)+1)}{M_1f_1(x)} +\lambda \bar{H}\bar{K}+o(1)}. \end{aligned}$$

Consequently,

$$\begin{aligned} \mathbb {P}\left( \frac{1}{n h_H}\left| \sum _{i=1}^nL_{i,\varphi _j}(x, t_{\varphi _j,k}^{\epsilon })\right| \ge \lambda \right) \le 2\exp \left\{ -n h_H\phi (h_K)\lambda ^2 C_{\epsilon }(x)\right\} . \end{aligned}$$

Choosing \(\lambda =\lambda _n\) and \(m_n=\eta \), we obtain

$$\begin{aligned} \mathbb {P}\left( J_{n,2} \ge \lambda _n\right)\le & {} 2\mathcal {N}(\eta ,\mathcal {C},d_{\mathcal {C}})d_{\epsilon ,n}\exp \left\{ -n h_H\phi (h_K)\lambda ^2_n C_{\epsilon }(x)\right\} \\\le & {} 2\exp \left\{ -\lambda ^2_nn h_H\phi (h_K)\left[ C_\epsilon (x)-\frac{c_\epsilon \log \mathcal {N}(\eta ,\mathcal {C},d_{\mathcal {C}})}{\eta \lambda _n^2n h_H\phi (h_K)}\right] \right\} . \end{aligned}$$

Taking into account the condition (4), it suffices to use the Borel–Cantelli Lemma to conclude the proof. \(\square \)

Proof of Theorem 1

A Taylor series expansion of the function \(g_\varphi (\hat{\varTheta }_{\varphi ,n}(x)|x)\) around \(\varTheta _\varphi (x)\), together with the definition of \(\varTheta _\varphi (x)\) (which implies that the first-order term \(g_\varphi ^{(1)}(\varTheta _\varphi (x)|x)\) vanishes), yields

$$\begin{aligned} g_\varphi \left( \hat{\varTheta }_{\varphi ,n}(x)|x\right) =g_\varphi \left( \varTheta _\varphi (x)|x\right) +\left( \hat{\varTheta }_{\varphi ,n}(x)-\varTheta _\varphi (x)\right) ^{2}\frac{1}{2} g_\varphi ^{(2)}\left( \varTheta ^{*}_{\varphi ,n}(x)|x\right) , \end{aligned}$$
(19)

where \(\varTheta ^{*}_{\varphi ,n}\) is between \(\hat{\varTheta }_{\varphi ,n}(x)\) and \(\varTheta _\varphi (x)\). Subsequently, considering the statement (19) we obtain

$$\begin{aligned} (\hat{\varTheta }_{\varphi ,n}(x)-\varTheta _\varphi (x))^{2}|g_\varphi ^{(2)}(\varTheta ^{*}_{\varphi ,n}(x)|x)|=O\left( \sup _{y \in S_\varphi }|g_{\varphi ,n}(y|x)-g_{\varphi }(y|x)|\right) . \end{aligned}$$
(20)

To end the proof of the theorem, we need the following lemma which deals with the uniform (with respect to \(\varphi \in \mathcal{C}\)) asymptotic behavior of the conditional mode estimate.

Lemma 7

Under assumptions of Proposition 1, we have

$$\begin{aligned} \lim _{n\rightarrow \infty }\sup _{\varphi \in \mathcal {C}} |\hat{\varTheta }_{\varphi ,n}(x)-\varTheta _{\varphi }(x)|=0 \quad a.s. \end{aligned}$$

Proof of Lemma 7

By assumption A3(ii), uniformly in \(\varphi \in \mathcal {C}\), \(g_\varphi (\cdot | x)\) is uniformly continuous on the compact set \(S_\varphi \), on which \(\varTheta _\varphi (x)\) is the unique mode. Proceeding as in Parzen (1962), for any \(\epsilon >0\), there exists \(\zeta > 0\) such that, for any \(y\in S_\varphi \),

$$\begin{aligned} \sup _{\varphi \in \mathcal {C}}|\varTheta _{\varphi }(x)-y|\ge \epsilon \Rightarrow \sup _{\varphi \in \mathcal {C}}|g_{\varphi }(\varTheta _{\varphi }(x)|x)- g_{\varphi }(y|x)|\ge \zeta . \end{aligned}$$
(21)

On another hand, we have

$$\begin{aligned} \sup _{\varphi \in \mathcal {C}}|g_\varphi (\hat{\varTheta }_{\varphi ,n}(x)|x)-g_\varphi (\varTheta _\varphi (x)|x)|\le & {} \sup _{\varphi \in \mathcal {C}}|g_{\varphi ,n}(\hat{\varTheta }_{\varphi ,n}(x)|x)-g_\varphi (\hat{\varTheta }_{\varphi ,n}(x)|x)| \nonumber \\&+\, \sup _{\varphi \in \mathcal {C}}|g_{\varphi ,n}(\hat{\varTheta }_{\varphi ,n}(x)|x)-g_\varphi (\varTheta _{\varphi }(x)|x)| \nonumber \\\le & {} \sup _{\varphi \in \mathcal {C}}\sup _{y\in S_\varphi }|g_{\varphi ,n}(y|x)-g_\varphi (y|x)|\nonumber \\&+\,\sup _{\varphi \in \mathcal {C}}|\sup _{y\in S_\varphi }g_{\varphi ,n}(y|x)-\sup _{y\in S_\varphi }g_\varphi (y|x)| \nonumber \\\le & {} 2\sup _{\varphi \in \mathcal {C}}\sup _{y \in S_\varphi }|g_{\varphi ,n}(y|x)-g_\varphi (y|x)|. \end{aligned}$$
(22)

Using the statements (21) and (22) combined with Proposition 1, we obtain the result. \(\square \)

We now return to the proof of the theorem. Making use of Lemma 7 combined with conditions A3(iii)–(iv), we deduce that

$$\begin{aligned} \lim _{n \rightarrow \infty } \sup _{\varphi \in \mathcal {C}}|g_\varphi ^{(2)}(\varTheta ^{*}_{\varphi ,n}(x)|x)|=\sup _{\varphi \in \mathcal {C}}|g_\varphi ^{(2)}(\varTheta _{\varphi }(x)|x)|=\varPhi (x)\ne 0. \end{aligned}$$
(23)

Moreover, the statements (20), (23) imply that

$$\begin{aligned} \sup _{\varphi \in \mathcal {C}}(\hat{\varTheta }_{\varphi ,n}(x)-\varTheta _\varphi (x))^{2}=O\left( \sup _{\varphi \in \mathcal {C}}\sup _{y \in S_\varphi }|g_{\varphi ,n}(y|x)-g_{\varphi }(y|x)|\right) , \end{aligned}$$
(24)

which, combined with Proposition 1, completes the proof of Theorem 1. \(\square \)