1 INTRODUCTION

Regression analysis has proved to be a flexible tool and provides a powerful statistical modeling framework in a variety of applied and theoretical contexts where one intends to model the predictive relationship between related responses and predictors. It is worth noticing that parametric regression models provide useful tools for analyzing practical data when the models are correctly specified, but may suffer from large modeling biases when the model structure is misspecified, as is the case in many practical problems. As an alternative, nonparametric smoothing methods ease the concerns over modeling bias. Kernel methods are a popular choice for nonparametric function estimation, being only one of several approaches to constructing good function estimators, alongside nearest-neighbor, spline, and wavelet methods. These methods have been applied to a wide variety of data. In the present paper, we focus on constructing consistent kernel-type estimators. For good sources of references to the research literature in this area, along with statistical applications, consult [20, 26, 27, 30, 31, 35, 39, 51, 52, 57, 76, 78, 106, 108, 124, 126, 128, 130, 136, 138] and the references therein.

Recently, increasing interest has been given to regression models in which the response variable is real-valued and the explanatory variable takes the form of smooth functions that vary randomly between repeated observations or measurements. Statistical problems related to the study of functional random variables, that is to say, variables with values in an infinite-dimensional space, have attracted growing interest in the statistics literature over the last decades. The development of this research theme is motivated by the abundance of data measured on an increasingly fine temporal/spatial grid, as is the case, for instance, in meteorology, medicine, satellite imagery, and many other research areas. The statistical modeling of these data, seen as random functions, has thus led to several challenging theoretical and numerical research questions. For an overview of theoretical as well as practical aspects of functional data analysis, the reader can refer to the monographs of [14] for linear models for random variables taking values in a Hilbert space, and [120] for scalar-on-function and function-on-function linear models, functional principal component analysis, and parametric discriminant analysis. Ferraty and Vieu [62], however, focused more on nonparametric methods, mainly kernel-type estimation for scalar-on-function nonlinear regression models; they extended such tools to classification and discrimination analysis. Horváth and Kokoszka [82] discussed the generalization of several interesting concepts in statistics, such as goodness-of-fit tests, portmanteau tests, and change detection, to the functional data framework. For the latest contributions in FDA and its related topics, one can refer to [1–4, 18, 25, 37, 54, 93, 103, 139]. In various scenarios, there is a keen interest in gauging the rate at which specific probabilities converge; frequently, these probabilities exhibit rapid exponential convergence. Numerous researchers have explored large deviations and found various applications, primarily within the realm of mathematical physics. More precisely, large deviation results in probability and statistics have a wide range of applications. A primary statistical application is the evaluation of efficiency through the comparison of test procedures: one determines the most efficient procedure based on the minimum amount of data required to achieve predefined performance levels, typically expressed in terms of the risk of the first kind, test power, and alternative hypotheses. For further information, refer to [112]. Additional applications arise in the assessment of estimation techniques, wherein the error rates associated with each method are taken into account and subsequently compared. It is important to observe that large deviation results can serve as effective tools for establishing the consistency of estimators and their convergence rates. Valuable resources on the subject of large deviations can be found in [9, 49, 50, 132]. Extending beyond the conventional results of weak and strong convergence in regression analysis, the problem of functional moderate deviations introduces new challenges that these existing frameworks cannot readily address. There exists an extensive large and moderate deviation literature involving many areas of probability and statistics. We refer to the book of [49] and the references therein for an account of large deviation results and applications.
In the nonparametric function estimation setting, several results have been obtained in recent years. We refer to [114], where the Nadaraya–Watson and histogram estimates of the regression function are studied in the real vector case. Louani and Ould Maouloud [95] established a large deviation principle (LDP, in short) for a vector process, allowing one to derive LDPs for the kernel density and regression function estimators by the contraction principle. Further results for multivariate regression estimates are due to [105], where large together with moderate deviation principles are stated for the Nadaraya–Watson estimator as well as for the semi-recursive kernel estimator. Large deviation results for the kernel regression function estimate with a functional covariate are obtained by [95]. For more references we refer to [10–12, 42, 60, 66, 75, 84–86, 88, 104, 117, 119, 123, 125, 133].

The selection of the kernel function in our setup is mostly unconstrained, except for certain mild requirements that will be provided subsequently. However, the choice of bandwidth presents a greater challenge. It is important to acknowledge that the selection of the bandwidth plays a critical role in achieving a good rate of consistency; specifically, it significantly impacts the magnitude of the bias of the estimate. Broadly speaking, our focus lies in determining a bandwidth that yields an estimator exhibiting a favorable trade-off between the bias and the variance of the estimators under consideration. It is often preferable to let the bandwidth vary with the criterion applied, the available data, and the location, a flexibility that cannot be obtained by conventional fixed-bandwidth methods. For further elaboration and analysis on the subject, readers are encouraged to consult [96]. The main aim of the present paper is to establish large deviation and moderate deviation results in the functional data setting uniformly in the bandwidth. The uniform-in-bandwidth problem has attracted great attention; we refer to [15, 17, 21, 22, 25, 27, 29, 33, 34, 43, 47, 53, 59, 96, 98, 99]. To effectively tackle the challenges posed by functional moderate deviations, novel methodologies and theoretical frameworks must be developed. Advanced statistical techniques, such as empirical process theory, offer promising avenues for addressing these challenges. By extending our focus beyond conventional regression models and embracing the complexities of functional moderate deviations, we can enhance our understanding of the behavior and limitations of kernel estimators. This, in turn, provides valuable insights into the underlying processes and patterns within functional data.

The layout of the article is as follows. Section 2 gives the notation and definitions we need. Section 3 states the moderate deviation principle, which is equivalent to the moderate deviation principle of the finite-dimensional distributions, given in Theorem 3.1, plus an exponential asymptotic equicontinuity condition with respect to a pseudometric, given in Theorem 3.2. Section 4 provides applications of our main results, including the kernel regression function estimate in Subsection 4.1, the kernel conditional distribution function in Subsection 4.2, the kernel quantile regression in Subsection 4.3, the kernel conditional density function in Subsection 4.4, and finally the kernel conditional copula function in Subsection 4.5. We discuss a bandwidth choice for practical use in Section 5. A summary of the findings highlighting remaining open issues may be found in Section 6. All proofs are deferred to Section 7. Due to the lengthiness of the proofs, we limit ourselves to the most important arguments. Finally, a few relevant technical results are given in Appendix A.

2 THE GENERAL PROCESS

We consider a sequence \(\{\left(X_{i},Y_{i}\right):i\geq 1\}\) of i.i.d. pairs of random copies of the random element (rv) \((X,Y)\), where \(X\) takes its values in some abstract space \(\mathcal{E}\) and \(Y\) is an \(\mathbb{R}^{q}\)-valued random variable, \(q\geq 1,\) with density \(g(\cdot)\) with respect to the Lebesgue measure on \(\mathbb{R}^{q}\). Suppose that \(\mathcal{E}\) is endowed with a semi-metric \(d(\cdot,\cdot)\) defining a topology to measure the proximity between two elements of \(\mathcal{E}\) and which is disconnected from the definition of \(X\) to avoid measurability problems. This covers the case of semi-normed spaces of possibly infinite dimension (e.g., Hilbert or Banach spaces). We will consider especially the conditional expectation of \(l(Y)\) given \(X=x\),

$$r^{l}(x):=\mathbb{E}(l(Y_{1})|X_{1}=x),\quad x\in\mathcal{E},$$

whenever this regression function is meaningful. Here and elsewhere, \(l(\cdot)\) denotes a specified measurable function from \(\mathbb{R}^{q}\) to \(\mathbb{R}\), which is assumed to be bounded on each compact subset of \(\mathbb{R}^{q}\). The general Nadaraya–Watson [107, 137] type estimator of \(r^{l}(\cdot)\) was introduced by [61] for \(l(y)=y\). It is defined, for any fixed \(x\in\mathcal{E}\), by

$$\widehat{r}_{n}^{l}(x,h)=\frac{\sum_{i=1}^{n}l(Y_{i})K(h^{-1}d(x,X_{i}))}{\sum_{i=1}^{n}K(h^{-1}d(x,X_{i}))}:=\frac{\widehat{r}_{n,2}^{l}(x,h)}{\widehat{r}_{n,1}^{l}(x,h)},$$
(2.1)

where \(K(\cdot)\) is a real-valued kernel function, \(h\) is the bandwidth parameter and, for \(k=1,2\),

$$\widehat{r}_{n,k}^{l}(x,h)=\frac{1}{n\mathbb{E}[\Delta_{1}(x,h)]}\sum_{i=1}^{n}l^{k-1}(Y_{i})\Delta_{i}(x,h),$$

where \(\Delta_{i}(x,h)=K(h^{-1}d(x,X_{i}))\). Notice that taking \(l_{A}(y)={\mathchoice{\rm 1\mskip-4.0mu l}{\rm 1\mskip-4.0mu l}{\rm 1\mskip-4.5mu l}{\rm 1\mskip-5.0mu l}}_{\{y\in A\}}\) in (2.1), where \(A\) is a subset of \(\mathbb{R}\), we obtain the well-known kernel estimator \(\hat{\mu}(A|x)\) of the conditional empirical measure

$$\mu(A|x):=\mathbb{P}(Y_{1}\in A|X=x).$$

Properties of \(\hat{\mu}(A|x)\), whenever \(A=(-\infty,t]\) for \(t\in\mathbb{R}\) and \(x\in\mathbb{R}\), have been investigated by several authors, among whom we cite [16, 17, 121, 122, 129]. For the functional data case, see [18, 65, 103].
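To fix ideas, a minimal numerical sketch of the estimator (2.1) is given below, assuming discretized curves on a common grid, an \(L^{2}\)-type semi-metric, and the uniform kernel on \([0,1]\); all names and defaults are illustrative choices, not part of the theory.

```python
import numpy as np

def l2_semimetric(x1, x2):
    # one possible semi-metric d on E: root-mean-square distance between
    # two curves observed on a common grid (an illustrative assumption)
    return np.sqrt(np.mean((x1 - x2) ** 2))

def nw_functional(x, X, Y, h, l=lambda y: y,
                  K=lambda u: np.where(u <= 1.0, 1.0, 0.0),
                  d=l2_semimetric):
    """Functional Nadaraya-Watson estimate of r^l(x), as in (2.1).

    X: (n, p) array of n curves on p grid points; Y: (n,) responses.
    The default K is the uniform kernel 1_{[0,1]}, for which K(1) > 0.
    """
    w = np.array([K(d(x, Xi) / h) for Xi in X])
    denom = w.sum()
    if denom == 0.0:            # no curve within distance h of x
        return np.nan
    return np.dot(l(Y), w) / denom
```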

The purpose of this paper is to establish some general moderate deviation results which allow us to derive, under mild regularity conditions, as a by-product the uniform functional moderate deviation principle for the kernel \(l\)-indexed regression function estimator, whenever \(l(\cdot)\) belongs to an appropriate class \(\mathcal{L}\). Towards this end, consider two real continuous functions, \(c_{l}(\cdot)\) from \(\mathcal{E}\) to \(\mathbb{R}\), \(d_{l}(\cdot)\) from \(\mathcal{E}\) to \(\mathbb{R}\) and define the following process. For any \(x\in\mathcal{E}\) and \(z=(l,\varrho)\in\mathcal{L}\times\mathcal{H}_{0}\), where \(\mathcal{H}_{0}=[\vartheta_{1},\vartheta_{2}]\) and \(0<\vartheta_{1}<\vartheta_{2}<\infty\), set (assuming that this expression is meaningful)

$$W_{n}(x,z)=\frac{1}{n\mathbf{E}[\Delta_{1}(x,\varrho h)]}\sum_{i=1}^{n}\Bigg{\{}\Big{(}c_{l}(x)l(Y_{i})+d_{l}(x)\Big{)}\Delta_{i}(x,\varrho h)-\mathbf{E}\bigg{[}\Big{(}c_{l}(x)l(Y_{i})+d_{l}(x)\Big{)}\Delta_{i}(x,\varrho h)\bigg{]}\Bigg{\}}$$
$${}:=\frac{1}{n\mathbf{E}[\Delta_{1}(x,\varrho h)]}\sum_{i=1}^{n}\Bigg{\{}\mathcal{M}_{x,l}(Y_{i})\Delta_{i}(x,\varrho h)-\mathbf{E}\bigg{[}\mathcal{M}_{x,l}(Y_{i})\Delta_{i}(x,\varrho h)\bigg{]}\Bigg{\}}.$$
(2.2)

In what follows, we first establish a functional uniform moderate deviation principle for the process \(\{W_{n}(x,z):z\in\mathcal{L}\times\mathcal{H}_{0}\}\), with a fixed \(x\in\mathcal{J}\), where \(\mathcal{J}\) denotes a suitable subset of \(\mathcal{E}\). Subsequently, through the utilization of exponential contiguity arguments, we deduce the corresponding moderate deviation principle for the regression estimate \(\widehat{r}_{n}^{l}(\cdot)\), encompassing the kernel distribution estimator. To provide more clarity, we present a corollary that delineates the behavior of the conditional distribution kernel estimator under scenarios of moderate deviations. This is complemented by the introduction of innovative applications of our main findings, as detailed in Section 4. Finally, we establish the moderate deviation principle for

$$\left\{\sup_{(x,z)\in\mathcal{J}\times\mathcal{L}\times\mathcal{H}_{0}}\left|W_{n}(x,z)\right|\right\}.$$

3 MAIN RESULTS

We will impose the following set of assumptions for our main results.

(A1) \(K(\cdot)\) is a nonnegative bounded differentiable kernel with support \([0,1]\) and \(K(1)>0\). The derivative \(K^{\prime}(\cdot)\) of \(K(\cdot)\) exists on the interval \([0,1]\), is bounded, and satisfies \(K^{\prime}(t)\leq 0\) for all \(t\in(0,1)\).

(A2) For each \(x\in\mathcal{E}\) and a real number \(v\), there exist a nonnegative functional \(f_{v}(\cdot)\), a function \(g_{x,v}(\cdot)\) and a nonnegative real function \(\phi(\cdot)\) tending to zero, as its argument tends to \(0\), such that,

(i) \(F_{x}^{v}(u):=\mathbb{P}(d(x,X_{1})\leq u|Y=v)=\phi(u)f_{v}(x)+g_{x,v}(u)\) with, uniformly in \(v\), \(g_{x,v}(u)=o(\phi(u))\) as \(u\rightarrow 0\) and \(g_{x,v}(u)/\phi(u)\) is almost surely bounded;

(ii) there exists a nondecreasing bounded function \(\tau_{0}(u)\) such that, uniformly in \(u\in[0,1]\),

$$\frac{\phi(uh)}{\phi(h)}=\tau_{0}(u)+o(1),\quad\textrm{as}\quad h\downarrow 0.$$

(A3) Let \(h=h_{n}\) and \(w_{n}\) be sequences of positive numbers such that, as \(n\rightarrow\infty\),

$${h_{n}}\rightarrow 0,\quad{w_{n}}\rightarrow\infty,\quad\frac{w_{n}^{2}}{n\phi(\gamma h)}\rightarrow 0\quad\text{for some}\quad\gamma\in[\vartheta_{1},\vartheta_{2}].$$

(A4) For any real numbers \(a\) and \(b\), and any \((x,l)\in\mathcal{E}\times\mathcal{L}\):

$$\textrm{(i)}\int|l_{1}(v)l_{2}(v)|g(v)dv<\infty\quad\text{for any}\quad l_{1},\ l_{2}\in\mathcal{L};$$
$$\textrm{(ii)}\int e^{|a+bl(v)|}f_{v}(x)g(v)dv<\infty;\quad\textrm{(iii)}\int e^{|a+bl(v)|}g(v)dv<\infty.$$

Discussion of Assumptions

Condition (A1) is very usual in the nonparametric estimation literature devoted to the functional data context. Notice that a symmetric kernel [115] is not adequate in this context, since the random variable \(d\left(x,X_{i}\right)\) is positive; therefore we consider \(K(\cdot)\) with support \([0,1]\). This is a natural generalization of the assumption usually made on the kernel in the multivariate case, where \(K(\cdot)\) is supposed to be a spherically symmetric density function. Because the Lebesgue measure does not exist on an infinite-dimensional space, assumptions (A2) involve the small ball techniques related to the fractal dimension used in this paper; see, for instance, [100], who in turn was inspired by [68] for nonparametric density estimation under functional observations. From [64], one can cite:

1. in the case when \(\mathcal{E}=\mathbb{R}^{d},\mathbb{P}(d(x,X_{1})\leq u|Y=v)\approx C(d)u^{d}f_{v}(x)\), where \(C(d)\) is the volume of the unit ball in \(\mathbb{R}^{d}\);

2. \(\mathbb{P}(d(x,X_{1})\leq u|Y=v)\approx f_{v}(x)u^{\gamma}\) for some \(\gamma>0\), then \(\tau_{0}(u)=u^{\gamma}\);

3. \(\mathbb{P}(d(x,X_{1})\leq u|Y=v)\approx f_{v}(x)u^{\gamma}\exp\left\{-c/u^{\kappa}\right\}\) for some \(\gamma>0\) and \(\kappa>0\), then \(\tau_{0}(u)=\delta_{1}(u)\), where \(\delta_{1}(\cdot)\) is the Dirac mass at \(1\).

Masry [100] explains that if \(\mathcal{E}=\mathbb{R}\), then the condition coincides with the fundamental axioms of probability calculus; furthermore, if \(\mathcal{E}\) is an infinite-dimensional Hilbert space, then \(\phi(h)\) can decrease toward \(0\) at an exponential speed as \(n\to\infty.\) (A2)(ii) shows that the small ball probability can be written approximately as the product of two independent functions; refer to [101] for the diffusion process, [13] for a Gaussian measure, and [91] for a general Gaussian process, while [100] has employed these assumptions for strongly mixing processes. For example, the function \(\phi(\cdot)\) can be expressed as \(\phi(\epsilon)=\epsilon^{\delta}\exp(-C/\epsilon^{a})\) with \(\delta\geq 0\) and \(a\geq 0\); it corresponds to the Ornstein–Uhlenbeck and general diffusion processes (for such processes, \(a=2\) and \(\delta=0\)) and to the fractal processes (for such processes, \(\delta>0\) and \(a=0\)). This class of processes also satisfies condition (A2). For other examples, we refer to [24, 28, 64, 128]. Since \(n\phi\left(h\right)\rightarrow\infty\), suppose \(\phi\left(h\right)=n^{-c}\) for some \(0<c<1\). Then Condition (A3) is satisfied provided \(w_{n}=n^{\gamma}\), for \(0<\gamma<(1-c)/2\). Assumptions (A4) are imposed to ensure the needed finiteness and differentiability properties of the finite-dimensional moment generating function associated with the process \(\{W_{n}(x,z):z\in\mathcal{L}\times\mathcal{H}_{0}\}\). All these general assumptions are sufficiently weak relative to the different objects involved in the statement of our main results. They cover and exploit the principal axes of this contribution, which are the topological structure of the functional variables and the probability measure in this functional space. From now on, consider the following covariance function defined, for any \(z_{1}:=(l_{1},\varrho_{1})\in\mathcal{L}\times\mathcal{H}_{0}\), \(z_{2}:=(l_{2},\varrho_{2})\in\mathcal{L}\times\mathcal{H}_{0}\) and \(\gamma\in\mathcal{H}_{0}\),

$$R_{\gamma}(x,z_{1},z_{2}):=\frac{\displaystyle\alpha_{1}(\varrho_{1},\varrho_{2})\tau(\varrho,\gamma)}{\displaystyle\alpha_{0}^{2}\tau(\varrho_{1},\gamma)\tau(\varrho_{2},\gamma)}\frac{\displaystyle\int\mathcal{M}_{x,l_{1}}(v)\mathcal{M}_{x,l_{2}}(v)f_{v}(x)g(v)\,dv}{\displaystyle\Bigg{(}\int f_{v}(x)g(v)\,dv\Bigg{)}^{2}},$$
(3.1)

where

$$\alpha_{0}=K(1)-\int\limits_{0}^{1}K^{\prime}(u)\tau_{0}(u)\,du$$
(3.2)

and

$$\alpha_{1}(\varrho_{1},\varrho_{2})=K\bigg{(}\frac{\varrho}{\varrho_{1}}\bigg{)}K\bigg{(}\frac{\varrho}{\varrho_{2}}\bigg{)}-\int\limits_{0}^{1}\bigg{(}K\bigg{(}\frac{\varrho}{\varrho_{1}}u\bigg{)}K\bigg{(}\frac{\varrho}{\varrho_{2}}u\bigg{)}\bigg{)}^{\prime}\tau_{0}(u)du$$

with \(\varrho=\min(\varrho_{1},\varrho_{2})\) and

$$\tau(a,b)=\tau_{0}\left(\frac{a}{b}\right){\mathchoice{\rm 1\mskip-4.0mu l}{\rm 1\mskip-4.0mu l}{\rm 1\mskip-4.5mu l}{\rm 1\mskip-5.0mu l}}_{\{a\leq b\}}+\left[\tau_{0}\left(\frac{b}{a}\right)\right]^{-1}{\mathchoice{\rm 1\mskip-4.0mu l}{\rm 1\mskip-4.0mu l}{\rm 1\mskip-4.5mu l}{\rm 1\mskip-5.0mu l}}_{\{a>b\}},$$

where \({\mathchoice{\rm 1\mskip-4.0mu l}{\rm 1\mskip-4.0mu l}{\rm 1\mskip-4.5mu l}{\rm 1\mskip-5.0mu l}}_{A}\) denotes the indicator of \(A\). Note that \(\tau(a,b)=(\tau(b,a))^{-1}\), which gives \(R_{\gamma}(x,z_{1},z_{2})=R_{\gamma}(x,z_{2},z_{1})\). Let \(\{\Xi_{\gamma}(x,z):\ z\in\mathcal{L}\times\mathcal{H}_{0}\}\) be a mean-zero Gaussian process such that, for any \((z_{1},z_{2})\in\big{(}\mathcal{L}\times\mathcal{H}_{0}\big{)}^{2}\),

$$\mathbf{E}[\Xi_{\gamma}(x,z_{1})\Xi_{\gamma}(x,z_{2})]=R_{\gamma}(x,z_{1},z_{2}).$$

Let \(\mathcal{Z}_{x,\gamma}\) be the closed linear subspace of the space \(L_{2}\), generated by

$$\Big{\{}\Xi_{\gamma}(x,z):z\in\mathcal{L}\times\mathcal{H}_{0}\Big{\}}.$$

Define the function \(\varphi:\mathcal{Z}_{x,\gamma}\rightarrow l_{\infty}\big{(}\mathcal{L}\times\mathcal{H}_{0}\big{)}\) by

$$\varphi(\xi)(z)=\mathbf{E}[\Xi_{\gamma}(x,z)\xi].$$

Note that the reproducing kernel Hilbert space associated with the covariance function \(R_{\gamma}(x,z_{1},z_{2})\) is the Hilbert space \(\{\varphi(\xi):\xi\in\mathcal{Z}_{x,\gamma}\}\) equipped with the inner product

$$\langle\varphi(\xi_{1}),\varphi(\xi_{2})\rangle=\mathbf{E}[\xi_{1}\xi_{2}].$$

For any \((z_{1},\ldots,z_{m})\in\big{(}\mathcal{L}\times\mathcal{H}_{0}\big{)}^{m}\) and any \(\lambda_{1},\ldots,\lambda_{m}\in\mathbb{R}\), set

$$\Gamma^{\gamma}_{x,z_{1}\ldots,z_{m}}(\lambda_{1},\ldots,\lambda_{m})=\inf\Big{\{}2^{-1}\mathbf{E}[\xi^{2}]:\xi\in\mathcal{Z}_{x,\gamma},\ \varphi(\xi)(z_{j})=\lambda_{j},1\leq j\leq m\Big{\}}.$$
(3.3)

The following theorem gives a finite-dimensional moderate deviation principle for the process \(\{W_{n}(x,z):z\in\mathcal{L}\times\mathcal{H}_{0}\}\). Let, for any \(x\in\mathcal{E}\),

$$f(x)=\int f_{v}(x)g(v)\,dv.$$

Theorem 3.1. Assume that the assumptions (A1)–(A4) are fulfilled, that \(f(x)>0\), and that \(\tau_{0}(\frac{\vartheta_{1}}{\vartheta_{2}})>0\). Then the random sequence \(w_{n}(W_{n}(x,z_{1}),\ldots,W_{n}(x,z_{m}))\) satisfies an LDP with the speed \(n\phi(\gamma h)/w_{n}^{2}\) \((\gamma\in\mathcal{H}_{0})\) and the good rate function \(\Gamma^{\gamma}_{x,z_{1}\ldots,z_{m}}(\cdot)\) defined in \((3.3)\), where \(z_{i}=(l_{i},\varrho_{i})\), for \(i=1,\ldots,m\).

The proof of Theorem 3.1 is postponed until Section 7.

Remark 3.1. For any \(z:=(l,\varrho)\in\mathcal{L}\times\mathcal{H}_{0}\), it follows by Theorem 3.1 that the random sequence \(w_{n}W_{n}(x,z)\) satisfies an LDP with the speed \(n\phi(\gamma h)/w_{n}^{2}\) \((\gamma\in\mathcal{H}_{0})\) and the good rate function \(\Gamma_{x,z}^{\gamma}(\cdot)\) given by

$$\Gamma_{x,z}^{\gamma}(\lambda)=\sup_{a}\Bigg{\{}\lambda a-\frac{1}{2}a^{2}R_{\gamma}(x,z,z)\Bigg{\}}=\frac{\lambda^{2}}{2R_{\gamma}(x,z,z)},$$

where

$$R_{\gamma}(x,z,z):=\frac{\bigg{(}K^{2}(1)-\int\limits_{0}^{1}\Big{(}K^{2}(u)\Big{)}^{\prime}\tau_{0}(u)\,du\bigg{)}}{\alpha_{0}^{2}\tau(\varrho,\gamma)}\frac{\displaystyle\int\mathcal{M}_{x,l}^{2}(v)f_{v}(x)g(v)\,dv}{\displaystyle\Bigg{(}\int f_{v}(x)g(v)\,dv\Bigg{)}^{2}}.$$
(3.4)

Remark 3.2. If we suppose that the derivative of the function \(\tau_{0}(\cdot)\) exists and use the facts that \(\tau_{0}(0)=0\) and \(\tau_{0}(1)=1\), then, integrating by parts, we obtain

$$R_{\gamma}(x,z,z)=\frac{1}{\displaystyle\tau(\varrho,\gamma)}\frac{\displaystyle\int\limits_{0}^{1}K^{2}(u)\tau^{\prime}_{0}(u)du\int\mathcal{M}_{x,l}^{2}(v)f_{v}(x)g(v)dv}{\displaystyle\Bigg{[}\int\limits_{0}^{1}K(u)\tau^{\prime}_{0}(u)du\Bigg{]}^{2}\Bigg{(}\int f_{v}(x)g(v)dv\Bigg{)}^{2}},$$

which gives a simpler form of the rate function. Also, we observe, whenever \(K(u)={\mathchoice{\rm 1\mskip-4.0mu l}{\rm 1\mskip-4.0mu l}{\rm 1\mskip-4.5mu l}{\rm 1\mskip-5.0mu l}}_{[0,1]}(u)\), that

$$R_{\gamma}(x,z,z)=\frac{1}{\displaystyle\tau(\varrho,\gamma)}\frac{\displaystyle\int\mathcal{M}_{x,l}^{2}(v)f_{v}(x)g(v)dv}{\displaystyle\bigg{(}\int f_{v}(x)g(v)dv\bigg{)}^{2}}.$$

In the sequel, we investigate the functional moderate deviation principle of the process

$$\bigg{\{}W_{n}(x,z):z\in\mathcal{L}\times\mathcal{H}_{0}\bigg{\}}$$

in the space \(l_{\infty}\big{(}\mathcal{L}\times\mathcal{H}_{0}\big{)}\) equipped with the uniform topology.

Let \(L(\cdot)\) denote the finite-valued measurable envelope function of the class \(\mathcal{L}\) of measurable functions on \(\mathbb{R}\), that is,

$$L(y)\geq\sup_{l\in\mathcal{L}}|l(y)|,\quad y\in\mathbb{R}.$$

Define

$$N(\epsilon,\mathcal{L})=\sup_{Q}\mathcal{N}\bigg{(}\epsilon\sqrt{Q(L^{2})},\mathcal{L},d_{Q}\bigg{)},$$

where the supremum is taken over all probability measures \(Q\) on \(\mathbb{R}\) with

$$0<Q(L^{2})=\int L^{2}(y)dQ(y)<\infty,$$

and \(d_{Q}\) is the \(L_{2}(Q)\)-metric. As usual, \(\mathcal{N}(\epsilon,\mathcal{L},d_{Q})\) is the minimal number of balls \(\{l:d_{Q}(l,l^{\prime})<\epsilon\}\) of \(d_{Q}\)-radius \(\epsilon\) needed to cover \(\mathcal{L}\). To formulate our functional moderate deviation principle, we consider some additional conditions.

(B1) (i) For some \(C^{\prime}>0\) and \(\nu_{2}>0\),

$$N(\epsilon,\mathcal{L})\leq C^{\prime}\epsilon^{-\nu_{2}},\quad 0<\epsilon<1;$$

(ii) \(\mathcal{L}\) is a pointwise measurable class, that is, there exists a countable subclass \(\mathcal{L}_{0}\) of \(\mathcal{L}\) such that, for any function \(l\in\mathcal{L}\), we can find a sequence of functions \(\{l_{n}\}\) in \(\mathcal{L}_{0}\) for which

$$l_{n}(y)\rightarrow l(y),\quad y\in\mathbb{R}.$$

(B2) The class

$$\mathcal{K}=\Bigg{\{}x\mapsto K\left(\frac{d(x,u)}{h}\right):u\in\mathcal{J},\ h>0\Bigg{\}}$$

satisfies the Condition (B1).

(B3) Uniformly in \(v\in\mathbb{R}\), the function \(f_{v}(\cdot)\) is continuous and strictly positive on \(\mathcal{J}.\)

As in [48], let us denote by \(\{\mathcal{M}(x):x\geqslant 0\}\) a continuous, increasing and non-negative function fulfilling, for some \(q>2\), ultimately as \(x\uparrow\infty\),

$$\textrm{(i) }x^{-q}\mathcal{M}(x)\uparrow,\quad\textrm{(ii) }x^{-1}\log\mathcal{M}(x)\downarrow,$$

where ‘\(\uparrow\)’ (resp. ‘\(\downarrow\)’) stands for non-decreasing (resp. non-increasing). For each \(t\geqslant\mathcal{M}(0)\), we denote by \(\mathcal{M}^{\text{inv}}(t)\) the uniquely defined non-negative number such that \(\mathcal{M}\left(\mathcal{M}^{\mathrm{inv}}(t)\right)=t.\) The following choices of \(\mathcal{M}(\cdot)\) are of particular interest:

(i) \(\mathcal{M}(x)=x^{p}\) for some \(p>2\);

(ii) \(\mathcal{M}(x)=\exp(sx)\) for some \(s>0\).

(B4) (i) For some \(t>\max(s,1)\),

$$\mathbf{E}\big{[}\exp\big{(}tL(Y)\big{)}\big{]}<\infty;$$

(ii)

$$\displaystyle{\lim_{n\rightarrow\infty}\frac{\displaystyle w_{n}^{2}\max\Big{(}\mathcal{M}^{\mathrm{inv}}(n),\log(n)\Big{)}}{\displaystyle n\phi(\gamma h)}=\infty};$$

(iii)

$$\displaystyle{\lim_{n\rightarrow\infty}\frac{n\phi(\gamma h)\mathcal{M}^{\mathrm{inv}}(n)^{-2}}{\max\Big{(}\log\big{(}\mathcal{M}^{\mathrm{inv}}(n)\big{)},\log\big{(}1/\phi(\gamma h)\big{)}\Big{)}}}=\infty\quad\textrm{and}\quad\displaystyle{\limsup_{n\rightarrow\infty}\displaystyle{\frac{\displaystyle w_{n}^{2}\log\big{(}\mathcal{M}^{\mathrm{inv}}(n)\big{)}}{\displaystyle n\phi(\gamma h)}}<\infty};$$

(iv)

$$\displaystyle{\limsup_{n\rightarrow\infty}\displaystyle{\frac{\displaystyle w_{n}^{2}\log\big{(}1/\phi(\gamma h)\big{)}}{\displaystyle n\phi(\gamma h)}}=0}.$$

(B\({}^{\prime}\)4)

(i) There exists a constant \(L_{0}>0\) such that \(L(Y){\mathchoice{\rm 1\mskip-4.0mu l}{\rm 1\mskip-4.0mu l}{\rm 1\mskip-4.5mu l}{\rm 1\mskip-5.0mu l}}_{\{X\in\mathcal{J}\}}\leq L_{0}\) a.s.;

(ii)

$$\displaystyle{\limsup_{n\rightarrow\infty}\frac{\displaystyle w_{n}^{2}\log(1/\phi(\gamma h))}{\displaystyle n\phi(\gamma h)}=0\quad\textrm{and}\quad\limsup_{n\rightarrow\infty}\frac{\displaystyle n\phi(\gamma h)}{\displaystyle\log(1/\phi(\gamma h))}=\infty}.$$

Comments on Additional Hypotheses

For Assumption (B1)(i), [116, Examples 26 and 38], [113, Lemma 22], [55, Subsection 4.7], [131, Theorem 2.6.7], and [89, Subsection 9.1] provide a number of sufficient conditions under which it holds; we may also refer to [46, 15, 16, 27, Subsection 3.2] for further discussion. For instance, it is satisfied, for general \(d\geq 1\), whenever \(l(\mathbf{x})=\Psi(p(\mathbf{x}))\), where \(p(\mathbf{x})\) is either a polynomial in \(d\) variables or the \(\alpha\)th power of the absolute value of a real polynomial for some \(\alpha>0\), and \(\Psi(\cdot)\) is a real-valued function of bounded variation; this covers commonly used kernels, such as the Gaussian, Epanechnikov, uniform, etc.; we refer the reader to [59, p. 1381]. We also mention that Condition (B1)(i) is satisfied whenever the class consists of functions of bounded variation on \(\mathbb{R}^{q}\) (in the sense of Hardy and Krause [80, 90, 134]); see, e.g., [44, 81, 111, 135]. Assumption (B1)(ii) is made to avoid measurability difficulties. Our definition of ‘‘pointwise measurability’’ is borrowed from Example 2.3.4 in [131]; [72, p. 262] says that a pointwise measurable function class satisfies the pointwise countable approximation property. This condition is discussed in [131, Example 2.3.4, p. 110] and [89, Subsection 8.2, p. 110], and it is satisfied whenever \(l(\cdot)\) is right continuous. Assumption (B1)(i) ensures that \(\mathcal{L}\) is of VC type with characteristics \(C^{\prime}\) and \(\nu_{2}\). Condition (B2) in [73] is formulated as follows:

(B\({}^{\prime}\)2) \(K(x)>0\) is a bounded and compactly supported measurable function that belongs to the linear span (the set of finite linear combinations) of functions \(k(x)\geq 0\) satisfying the following property: the subgraph of \(k(\cdot)\), \(\{(s,u):k(s)\geq u\}\), can be represented as a finite number of Boolean operations among sets of the form

$$\bigg{\{}(s,u):p(s,u)\geq\varphi(u)\bigg{\}},$$

where \(p\) is a polynomial on \(\mathbb{R}\times\mathbb{R}\) and \(\varphi\) is an arbitrary real function.

Indeed, for a fixed polynomial \(p\), the family of sets

$$\bigg{\{}\{(s,u):p((s-t)/h,u)\geq\varphi(u)\}:t\in\mathbb{R},h>0\bigg{\}}$$

is contained in the family of positivity sets of a finite-dimensional space of functions, and then the entropy bound follows by Theorems 4.2.1 and 4.2.4 in [55]. Our results are demonstrated through an examination of the bounded scenario, where Condition (B\({}^{\prime}\)4) is imposed, and the unbounded scenario, which is explored under Condition (B4). To establish the result of Theorem 3.2, we may relax Assumption (B4)(iv), replacing it by the following assumption:

$$\displaystyle{\limsup_{n\rightarrow\infty}\frac{\displaystyle w_{n}^{2}\log(1/\phi(\gamma h))}{\displaystyle n\phi(\gamma h)}<\infty.}$$

A function \(\pi:T\rightarrow T\) is said to be a finite partition function of a set \(T\) if, for each \(t\in T\), \(\pi(\pi(t))=\pi(t)\) and the cardinality of \(\{\pi(t):t\in T\}\) is finite. Let \(\pi(T)=\{t_{1},\ldots,t_{m}\}\) and \(A_{j}=\{t\in T:\pi(t)=t_{j}\}\) for \(1\leq j\leq m\); then \(\{A_{1},\ldots,A_{m}\}\) is a partition of the set \(T\). Note that finite partition functions can be used to characterize the compact subsets of \(l_{\infty}(T)\). A set \(B\) of \(l_{\infty}(T)\) is compact if and only if it is closed and bounded and, for each \(\tau>0\), there exists a finite partition function \(\pi:T\rightarrow T\) such that

$$\sup_{\psi\in B}\big{|}\psi(t)-\psi(\pi(t))\big{|}\leq\tau,$$

see, for instance, [56, Theorem IV.5.6] or [6, p. 573]. We also have that if \(B\) is a compact set of \(l_{\infty}(T)\), then \(B\) is a set of uniformly bounded and uniformly equicontinuous functions in the pseudo-metric space \((T,d_{B})\), where

$$d_{B}(t_{1},t_{2})=\sup_{\psi\in B}|\psi(t_{1})-\psi(t_{2})|.$$

From now on, for any real-valued function \(\psi\) defined on a set \(T\), we use the notation

$$||\psi||_{T}:=\sup_{t\in T}|\psi(t)|.$$

For future use, let us introduce two classes of continuous and bounded functions on \(\mathcal{J}\) indexed by \(\mathcal{L}\):

$$\mathcal{C}:=\Big{\{}c_{l}:l\in\mathcal{L}\Big{\}}\quad\textrm{and}\quad\mathcal{D}:=\Big{\{}d_{l}:l\in\mathcal{L}\Big{\}}.$$

We shall always assume that the classes \(\mathcal{C}\) and \(\mathcal{D}\) are compact with respect to the uniform topology. Let us define

$$C_{\mathcal{L}}:=\sup\Big{\{}||c_{l}||_{\mathcal{J}}:l\in\mathcal{L}\Big{\}}\quad\textrm{and}\quad D_{\mathcal{L}}:=\sup\Big{\{}||d_{l}||_{\mathcal{J}}:l\in\mathcal{L}\Big{\}}.$$

For any \(x\in\mathcal{J}\), the moderate deviation principle for the process \(\{W_{n}(x,z):z\in\mathcal{L}\times\mathcal{H}_{0}\}\) in the space \(l_{\infty}\big{(}\mathcal{L}\times\mathcal{H}_{0}\big{)}\) is presented in the following theorem.

Theorem 3.2. Assume that Assumptions (A1)–(A4), (B1)–(B3), and (B4) or (B\({}^{\prime}\)4) hold true, and that \(\tau_{0}(\displaystyle\frac{\vartheta_{1}}{\vartheta_{2}})>0\). Furthermore, consider the classes of continuous functions \(\mathcal{C}\) and \(\mathcal{D}\) given above. Then we have, for any \(\gamma\in\mathcal{H}_{0}\) and any \(x\in\mathcal{J}\),

(i) for any \(0<c<\infty\), \(\{\psi\in l_{\infty}(\mathcal{L}\times\mathcal{H}_{0}):\,I_{x}^{\gamma}(\psi)\leq c\}\) is a compact set of \(l_{\infty}(\mathcal{L}\times\mathcal{H}_{0})\);

(ii) for all open subsets \(O\) of \(l_{\infty}(\mathcal{L}\times\mathcal{H}_{0})\),

$$\liminf_{n\rightarrow\infty}\frac{w_{n}^{2}}{n\phi(\gamma h)}\log\Bigg{(}\mathbb{P}\bigg{\{}w_{n}W_{n}(x,\cdot)\in O\bigg{\}}\Bigg{)}\geq-\inf_{\psi\in O}I_{x}^{\gamma}(\psi)$$

and for all closed subsets \(F\) of \(l_{\infty}(\mathcal{L}\times\mathcal{H}_{0})\),

$$\limsup_{n\rightarrow\infty}\frac{w_{n}^{2}}{n\phi(\gamma h)}\log\Bigg{(}\mathbb{P}\bigg{\{}w_{n}W_{n}(x,\cdot)\in F\bigg{\}}\Bigg{)}\leq-\inf_{\psi\in F}I_{x}^{\gamma}(\psi),$$

where

$$I_{x}^{\gamma}(\psi)=\inf\Bigg{\{}2^{-1}\mathbf{E}[\xi^{2}]:\xi\in\mathcal{Z}_{x,\gamma},\ \varphi(\xi)=\psi\Bigg{\}}.$$
(3.5)

The proof of Theorem 3.2 is postponed until Section 7.

Remark 3.3. Making use of arguments similar to those in [8, p. 5], it follows, whenever \({\sup_{z\in\mathcal{L}\times\mathcal{H}_{0}}R_{\gamma}(x,z,z)>0}\), that for any \(\lambda\geq 0\),

$$\inf_{\{\psi\in l_{\infty}(\mathcal{L}\times\mathcal{H}_{0}):\sup_{z\in\mathcal{L}\times\mathcal{H}_{0}}|\psi(z)|\geq\lambda\}}I_{x}^{\gamma}(\psi)=\frac{\displaystyle\lambda^{2}}{\displaystyle 2\sup_{z\in\mathcal{L}\times\mathcal{H}_{0}}R_{\gamma}(x,z,z)}$$

and, whenever \(\sup_{z\in\mathcal{L}\times\mathcal{H}_{0}}R_{\gamma}(x,z,z)=0\),

$$I_{x}^{\gamma}(\psi)=\begin{cases}0,\quad\textrm{if}\quad\displaystyle\sup_{z\in\mathcal{L}\times\mathcal{H}_{0}}|\psi(z)|=0\\ \infty,\quad\textrm{if}\quad\displaystyle\sup_{z\in\mathcal{L}\times\mathcal{H}_{0}}|\psi(z)|>0.\end{cases}$$

Therefore, by Theorem 3.2, for any \(\lambda\geq 0\), we have

$${\lim_{n\rightarrow\infty}\frac{w_{n}^{2}}{n\phi(\gamma h)}\log\mathbb{P}\bigg{(}w_{n}\sup_{z\in\mathcal{L}\times\mathcal{H}_{0}}|W_{n}(x,z)|\geq\lambda\bigg{)}}$$
$${}=\begin{cases}\displaystyle-\frac{\lambda^{2}}{\displaystyle{2\sup_{z\in\mathcal{L}\times\mathcal{H}_{0}}R_{\gamma}(x,z,z)}}\quad\textrm{if}\quad\displaystyle\displaystyle{\sup_{z\in\mathcal{L}\times\mathcal{H}_{0}}R_{\gamma}(x,z,z)>0}\\ \displaystyle-\infty\quad\textrm{if}\quad\displaystyle{\sup_{z\in\mathcal{L}\times\mathcal{H}_{0}}R_{\gamma}(x,z,z)=0}\end{cases}$$
$${}=:-I_{x}^{\gamma}(\lambda).$$

The uniform moderate deviation principle on \(\mathcal{J}\times\mathcal{L}\times\mathcal{H}_{0}\) is presented in the following theorem. Towards this end, for any \(\epsilon>0\), consider the number

$$\mathcal{N}(\epsilon,\mathcal{J},d)=\min\Bigg{\{}n:\text{ there exist }x_{1},\ldots,x_{n}\text{ in }\mathcal{J}\text{ such that for any }x\in\mathcal{J}$$
$$\text{there exists }1\leq k\leq n\text{ such that }d(x,x_{k})<\epsilon\Bigg{\}},$$

which is the minimal number of open balls with \(d\)-radius \(\epsilon\) needed to cover the subset \(\mathcal{J}\). We assume that \(\mathcal{J}\) satisfies the following property.

(J) There exist \(C>0\) and \(\nu>0\) such that, for any \(\epsilon>0\), \(\mathcal{N}(\epsilon,\mathcal{J},d)\leq C\epsilon^{-\nu}.\)

Theorem 3.3. If the assumptions of Theorem 3.2 and Assumption (J) hold, then, for any \(\lambda>0\),

$$\lim_{n\rightarrow\infty}\frac{w_{n}^{2}}{n\phi(\gamma h)}\log\Bigg{(}\mathbb{P}\bigg{\{}\sup_{(x,z)\in\mathcal{J}\times\mathcal{L}\times\mathcal{H}_{0}}w_{n}\left|W_{n}(x,z)\right|>\lambda\bigg{\}}\Bigg{)}=-I^{\gamma}(\lambda),$$

where

$$I^{\gamma}(\lambda)=\inf_{x\in\mathcal{J}}I_{x}^{\gamma}(\lambda).$$

The proof of Theorem 3.3 is postponed until Section 7.

Remark 3.4. We consider a bootstrap as in [97]; see also [19, 36, 127] for recent references. Following [46], we introduce an auxiliary i.i.d. sequence \(Z=Z_{1},Z_{2},\ldots\) of real-valued rv’s, independent of \(\left\{\left({X}_{i},{Y}_{i}\right):i\geq 1\right\}\), and such that

(R1) \(\mathbb{E}(Z)=1\); \(\mathbb{E}\left(Z^{2}\right)=2;\)

(R2) for some \(\epsilon>0,\) \(\mathbb{E}\left(e^{tZ}\right)<\infty\) for all \(|t|\leq\epsilon\).

Setting \(T_{n}=Z_{1}+\cdots+Z_{n}\), we define the weights \(\left\{\mathfrak{W}_{i,n}:1\leq i\leq n\right\}\) by setting, for \(i=1,\ldots,n\),

$$\mathfrak{W}_{i,n}=\begin{cases}\displaystyle\frac{\displaystyle Z_{i}}{\displaystyle T_{n}}=\frac{\displaystyle Z_{i}}{\displaystyle\sum_{j=1}^{n}Z_{j}}\quad\text{when}\quad T_{n}>0\\ \displaystyle\frac{\displaystyle 1}{\displaystyle n}\quad\text{when}\quad T_{n}\leq 0.\end{cases}$$
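For concreteness, here is a minimal sketch of the weight generation, assuming \(Z\sim\textrm{Exp}(1)\), which fulfills (R1) and (R2) since \(\mathbb{E}(Z)=1\), \(\mathbb{E}(Z^{2})=2\), and \(\mathbb{E}(e^{tZ})=(1-t)^{-1}<\infty\) for \(t<1\); with this choice the weights are Dirichlet distributed and one recovers the Bayesian bootstrap.

```python
import numpy as np

def bootstrap_weights(n, rng=np.random.default_rng(0)):
    """Multiplier weights W_{i,n} = Z_i / T_n of Remark 3.4 (sketch).

    Z ~ Exp(1) satisfies (R1) and (R2); with exponential Z the vector
    (W_{1,n}, ..., W_{n,n}) is Dirichlet(1, ..., 1).
    """
    Z = rng.exponential(scale=1.0, size=n)
    T_n = Z.sum()
    if T_n > 0:                   # holds almost surely for Exp(1) draws
        return Z / T_n
    return np.full(n, 1.0 / n)    # the fallback branch T_n <= 0
```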

Introduce the resampled version of the process \((2.2)\) given by

$$W_{n}^{*}(x,z)=\frac{1}{n\mathbf{E}[\Delta_{1}(x,\varrho h)]}\sum_{i=1}^{n}\Bigg{\{}\Big{(}c_{l}(x)l(Y_{i})Z_{i}+d_{l}(x)\Big{)}\Delta_{i}(x,\varrho h)$$
$${}-\mathbf{E}\bigg{[}\Big{(}c_{l}(x)l(Y_{i})Z_{i}+d_{l}(x)\Big{)}\Delta_{i}(x,\varrho h)\bigg{]}\Bigg{\}}.$$
(3.6)

Following [46], we observe that \(W_{n}^{*}(x,l,h)\) reduces to a process of the form \(W_{n}(x,l^{*},h)\), for a suitable measurable \(l^{*}(\cdot)\), after some easy changes. Without loss of generality, we set \(Z=Q(U)\) and \(Z_{i}=Q\left(U_{i}\right)\) for \(i=1,\ldots,n\), where \(U\) and \(U_{1},\ldots,U_{n}\) are independent rv’s, with a uniform distribution on \((0,1)\), and independent of \(\left\{\left({X}_{i},{Y}_{i}\right):1\leq i\leq n\right\}\). This allows us to define a measurable function \(l^{*}(\cdot)\) on \(\mathbb{R}^{q+1}\) and a rv \(\mathbf{Y}^{*}\) by

$$\mathbf{Y}^{*}=\left[\begin{matrix}U&Y\end{matrix}\right]^{\prime}\in\mathbb{R}^{1+q}\quad\text{ and }\quad l^{*}\left(\mathbf{Y}^{*}\right)=Q(U)l({Y})=Zl({Y}).$$

Letting \(\left({X}_{i},\mathbf{Y}_{i}^{*}\right),i=1,2,\ldots\), denote i.i.d. random copies of \(\left(X,\mathbf{Y}^{*}\right)\), it is readily checked that \(\{\left({X}_{i},\mathbf{Y}_{i}^{*}\right):1\leq i\leq n\}\), and \(l^{*}(\cdot)\) fulfill the general assumptions imposed in Theorem 3.2. Then the process

$$\bigg{\{}w_{n}W_{n}^{*}(x,z):z\in\mathcal{L}\times\mathcal{H}_{0}\bigg{\}}$$

satisfies an LDP in the space \(l_{\infty}(\mathcal{L}\times\mathcal{H}_{0})\) with the speed \(n\phi(\gamma h)/w_{n}^{2}\) and the good rate function \(I_{x}^{\gamma}(\cdot)\). Then we have, for any \(\gamma\in\mathcal{H}_{0}\) and any \(x\in\mathcal{J}\),

(i) for any \(0<c<\infty\), \(\{\psi\in l_{\infty}(\mathcal{L}\times\mathcal{H}_{0}):\,I_{x}^{\gamma}(\psi)\leq c\}\) is a compact set of \(l_{\infty}(\mathcal{L}\times\mathcal{H}_{0})\);

(ii) for all open subsets \(O\) of \(l_{\infty}(\mathcal{L}\times\mathcal{H}_{0})\),

$$\liminf_{n\rightarrow\infty}\frac{w_{n}^{2}}{n\phi(\gamma h)}\log\Bigg{(}\mathbb{P}\bigg{\{}w_{n}W_{n}^{*}(x,\cdot)\in O\bigg{\}}\Bigg{)}\geq-\inf_{\psi\in O}I_{x}^{\gamma}(\psi)$$

and for all closed subsets \(F\) of \(l_{\infty}(\mathcal{L}\times\mathcal{H}_{0})\),

$$\limsup_{n\rightarrow\infty}\frac{w_{n}^{2}}{n\phi(\gamma h)}\log\Bigg{(}\mathbb{P}\bigg{\{}w_{n}W_{n}^{*}(x,\cdot)\in F\bigg{\}}\Bigg{)}\leq-\inf_{\psi\in F}I_{x}^{\gamma}(\psi),$$

where we recall

$$I_{x}^{\gamma}(\psi)=\inf\Bigg{\{}2^{-1}\mathbf{E}[\xi^{2}]:\xi\in\mathcal{Z}_{x,\gamma},\ \varphi(\xi)=\psi\Bigg{\}}.$$
(3.7)

By Theorem 3.3 we have, for any \(\lambda>0\),

$$\lim_{n\rightarrow\infty}\frac{w_{n}^{2}}{n\phi(\gamma h)}\log\Bigg{(}\mathbb{P}\bigg{\{}\sup_{(x,z)\in\mathcal{J}\times\mathcal{L}\times\mathcal{H}_{0}}w_{n}\left|W_{n}^{*}(x,z)\right|>\lambda\bigg{\}}\Bigg{)}=-I^{\gamma}(\lambda),$$

While it is possible that the last result also holds for the exchangeably weighted bootstrap, such a determination is beyond the scope of this paper and appears to be quite difficult.

Remark 3.5. According to [64], our methodology is heavily dependent on the function \(\phi(\cdot)\). This is evident in our conditions (via the function \(\tau_{0}(\cdot)\)) and in the convergence rates of our estimate (via the asymptotic behavior of the quantity \(n\phi(h))\). More precisely, the behavior of \(\phi(\cdot)\) around \(0\) turns out to be of paramount importance; thus, the small ball probabilities of the underlying functional variable \(X\) are crucial. In probability theory, the calculation of the quantity \(\mathbb{P}(||X-x||<s)\) for ‘‘small’’ \(s\) (i.e., for \(s\) tending toward zero) and for a fixed \(x\) is known as the ‘‘small ball problem.’’ Unfortunately, solutions are available for very few random variables (or processes) \(X\), even when \(x=0\). In several functional spaces, taking \(x\neq 0\) results in formidable obstacles that may be insurmountable. Typically, authors emphasize Gaussian random variables. We refer to [91] for a summary of the key findings regarding small ball probabilities. If \(X\) is a Gaussian random element on the separable Banach space \(\mathcal{E}\) and \(x\) belongs to the reproducing kernel Hilbert space associated with \(X\), then the following well-known result holds:

$$\mathbb{P}(||X-x||\leq s)\sim C_{x}\mathbb{P}(||X||\leq s),\quad\text{as}\quad s\rightarrow 0.$$

As far as we know, the results that are available in the published literature are basically all of the forms

$$\mathbb{P}(||X-x||<s)\sim c_{x}s^{-\alpha}\exp\left(-C/s^{\beta}\right),$$

where \(\alpha,\beta,c_{x}\), and \(C\) are positive constants and \(||\cdot||\) may be a supremum norm, an \(L^{p}\) norm, or a Besov norm. The interested reader can refer to [62–64, 128] for more discussion. Notice that the pioneering book by [62] extensively comments on the links between nonparametric functional statistics, small-ball probability theory, and the topological structure of the functional space \(\mathcal{E}\).

4 APPLICATIONS

While only the examples provided below will be discussed, they serve as archetypes for various functionals and can be explored similarly.

4.1 The Kernel Regression Function Estimate

Some further conditions are needed to establish the functional moderate deviations principle for the kernel regression function estimate \(\widehat{r}_{n}^{l}(x)\).

(C1) (i) For each \((x,x^{\prime})\in\mathcal{J}^{2}\), \(l\in\mathcal{L}\), and some constants \(\beta>0\) and \(\varsigma>0\)

$$|r^{l}(x)-r^{l}(x^{\prime})|\leq\varsigma d(x,x^{\prime})^{\beta};$$

(ii) \(w_{n}h^{\beta}\rightarrow 0\) as \(n\rightarrow\infty\).

Assumption (C1)(i) imposes some smoothness on the regression operator. Define the function \(I_{x,1}^{\gamma}(\cdot)\) as the function \(I_{x}^{\gamma}(\cdot)\) in the statement (3.5), with \(c_{l}(x)=1\) and \(d_{l}(x)=-r^{l}(x)\) for any \(x\in\mathcal{J}\). The large deviation principle for the process \(\{w_{n}(\widehat{r}_{n}^{l}(x,\varrho h)-r^{l}(x)):(l,\varrho)\in\mathcal{L}\times\mathcal{H}_{0}\}\) in the space \(l_{\infty}(\mathcal{L}\times\mathcal{H}_{0})\) is presented in the following corollary.

Corollary 4.1. Under the assumptions of Theorem \(3.2\), assume that Condition (C1) holds. Then the process

$$\bigg{\{}w_{n}(\widehat{r}_{n}^{l}(x,\varrho h)-r^{l}(x)):(l,\varrho)\in\mathcal{L}\times\mathcal{H}_{0}\bigg{\}}$$

satisfies an LDP in the space \(l_{\infty}(\mathcal{L}\times\mathcal{H}_{0})\) with the speed \(n\phi(\gamma h)/w_{n}^{2}\) and the good rate function \(I_{x,1}^{\gamma}(\cdot)\).

The proof of Corollary 4.1 is postponed until Section 7.

Proposition 4.1. Under the assumptions of Theorem \(3.2\), assume that Condition (C1) holds. Then, for any \(\delta>0\), we have

$$\lim_{n\rightarrow\infty}\frac{w_{n}^{2}}{n\phi(\gamma h)}\log\mathbb{P}\left(\exists x\in\mathcal{E},\quad r^{l}(x)\notin\left[\widehat{r}_{n}^{l}(x,h)-\delta\frac{\hat{\sigma}_{n}}{w_{n}},\widehat{r}_{n}^{l}(x,h)+\delta\frac{\hat{\sigma}_{n}}{w_{n}}\right]\right)=-\frac{\delta^{2}}{2}.$$

Moreover, the sequence of sets of functions

$$D_{n}=\left\{g:\mathcal{E}\rightarrow\mathbb{R},\left|g(x)-\widehat{r}_{n}^{l}(x,h)\right|\leq\delta\frac{\hat{\sigma}_{n}}{w_{n}},\quad\forall x\in\mathcal{E}\right\},$$

is an asymptotic almost sure sequence of confidence regions for \(r^{l}(x)\), where \(\hat{\sigma}_{n}^{2}:=\hat{\sigma}_{n}^{2}(x)\) is any consistent estimate of the variance of \(\widehat{r}_{n}^{l}(x,h)\).

Remark 4.2. Let \(||\cdot||\) be a norm on \(\mathcal{E}=\mathbb{R}^{d}\). Denote by \(B(x,r)\) the set of all points \(z\in\mathbb{R}^{d}\) satisfying \(||x-z||\leq r\). For each \(n\geq 1\) and \(k\in\{1,\ldots,n\}\), the \(k\)-nearest neighbor bandwidth at \(x\) is denoted by \(\hat{k}_{n,x}\) and defined as the smallest radius \(r\geq 0\) such that the ball \(B(x,r)\) contains at least \(k\) points from the collection \(\{X_{1},\ldots,X_{n}\}\), i.e.,

$$\hat{k}_{n,x}=\inf\left\{r\geq 0\,:\,\sum_{i=1}^{n}{\mathchoice{\rm 1\mskip-4.0mu l}{\rm 1\mskip-4.0mu l}{\rm 1\mskip-4.5mu l}{\rm 1\mskip-5.0mu l}}_{B(x,r)}(X_{i})\geq k\right\}.$$

The \(k\)-nearest neighbor estimate of the regression function, \(x\mapsto\mathbb{E}[l(Y)|X=x]\), is defined, for all \(x\in\mathbb{R}^{d}\), by

$$\sum_{i=1}^{n}l(Y_{i}){\mathchoice{\rm 1\mskip-4.0mu l}{\rm 1\mskip-4.0mu l}{\rm 1\mskip-4.5mu l}{\rm 1\mskip-5.0mu l}}_{B(x,\hat{k}_{n,x})}(X_{i})/\sum_{i=1}^{n}{\mathchoice{\rm 1\mskip-4.0mu l}{\rm 1\mskip-4.0mu l}{\rm 1\mskip-4.5mu l}{\rm 1\mskip-5.0mu l}}_{B(x,\hat{k}_{n,x})}(X_{i}).$$

This estimate is an adaptive bandwidth version of the Nadaraya–Watson estimate which here would be defined in the same way except that a non-random bandwidth (depending only on \(n\), e.g., \(n^{-1/5}\)) is used in place of \(\hat{k}_{n,x}\). It would be of interest to investigate the process defined in (2.2) in the \(k\)-nearest neighbor setting.
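A minimal sketch of this \(k\)-nearest neighbor estimate follows; the Euclidean norm and all names are illustrative choices.

```python
import numpy as np

def knn_regression(x, X, Y, k, l=lambda y: y):
    """k-NN regression estimate of E[l(Y) | X = x] from Remark 4.2.

    X: (n, d) array; Y: (n,) array.  The bandwidth hat{k}_{n,x} is the
    smallest radius whose ball around x contains at least k sample points.
    """
    dist = np.linalg.norm(X - x, axis=1)   # ||X_i - x||, i = 1, ..., n
    radius = np.sort(dist)[k - 1]          # data-driven k-NN bandwidth
    in_ball = dist <= radius               # indicator of B(x, radius)
    return l(Y[in_ball]).mean()
```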

4.2 The Kernel Conditional Distribution Function

To present the functional moderate deviation principle for the conditional distribution function estimate \(\widehat{F}_{nh}(t|x):=\widehat{\mu}((-\infty,t]|x)\) for \((x,t)\in\mathcal{J}\times\mathbb{R}\) as a special case, Condition (C1) has to be reformulated for the conditional distribution function \(F(t|x):=\mu((-\infty,t]|x)\) for \((x,t)\in\mathcal{J}\times\mathbb{R}\) as follows.

(C2) (i) For any \((x,x^{\prime})\in\mathcal{J}^{2}\), any \((t,t^{\prime})\in\mathbb{R}^{2q}\), some \(\beta_{1}>0\), \(\beta_{2}>0\), and a constant \(\varsigma^{\prime}>0\)

$$|F(t|x)-F(t^{\prime}|x^{\prime})|\leq\varsigma^{\prime}\Big{(}d(x,x^{\prime})^{\beta_{1}}+||t-t^{\prime}||^{\beta_{2}}\Big{)};$$

(ii) \(w_{n}h^{\beta_{1}}\rightarrow 0\) as \(n\rightarrow\infty\).

Assumption (C2)(i) introduces a level of smoothness to the conditional distribution. Define the function \(I_{2,x}^{\gamma}(\cdot)\) as the function \(I^{\gamma}_{x}(\cdot)\) in statement (3.5), where \(l(y)={\mathchoice{\rm 1\mskip-4.0mu l}{\rm 1\mskip-4.0mu l}{\rm 1\mskip-4.5mu l}{\rm 1\mskip-5.0mu l}}_{(-\infty,t]}(y)\), \(c_{l}(x)=1\), and \(d_{l}(x)=-F(t|x)\) for any \((x,t)\in\mathcal{J}\times\mathbb{R}\). Let the conditional distribution function estimate be defined as follows:

$$\widehat{F}_{n,h}(t|x)=\frac{\sum_{i=1}^{n}{\mathchoice{\rm 1\mskip-4.0mu l}{\rm 1\mskip-4.0mu l}{\rm 1\mskip-4.5mu l}{\rm 1\mskip-5.0mu l}}_{(-\infty,t]}(Y_{i})K(h^{-1}d(x,X_{i}))}{\sum_{i=1}^{n}K(h^{-1}d(x,X_{i}))}.$$
(4.1)

The large deviation principle for the process \(\{w_{n}(\widehat{F}_{n,\varrho h}(t|x)-F(t|x)):(t,\varrho)\in\mathbb{R}\times\mathcal{H}_{0}\}\) in \(l_{\infty}(\mathbb{R}\times\mathcal{H}_{0})\) is presented in the following corollary.

Corollary 4.2. Assume that assumptions (A1)–(A4), (B2)–(B3), and (C2) hold true. Then the process

$$\bigg{\{}w_{n}(\widehat{F}_{n,\varrho h}(t|x)-F(t|x)):(t,\varrho)\in\mathbb{R}\times\mathcal{H}_{0}\bigg{\}}$$

satisfies an LDP in \(l_{\infty}(\mathbb{R}\times\mathcal{H}_{0})\) with the speed \(n\phi(\gamma h)/w_{n}^{2}\) and the good rate function \(I_{2,x}^{\gamma}(\cdot)\).

The proof of Corollary 4.2 is postponed until Section 7.

4.3 The Kernel Quantile Regression

For a given \(\alpha\in(0,1)\), the \(\alpha\)th-order conditional quantile of the distribution of a real-valued \(Y\) given \(X=x\) is defined as

$$q_{\alpha}(x)=\inf\bigg{\{}y\in\mathbb{R}:F(y|x)\geq\alpha\bigg{\}}.$$

Notice that, whenever \(F(\cdot|x)\) is strictly increasing and continuous in a neighborhood of \(q_{\alpha}(x)\), the function \(F(\cdot|x)\) has a unique quantile of order \(\alpha\) at the point \(q_{\alpha}(x)\), that is, \(F\left(q_{\alpha}(x)|x\right)=\alpha\). In such a case,

$$q_{\alpha}(x)=F^{-1}(\alpha|x)=\inf\bigg{\{}y\in\mathbb{R}:F(y|x)\geq\alpha\bigg{\}},$$

which may be estimated uniquely by

$$\widehat{q}_{n,\alpha}(x,\varrho)=\widehat{F}_{n,\varrho h}^{-1}(\alpha|x).$$
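In practice, \(\widehat{q}_{n,\alpha}(x,\varrho)\) is a generalized inverse of the estimated conditional distribution function; a minimal sketch over a grid of candidate values \(t\) follows (grid, kernel, and semi-metric are illustrative inputs).

```python
import numpy as np

def conditional_quantile(alpha, x, X, Y, h, d, K, grid):
    """Smallest grid point t with F_hat(t|x) >= alpha, F_hat as in (4.1).

    grid must be sorted increasingly; F_hat is then nondecreasing,
    so the generalized inverse is found by a single search.
    """
    w = np.array([K(d(x, Xi) / h) for Xi in X])
    w = w / w.sum()                          # Nadaraya-Watson weights
    F_hat = np.array([(w * (Y <= t)).sum() for t in grid])
    idx = min(np.searchsorted(F_hat, alpha), len(grid) - 1)
    return grid[idx]
```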

Conditional quantiles have been widely studied in the literature when the predictor \(X\) is of finite dimension; see, for instance, [45]. Let us first recall some concepts of Hadamard differentiability [70, 89, 131]. Let \(\mathcal{X}\) and \(\mathcal{Y}\) be two metrizable topological linear spaces. A map \(\Phi\) defined on a subset \(\mathcal{D}_{\Phi}\) of \(\mathcal{X}\) with values in \(\mathcal{Y}\) is called Hadamard differentiable at \(x\) if there exists a continuous mapping \(\Phi_{x}^{\prime}:\mathcal{X}\mapsto\mathcal{Y}\) such that

$$\lim_{n\rightarrow\infty}\frac{\Phi\left(x+t_{n}\nu_{n}\right)-\Phi(x)}{t_{n}}=\Phi_{x}^{\prime}(\nu)$$

holds for all sequences \(t_{n}\) converging to \(0+\) and \(\nu_{n}\) converging to \(\nu\) in \(\mathcal{X}\) such that \(x+t_{n}\nu_{n}\in\mathcal{D}_{\Phi}\) for every \(n\).

Corollary 4.3. Let \(0<p<q<1\) be fixed and let \(F(\cdot|x)\) be a conditional distribution function with continuous and positive derivative \(f(\cdot|x)\) on the interval \(\left[F^{-1}(p|x)-\varepsilon,F^{-1}(q|x)+\varepsilon\right]\) for some \(\varepsilon>0\). Then, under the conditions of Corollary \(4.2\), the process \(\left\{w_{n}\left(\widehat{q}_{n,\alpha}(x,\varrho)-q_{\alpha}(x)\right)\right\}\) satisfies the LDP in \(l_{\infty}([p,q]\times\mathcal{H}_{0})\) with speed \(n\phi(\gamma h)/w_{n}^{2}\) and rate function \(I_{\gamma,x}^{EQ}\) given by

$$I_{\gamma,x}^{EQ}(\phi)=\inf\Bigg{\{}I_{2,x}^{\gamma}(\psi):-\frac{\psi\left(F^{-1}(t|x)\right)}{f\left(F^{-1}(t|x)|x\right)}=\phi(t)\text{ for all }t\in[p,q]\Bigg{\}}.$$

The proof of Corollary 4.3 is postponed until Section 7.

Remark 4.3. It will be interesting to extend our findings to the following settings.

1. (Expectile regression.) For \(p\in(0,1)\), let \(l(T-\boldsymbol{\theta})=\big{(}p-{\mathchoice{\rm 1\mskip-4.0mu l}{\rm 1\mskip-4.0mu l}{\rm 1\mskip-4.5mu l}{\rm 1\mskip-5.0mu l}}_{\{T-\boldsymbol{\theta}\leq 0\}}\big{)}|T-\boldsymbol{\theta}|\); then the zero of \(r^{l}(\cdot)\) with respect to \(\boldsymbol{\theta}\) leads to quantities called expectiles by [110]. Expectiles, as defined by [110], may be introduced either as a generalization of the mean or as an alternative to quantiles. Indeed, classical regression provides us with a high sensitivity to extreme values, allowing for more reactive risk management. Quantile regression, on the other hand, provides the ability to acquire exhaustive information on the effect of the explanatory variable on the response variable by examining its conditional distribution; refer to [102, 103] for further details on expectiles in the functional data setting.

2. (Conditional winsorized mean.) As in [83], if we consider \(l(T-\boldsymbol{\theta})=-k\), \(T-\boldsymbol{\theta}\), or \(k\) according as \(T-\boldsymbol{\theta}<-k\), \(|T-\boldsymbol{\theta}|\leq k\), or \(T-\boldsymbol{\theta}>k\), then the zero of \(r^{l}(\cdot)\) with respect to \(\boldsymbol{\theta}\) will be the conditional winsorized mean. Notably, this parameter was not considered in the literature on nonparametric functional data analysis involving wavelet estimators. Our paper offers asymptotic results for the conditional winsorized mean when the covariates are functions.

4.4 The Kernel Conditional Density Function

By setting \(l(\cdot)=\frac{1}{h_{1}}K_{1}(h_{1}^{-1}(\cdot-t))\) in (2.1), for \({t}\in\mathbb{R}\), where \(h_{1}\) is a bandwidth parameter and \(K_{1}(\cdot)\) is a kernel function, we obtain the kernel estimator of the conditional density function \(f({t}|{x})\) given by

$$\widehat{f}_{n,\varrho h}({t}|{x})=\frac{\sum_{i=1}^{n}\frac{1}{h_{1}}K_{1}(h_{1}^{-1}(Y_{i}-t))K((\varrho h)^{-1}d(x,X_{i}))}{\sum_{i=1}^{n}K((\varrho h)^{-1}d(x,X_{i}))}.$$
(4.2)
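A minimal sketch of the double-kernel estimator (4.2), assuming a Gaussian kernel \(K_{1}\) for the response and the uniform kernel \(K\) for the covariate (both illustrative defaults):

```python
import numpy as np

def conditional_density(t, x, X, Y, h, h1, d,
                        K=lambda u: np.where(u <= 1.0, 1.0, 0.0),
                        K1=lambda u: np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)):
    """Double-kernel estimate (4.2) of the conditional density f(t|x)."""
    w = np.array([K(d(x, Xi) / h) for Xi in X])   # functional covariate weights
    num = (K1((Y - t) / h1) * w).sum() / h1       # response smoothing at t
    return num / w.sum()
```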

(C3) (i) For any \((x,x^{\prime})\in\mathcal{J}^{2}\), any \((t,t^{\prime})\in[a,b]^{2}\subset\mathbb{R}^{2}\), some \(\beta_{1}>0\), \(\beta_{2}>0\), and a constant \(\varsigma^{\prime\prime}>0\)

$$|f(t|x)-f(t^{\prime}|x^{\prime})|\leq\varsigma^{\prime\prime}\Bigg{(}d(x,x^{\prime})^{\beta_{1}}+|t-t^{\prime}|^{\beta_{2}}\Bigg{)};$$

(ii) \(w_{n}(h^{\beta_{1}}+h_{1}^{\beta_{2}})\rightarrow 0\) as \(n\rightarrow\infty\).

(B\({}^{\prime\prime}\)2) The class

$$\tilde{\mathcal{K}}=\Bigg{\{}(h,x,y)\mapsto K_{1}\left(\frac{y-t}{h_{1}}\right)K\left(\frac{d(x,x^{\prime})}{h}\right):x^{\prime}\in\mathcal{J},t\in\mathbb{R},h>0\Bigg{\}}$$

satisfies the Condition (B1).

Assumption (C3)(i) imposes some smoothness on the conditional density. Define the function \(I_{3,x}^{\gamma}(\cdot)\) as the function \(I^{\gamma}_{x}(\cdot)\) in the statement (3.5), with \(l(\cdot)=\frac{1}{h_{1}}K_{1}(h_{1}^{-1}(\cdot-t))\), \(c_{l}(x)=1\) and \(d_{l}(x)=-f(t|x)\) for any \((x,t)\in\mathcal{J}\times[a,b]\).

Corollary 4.4. Assume that assumptions (A1)–(A4), (B\({}^{\prime\prime}\)2)–(B4), and (C3) hold. Then the process

$$\bigg{\{}w_{n}(\widehat{f}_{n,\varrho h}(t|x)-f(t|x)):(t,\varrho)\in[a,b]\times\mathcal{H}_{0}\bigg{\}}$$

satisfies an LDP in \(l_{\infty}([a,b]\times\mathcal{H}_{0})\) with the speed \(nh_{1}\phi(\gamma h)/w_{n}^{2}\) and the good rate function \(I_{3,x}^{\gamma}(\cdot)\).

The proof of Corollary 4.4 is similar to the proof of Corollary 4.2 and therefore omitted.

4.5 The Kernel Conditional Copula Function

Let us recall the setting of [69]. Assume that \(\left(X_{1},Y_{11},Y_{21}\right),\ldots,\left({X}_{n},Y_{1n},Y_{2n}\right)\) is a sample of \(n\) independent and identically distributed triples of random variables. The random variables \(Y_{1i}\) and \(Y_{2i}\) are real and the \({X}_{i}\)’s are random elements. Suppose that the conditional distribution of \(\left(Y_{1},Y_{2}\right)^{\top}\) given \({X}=x\) exists and denote the corresponding conditional joint distribution function by

$$H_{x}\left(y_{1},y_{2}\right)=\mathbb{P}\left(Y_{1}\leq y_{1},Y_{2}\leq y_{2}|{X}=x\right).$$

If the marginals of \(H_{x}(\cdot,\cdot)\), denoted as

$$F_{1x}\left(y_{1}\right)=\mathbb{P}\left(Y_{1}\leq y_{1}|{X}=x\right),\quad F_{2x}\left(y_{2}\right)=\mathbb{P}\left(Y_{2}\leq y_{2}|{X}=x\right)$$

are continuous, then according to Sklar’s theorem (see e.g., [109]) there exists a unique copula \(C_{x}(\cdot,\cdot)\) which equals

$$C_{x}\left(u_{1},u_{2}\right)=H_{x}\left(F_{1x}^{-1}\left(u_{1}\right),F_{2x}^{-1}\left(u_{2}\right)\right),$$

where

$$F_{1x}^{-1}(u)=\inf\left\{y:F_{1x}(y)\geq u\right\}$$

is the conditional quantile function of \(Y_{1}\) given \({X}=x\) and \(F_{2x}^{-1}(\cdot)\) is the conditional quantile function of \(Y_{2}\) given \({X}=x\). The conditional copula \(C_{x}(\cdot,\cdot)\) fully describes the conditional dependence structure of \(\left(Y_{1},Y_{2}\right)^{\top}\) given \({X}=x\). An estimator of the joint conditional distribution function \(H_{x}(\cdot,\cdot)\) is

$$H_{xh}\left(y_{1},y_{2}\right)=\frac{\sum_{i=1}^{n}{\mathchoice{\rm 1\mskip-4.0mu l}{\rm 1\mskip-4.0mu l}{\rm 1\mskip-4.5mu l}{\rm 1\mskip-5.0mu l}}\left\{Y_{1i}\leq y_{1},Y_{2i}\leq y_{2}\right\}K(h^{-1}d(x,X_{i}))}{\sum_{i=1}^{n}K(h^{-1}d(x,X_{i}))}.$$
(4.3)

Then, analogously to [69] and [23], one can suggest the following empirical estimator of the copula \(C_{x}(\cdot,\cdot)\):

$$C_{x,h}\left(u_{1},u_{2}\right)=H_{xh}\left(F_{1xh}^{-1}\left(u_{1}\right),F_{2xh}^{-1}\left(u_{2}\right)\right),\quad 0\leq u_{1},u_{2}\leq 1,$$

where \(F_{1xh}(\cdot)\) and \(F_{2xh}(\cdot)\) are the corresponding marginal distribution functions of \(H_{xh}(\cdot,\cdot)\), i.e., \(F_{1xh}\left(y_{1}\right)=H_{xh}\left(y_{1},+\infty\right)\) and \(F_{2xh}\left(y_{2}\right)=H_{xh}\left(+\infty,y_{2}\right)\).
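A minimal sketch of this plug-in construction, computing \(H_{xh}\) from (4.3) and the marginal generalized inverses \(F_{1xh}^{-1}\) and \(F_{2xh}^{-1}\) (kernel and semi-metric are illustrative inputs):

```python
import numpy as np

def conditional_copula(u1, u2, x, X, Y1, Y2, h, d,
                       K=lambda u: np.where(u <= 1.0, 1.0, 0.0)):
    """Empirical conditional copula C_{x,h}(u1, u2) built from (4.3)."""
    w = np.array([K(d(x, Xi) / h) for Xi in X])
    w = w / w.sum()                              # kernel weights at x

    def marginal_quantile(Yj, u):
        # generalized inverse of the weighted marginal cdf F_{jxh}
        order = np.argsort(Yj)
        cumw = np.cumsum(w[order])
        idx = min(np.searchsorted(cumw, u), len(Yj) - 1)
        return Yj[order][idx]

    y1 = marginal_quantile(Y1, u1)
    y2 = marginal_quantile(Y2, u2)
    return (w * (Y1 <= y1) * (Y2 <= y2)).sum()   # H_{xh}(y1, y2)
```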

Corollary 4.5. Let \(0<p<q<1\) be fixed. Suppose that \(F_{1x}\left(\cdot\right)\) and \(F_{2x}\left(\cdot\right)\) are continuously differentiable on the intervals \(\left[F_{1x}^{-1}(p)-\varepsilon,F_{1x}^{-1}(q)+\varepsilon\right]\) and \(\left[F_{2x}^{-1}(p)-\varepsilon,F_{2x}^{-1}(q)+\varepsilon\right]\) with strictly positive derivatives \(f_{1}(\cdot|x)\) and \(f_{2}(\cdot|x)\), respectively, for some \(\varepsilon>0\). Furthermore, assume that \(\partial H_{x}/\partial y_{1}\) and \(\partial H_{x}/\partial y_{2}\) exist and are continuous on the product of these intervals. Then, under the conditions of Corollary \(4.2\), the process

$$\bigg{\{}w_{n}(C_{x,\varrho h}(u_{1},u_{2})-C_{x}(u_{1},u_{2})):((u_{1},u_{2}),\varrho)\in[p,q]^{2}\times\mathcal{H}_{0}\bigg{\}}$$

satisfies the \(LDP\) in \(l_{\infty}\left([p,q]^{2}\times\mathcal{H}_{0}\right)\) with speed \(n\phi(\gamma h)/w_{n}^{2}\) and rate function \(I^{C}_{\gamma,x}(\cdot)\) defined by

$$I^{C}_{\gamma,x}(\phi)=\inf\Big{\{}I_{1,x}^{\gamma}(\varpi):\Phi_{H}^{\prime}(\varpi)=\phi\Big{\}},$$

where

$$\Phi_{H}^{\prime}(\varpi)(u_{1},u_{2})=\varpi\left(F^{-1}_{1x}(u_{1}),F^{-1}_{2x}(u_{2})\right)$$
$${}-\frac{\partial H_{x}}{\partial y_{1}}\left(F^{-1}_{1,x}(u_{1}),F^{-1}_{2,x}(u_{2})\right)\frac{\varpi\left(F^{-1}_{1,x}(u_{1}),\infty\right)}{f_{1}\left(F^{-1}_{1,x}(u_{1})|x\right)}$$
$${}-\frac{\partial H_{x}}{\partial y_{2}}\left(F^{-1}_{1,x}(u_{1}),F^{-1}_{2,x}(u_{2})\right)\frac{\varpi\left(\infty,F^{-1}_{2,x}(u_{2})\right)}{f_{2}\left(F^{-1}_{2,x}(u_{2})|x\right)}.$$

Corollary 4.5 is a direct consequence of Corollary 4.2, Lemma 3.9.28 of [131] (or [32]), and Theorem 3.1 of [67], in a similar way as in Theorem 4.6 of the last-mentioned reference.

Remark 4.4. We define the conditional hazard function on \(\mathbb{R}\) by

$$S(\cdot|x)=\frac{f(\cdot|{x})}{1-F(\cdot|{x})}.$$

The kernel estimator \({S}_{n;h_{n}}(y|x)\) is defined, for all \(y\in\mathbb{R}\) such that \(F(y|x)<1\), by

$${S}_{n;h_{n}}(y|{x})=\frac{\widehat{f}_{n}({y}|{x})}{1-\widehat{F}_{n}(y|x)}.$$

Our result can be applied to \({S}_{n;h_{n}}(y|{x})\) by combining Corollaries 4.2 and 4.4.
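As a quick illustration of this plug-in construction, the sketch below (under the same simplifying assumptions as before: scalar covariate, Gaussian kernels, names of our own choosing) ratios a kernel conditional density estimate against one minus a kernel conditional distribution estimate.

```python
# Minimal sketch of the hazard estimator S_{n;h}(y|x) = f_hat(y|x) / (1 - F_hat(y|x)).
import numpy as np

def cond_hazard(y, x, X, Y, h, h1):
    w = np.exp(-0.5 * ((X - x) / h) ** 2)   # K(h^{-1} d(x, X_i))
    w = w / w.sum()
    # kernel conditional density f_hat(y|x), Gaussian K_1 with bandwidth h1
    f_hat = np.sum(w * np.exp(-0.5 * ((Y - y) / h1) ** 2)) / (h1 * np.sqrt(2 * np.pi))
    F_hat = np.sum(w * (Y <= y))            # kernel conditional cdf F_hat(y|x)
    if F_hat >= 1.0:                        # the estimator requires F(y|x) < 1
        raise ValueError("hazard undefined where F_hat(y|x) = 1")
    return f_hat / (1.0 - F_hat)
```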

Remark 4.5. Over the past few decades, the single index model has been adopted to reduce the dimensionality of the explanatory variable, circumventing the ‘‘curse of dimensionality’’ while preserving the benefits of nonparametric smoothing in multivariate regression. Assuming \(\mathcal{X}\) is a Hilbert space, in the single index setting the process in (2.2) takes the form

$$W_{n}(x,l,\theta,\vartheta)=\frac{1}{n\mathbf{E}[\Delta_{1}(x,\theta,\vartheta h)]}\sum_{i=1}^{n}\Bigg{\{}\Big{(}c_{l}(x)l(Y_{i})+d_{l}(x)\Big{)}\Delta_{i}(x,\theta,\vartheta h)$$
$${}-\mathbf{E}\bigg{[}\Big{(}c_{l}(x)l(Y_{i})+d_{l}(x)\Big{)}\Delta_{i}(x,\theta,\vartheta h)\bigg{]}\Bigg{\}},$$
(4.4)

where \(\theta\) is a functional single index valued in a subset \(\Theta\) of a separable Hilbert space \(\mathcal{X}\) and \(\Delta_{i}(x,\theta,h)=K(h^{-1}|\langle X_{i}-x,\theta\rangle|)\); one can refer to [37, 38]. Although it is conceivable that our findings may apply to the single index model, establishing such a conclusion is outside the purview of this paper and seems to pose significant challenges.

Remark 4.6. This study is significant for functional data analysis in two respects. Firstly, the results presented in this paper are enriched by an additional uniformity constraint, namely \(a_{n}\leq h\leq b_{n}\). Secondly, the scope of applications is broadened to encompass novel areas in the field, including kernel quantile regression, the kernel conditional density function, and the kernel conditional copula function. These extensions represent pioneering contributions in functional data analysis. The findings of this study play a crucial role in establishing uniform consistency, with data-driven bandwidths, for the estimators associated with the aforementioned applications. Further insights and relevant results can be explored in [41] and [94].

5 THE BANDWIDTH SELECTION CRITERION

Numerous methods have been developed to construct asymptotically optimal bandwidth selection rules for nonparametric kernel estimators, particularly for the Nadaraya–Watson regression estimator; prominent works in this regard include [25, 30, 77, 79, 118]. The selection of this parameter is crucial, whether in the standard finite-dimensional case or within the infinite-dimensional framework, to ensure good practical performance. Let us define the leave-one-out estimator of the regression function, with the pair \(\left(X_{j},Y_{j}\right)\) removed,

$$\widehat{r}_{n}^{l;j}(x)=\frac{\sum_{i=1,i\neq j}^{n}l(Y_{i})K(h^{-1}d(x,X_{i}))}{\sum_{i=1,i\neq j}^{n}K(h^{-1}d(x,X_{i}))}.$$
(5.1)

To minimize the quadratic loss function, we introduce the following criterion, where \(\mathbb{W}(\cdot)\) is a known non-negative weight function

$$CV\left(\varphi,h\right):=\frac{1}{n}\sum_{j=1}^{n}\left(l\left(Y_{j}\right)-\widehat{r}_{n}^{l;j}(X_{j})\right)^{2}\mathbb{W}\left(X_{j}\right).$$
(5.2)

Following the ideas developed in [118], a natural way to choose the bandwidth is to minimize the preceding criterion: choose \(\widehat{h}_{n}\in[a_{n},b_{n}]\) minimizing \(CV\left(\varphi,h\right)\) over \(h\in[a_{n},b_{n}]\).

One can replace (5.2) by

$$\widehat{CV}\left(\varphi,h\right):=\frac{1}{n}\sum_{j=1}^{n}\left(l\left(Y_{j}\right)-\widehat{r}_{n}^{l;j}(X_{j})\right)^{2}\widehat{\mathcal{W}}\left(X_{j},x\right).$$
(5.3)

In practice, one takes, for \(j=1,\ldots,n\), the uniform global weights \(\mathbb{W}\left(X_{j}\right)=1\), and the local weights

$$\widehat{\mathcal{W}}(X_{j},{x})=\begin{cases}1\quad\textrm{if}\quad d(X_{j},{x})\leq h\\ 0\quad\textrm{otherwise}.\end{cases}$$
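A compact numerical transcription of this rule may be useful. The sketch below, for a real-valued covariate with Euclidean \(d\), a Gaussian kernel, uniform weights \(\mathbb{W}\equiv 1\), and \(l\) the identity, computes the leave-one-out criterion (5.2) on a bandwidth grid over \([a_{n},b_{n}]\); all function names are illustrative assumptions.

```python
# Minimal sketch of cross-validated bandwidth selection, rule (5.2).
import numpy as np

def loo_cv(h, X, Y):
    K = np.exp(-0.5 * ((X[:, None] - X[None, :]) / h) ** 2)
    np.fill_diagonal(K, 0.0)              # leave out the pair (X_j, Y_j)
    r_hat = K @ Y / K.sum(axis=1)         # leave-one-out estimator at each X_j
    return np.mean((Y - r_hat) ** 2)      # CV(h) with W = 1 and l = identity

def select_h(X, Y, a_n, b_n, n_grid=50):
    hs = np.linspace(a_n, b_n, n_grid)
    return hs[np.argmin([loo_cv(h, X, Y) for h in hs])]
```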

By similar reasoning, one can select \(\widehat{h}_{n}\) for the process defined in (2.2). Let us define

$$\widehat{W}_{n}(x,l)=W_{n}(x,l,\widehat{h}_{n})=\frac{1}{n\mathbf{E}[\Delta_{1}(x,\widehat{h}_{n})]}\sum_{i=1}^{n}\Bigg{\{}\Big{(}c_{l}(x)l(Y_{i})+d_{l}(x)\Big{)}\Delta_{i}(x,\widehat{h}_{n})$$
$${}-\mathbf{E}\bigg{[}\Big{(}c_{l}(x)l(Y_{i})+d_{l}(x)\Big{)}\Delta_{i}(x,\widehat{h}_{n})\bigg{]}\Bigg{\}}.$$
(5.4)

The following corollary is an immediate consequence of Theorem 3.2.

Corollary 5.1. Assume that assumptions (A1)–(A4), (B1)–(B3), and (B4) or (B\({}^{\prime}\)4) hold. Furthermore, consider the classes of continuous functions \(\mathcal{C}\) and \(\mathcal{D}\) given above. Then we have, for any \(x\in\mathcal{J}\),

(i) for any \(0<c<\infty\), \(\{\psi\in l_{\infty}(\mathcal{L}):\,I_{x}(\psi)\leq c\}\) is a compact set of \(l_{\infty}(\mathcal{L})\);

(ii) for all open subsets \(O\) of \(l_{\infty}(\mathcal{L})\),

$$\liminf_{n\rightarrow\infty}\frac{w_{n}^{2}}{n\phi(\widehat{h}_{n})}\log\Bigg{(}\mathbb{P}\bigg{\{}w_{n}\widehat{W}_{n}(x,\cdot)\in O\bigg{\}}\Bigg{)}\geq-\inf_{\psi\in O}I_{x}(\psi)$$

and for all closed subsets \(F\) of \(l_{\infty}(\mathcal{L})\),

$$\limsup_{n\rightarrow\infty}\frac{w_{n}^{2}}{n\phi(\widehat{h}_{n})}\log\Bigg{(}\mathbb{P}\bigg{\{}w_{n}\widehat{W}_{n}(x,\cdot)\in F\bigg{\}}\Bigg{)}\leq-\inf_{\psi\in F}I_{x}(\psi),$$

where

$$I_{x}(\psi)=\inf\Bigg{\{}2^{-1}\mathbf{E}[\xi^{2}]:\xi\in\tilde{\mathcal{Z}}_{x},\ \varphi(\xi)=\psi\Bigg{\}},$$
(5.5)

and \(\tilde{\mathcal{Z}}_{x}\) is the closed linear subspace of the space \(L_{2}\) generated by the mean-zero Gaussian process \(\Big{\{}\Xi(x,l):l\in\mathcal{L}\Big{\}}.\)

As in [87], let us minimize the following errors:

$$\textrm{Err}_{1}\left(\widehat{f}_{n,b},f\right)=\iint\left\{\widehat{f}_{n,b}({y}|{x})-f({y}|{x})\right\}^{2}W_{1}(x)W_{2}(y)d\mathbb{P}(x,y),$$
$$\textrm{Err}_{2}\left(\widehat{f}_{n,b},f\right)=\frac{1}{n}\sum_{i=1}^{n}\left\{\widehat{f}_{n,b}({Y_{i}}|X_{i})-f({Y_{i}}|{X_{i}})\right\}^{2}\frac{W_{1}\left(X_{i}\right)W_{2}\left(Y_{i}\right)}{f({Y_{i}}|{X_{i}})},$$

or

$$\textrm{Err}_{3}\left(\widehat{f}_{n,b},f\right)=\iint\mathbf{E}\left\{\widehat{f}_{n,b}({y}|{x})-f({y}|{x})\right\}^{2}W_{1}(x)W_{2}(y)d\mathbb{P}(x,y),$$

where \(W_{1}(\cdot)\) and \(W_{2}(\cdot)\) are some non-negative weight functions. These theoretical errors are not computable in practice, and the following leave-one-out cross-validation criterion can be constructed to approximate them in a fully data-driven way:

$$\mathrm{CV}(b)=\frac{1}{n}\sum_{i=1}^{n}W_{1}\left(X_{i}\right)\int\left(\widehat{f}_{n,b}^{-i}({y}|{X_{i}})\right)^{2}W_{2}(y)\,dy-\frac{2}{n}\sum_{i=1}^{n}\widehat{f}_{n,b}^{-i}({Y_{i}}|{X_{i}})W_{1}\left(X_{i}\right)W_{2}\left(Y_{i}\right),$$

where

$$\widehat{f}_{n,b}^{-i}({t}|{X_{i}})=\frac{\sum_{j=1,j\neq i}^{n}\frac{1}{h_{1}}K_{1}(h_{1}^{-1}(Y_{j}-t))K(b^{-1}d(X_{i},X_{j}))}{\sum_{j=1,j\neq i}^{n}K(b^{-1}d(X_{i},X_{j}))}.$$
(5.6)

Then the smoothing parameter \(b\) is selected by the following procedure:

$$\widehat{b}=\arg\min_{a_{n}\leq b\leq b_{n}}\textrm{CV}(b).$$
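As a numerical companion, a minimal sketch of this criterion is given below: the integral term is approximated on an equally spaced \(y\)-grid and the weights are taken as \(W_{1}\equiv W_{2}\equiv 1\), again for a scalar covariate and Gaussian kernels; all names are illustrative assumptions.

```python
# Minimal sketch of CV(b) for the conditional density bandwidth.
import numpy as np

def density_cv(b, h1, X, Y, y_grid):
    Kx = np.exp(-0.5 * ((X[:, None] - X[None, :]) / b) ** 2)
    np.fill_diagonal(Kx, 0.0)
    w = Kx / Kx.sum(axis=1, keepdims=True)        # leave-one-out weights
    # f_hat[i, k] approximates f^{-i}_{n,b}(y_grid[k] | X_i) as in (5.6)
    Ky = np.exp(-0.5 * ((y_grid[None, :] - Y[:, None]) / h1) ** 2) / (h1 * np.sqrt(2 * np.pi))
    f_hat = w @ Ky
    dy = y_grid[1] - y_grid[0]
    term1 = np.mean(np.sum(f_hat ** 2, axis=1) * dy)   # int (f^{-i})^2(y|X_i) dy
    f_at_Yi = np.array([np.interp(yi, y_grid, row) for yi, row in zip(Y, f_hat)])
    return term1 - 2.0 * np.mean(f_at_Yi)
```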

While the aforementioned cross-validation procedures focus on approximating quadratic errors of estimation, alternative approaches for selecting smoothing parameters may prioritize optimizing the predictive power of the method. This can be achieved by minimizing one of the following prediction criteria

$$\widetilde{h}^{(1)}=\arg\min_{a_{n}\leq h\leq b_{n}}\sum_{i=1}^{n}\left(Y_{i}-\widehat{Y}_{i}^{(1)}\right)^{2},$$

where the prediction is performed using either the conditional median

$$\widehat{F}_{nh}^{-i}\left(t|x,h\right)=\frac{\sum_{j=1,j\neq i}^{n}{\mathchoice{\rm 1\mskip-4.0mu l}{\rm 1\mskip-4.0mu l}{\rm 1\mskip-4.5mu l}{\rm 1\mskip-5.0mu l}}_{(-\infty,t]}(Y_{j})K(h^{-1}d(x,X_{j}))}{\sum_{j=1,j\neq i}^{n}K(h^{-1}d(x,X_{j}))}\text{ and }\widehat{Y}_{i}^{(1)}=\left\{\widehat{F}_{nh}^{-i}\right\}^{-1}\left(\frac{1}{2}|X_{i},h\right)$$
(5.7)

or

$$\widetilde{h}^{(2)}=\arg\min_{a_{n}\leq b\leq b_{n}}\sum_{i=1}^{n}\left(Y_{i}-\widehat{Y}_{i}^{(2)}\right)^{2},$$

using the conditional mode, viz.

$$\widehat{Y}_{i}^{(2)}=\arg\max_{y}\widehat{f}_{n,b}^{-i}({y}|{X_{i}}).$$

For more discussion, one can refer to [16, 17, 29].
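To make the median-based criterion concrete, the sketch below reads the leave-one-out conditional median \(\widehat{Y}_{i}^{(1)}\) off the weighted empirical distribution function (5.7) and scans a bandwidth grid; the scalar covariate and Gaussian kernel are again simplifying assumptions, and the names are ours.

```python
# Minimal sketch of the prediction criterion based on the conditional median.
import numpy as np

def median_prediction_error(h, X, Y):
    K = np.exp(-0.5 * ((X[:, None] - X[None, :]) / h) ** 2)
    np.fill_diagonal(K, 0.0)                 # leave-one-out
    w = K / K.sum(axis=1, keepdims=True)
    order = np.argsort(Y)
    err = 0.0
    for i in range(len(X)):
        cumw = np.cumsum(w[i][order])        # F_hat^{-i}(. | X_i, h) on sorted Y
        y_med = Y[order][np.searchsorted(cumw, 0.5)]
        err += (Y[i] - y_med) ** 2
    return err

def select_h_pred(X, Y, a_n, b_n, n_grid=50):
    hs = np.linspace(a_n, b_n, n_grid)
    return hs[np.argmin([median_prediction_error(h, X, Y) for h in hs])]
```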

Remark 5.2. It is essential to highlight that the primary challenge in employing an estimator like the one in (2.1) lies in properly selecting the smoothing parameter \(h\). The consistency results with uniformity in bandwidth, as presented in Corollary 4.1, indicate that the choice of \(h_{1}\) and \(h_{2}\) within certain intervals guarantees the moderate deviations principle for \(\widehat{r}_{n}^{l}(x,h)\). In other words, the fluctuations in bandwidth within a small interval do not impact the moderate deviations principle of the nonparametric estimator \(\widehat{r}_{n}^{l}(x,h)\) for \(r^{l}(x)\).

Remark 5.3. It is straightforward to modify the proofs of our results to show that they remain true when the entropy condition is replaced by the following bracketing condition: for some \(C_{0}>0\) and \(v_{0}>0,\)

$$\mathcal{N}_{[]}\left(\mathcal{F},L_{2}(\mathbb{P}),\epsilon\right)\leq C_{0}\epsilon^{-v_{0}},\quad 0<\epsilon<1.$$

Remark 5.4. Observe that the standardizing factor is \(\left(n\phi(h_{n})\right)^{1/2}\) with \(\phi(h_{n})\rightarrow 0\), indicating a slower rate of convergence; for instance, when the covariate is finite-dimensional and \(\phi(h)\asymp h^{d}\), this is the familiar local rate \((nh_{n}^{d})^{1/2}\). This is the cost incurred when estimating conditional (local) quantities.

6 CONCLUSIONS

We employ general empirical process methods to establish, under mild regularity conditions, the functional moderate deviations of kernel-type function estimators that depend on an infinite-dimensional covariate. We present a valuable moderate deviation principle for a function-indexed process by leveraging intricate exponential contiguity arguments. The moderate deviation principles are a useful tool in analyzing the behavior of the estimators in question. This paper makes several noteworthy contributions to the existing literature on functional data analysis. Specifically, we establish functional moderate deviation principles for the Nadaraya–Watson estimators, the conditional distribution processes, the kernel quantile regression, the kernel conditional density function, and the kernel conditional copula function. Our findings extend the current knowledge in the field and offer new avenues for future research in functional data analysis.

Extending our work to encompass \(k\)-nearest neighbors estimators holds significant interest. However, achieving this goal requires the development of new technical arguments and presently lies beyond the scope of this paper. Exploring \(k\)-nearest neighbors estimators would expand the scope of our research and provide valuable insights into their performance and properties. Additional extensions involve models for dimension reduction. While our paper treats fully nonparametric functional models, the recent FDA literature has emphasized the interest of semi-parametric models as a bridge between flexible (but excessively dimensional) nonparametric models and low-dimensional (but excessively restrictive) linear models. This field comprises, for example, functional single index models (refer to [37, 38, 74] for the most recent advancements), projection pursuit models (refer to [40]), and partial linear models (refer to [5, 92]). As far as we are aware, the literature on these models does not address moderate deviation principles, and we hypothesize that our ideas and methodologies could likely be used successfully for deriving such results, thereby opening up new research avenues. Such an extension would require innovative approaches and advanced theoretical frameworks to effectively tackle the challenges associated with projection pursuit regression and projection pursuit conditional distribution processes. By embarking on this path, we can further enrich the existing literature and contribute to advancing functional data analysis.

Finally, extensions of our ideas would concern dependent statistical samples, with possible applications to time series. The literature on dependent kernel functional estimators is well developed (cf. [24, 28, 62, 100, 128]), but generally without moderate deviation results. This extension should be harder to obtain than the previous ones: the primary obstacle lies in the necessity of formulating new probabilistic results, since those employed herein (cf. the results in [6, 8]) are tailored specifically to independent and identically distributed (i.i.d.) samples.

PROOFS

In this section, we present the proofs of all the theoretical findings of this study. The previously introduced notation will be used consistently throughout. We consider slightly more general processes than those defined in (2.2); specifically, we work in a broader framework that does not require \(\mathcal{M}_{x,l}(\cdot)=c_{l}(x)l(\cdot)+d_{l}(x)\). The proofs of Theorems 3.1 and 3.2 are intricate and will be dissected into multiple lemmas, each elucidated in Section 8.

Proof of Theorem 3.1

The proof utilizes the Gärtner–Ellis theorem as a principal tool. For any integer \(k\geq 1\), any \((x,l,\varrho)\in\mathcal{E}\times\mathcal{L}\times\mathcal{H}_{0}\), and any nonnegative real function \(M_{x,l}(\cdot)\) defined on \(\mathbb{R}\), we have the following

$$\mathbf{E}[M_{x,l}(Y)\Delta_{1}(x,\varrho h)]=\int\limits_{0}^{1}\int M_{x,l}(v)K(u)\,d\mathbb{P}\left(\frac{d(x,X_{1})}{\varrho h}\leq u,Y\leq v\right)$$
$${}=\int\limits_{0}^{1}\int M_{x,l}(v)K(u)\,d\mathbb{P}\left(\frac{d(x,X_{1})}{\varrho h}\leq u|Y=v\right)g(v)\,dv$$
$${}:=\int M_{x,l}(v)\mathcal{S}_{x,\varrho}(v,h)g(v)\,dv.$$

Given the existence of \(K^{\prime}(\cdot)\), it follows that

$$K(u)=K(0)+\int\limits_{0}^{u}K^{\prime}(t)\,dt,$$

which, by Condition (A2)(i), implies that

$$\mathcal{S}_{x,\varrho}(v,h)=K(0)F_{x}^{v}(\varrho h)+\int\limits_{0}^{1}\left(\int\limits_{0}^{t}K^{\prime}(u)\,du\right)\,d\mathbb{P}\left(\frac{d(x,X_{1})}{\varrho h}\leq t|Y=v\right)$$
$${}=K(0)F_{x}^{v}(\varrho h)+\int\limits_{0}^{1}K^{\prime}(u)\mathbb{P}\left(u\leq\frac{d(x,X_{1})}{\varrho h}\leq 1|Y=v\right)du$$
$${}=K(1)F_{x}^{v}(\varrho h)-\int\limits_{0}^{1}K^{\prime}(u)F_{x}^{v}(u\varrho h)\,du$$
$${}=K(1)\Big{(}\phi(\varrho h)f_{v}(x)+g_{x,v}(\varrho h)\Big{)}-\int\limits_{0}^{1}K^{\prime}(u)\Big{(}\phi(u\varrho h)f_{v}(x)+g_{x,v}(u\varrho h)\Big{)}\,du$$
$${}=\phi(\varrho h)\left\{K(1)\Big{(}f_{v}(x)+\frac{g_{x,v}(\varrho h)}{\phi(\varrho h)}\Big{)}-\int\limits_{0}^{1}K^{\prime}(u)\frac{\phi(u\varrho h)}{\phi(\varrho h)}\Big{(}f_{v}(x)+\frac{g_{x,v}(u\varrho h)}{\phi(u\varrho h)}\Big{)}\,du\right\}$$
$${}:=\phi(\varrho h)L_{x,\varrho}(v,h).$$

Therefore, for any \((x,l,\varrho)\in\mathcal{E}\times\mathcal{L}\times\mathcal{H}_{0}\), we infer

$$\mathbf{E}[M_{x,l}(Y)\Delta_{1}(x,\varrho h)]=\phi(\varrho h)\int L_{x,\varrho}(v,h)M_{x,l}(v)g(v)\,dv.$$
(7.1)

Let \(y_{1}:=(x,l_{1})\) and \(y_{2}:=(x,l_{2})\) in \(\mathcal{E}\times\mathcal{L}\), and \((\varrho_{1},\varrho_{2})\in(\mathcal{H}_{0})^{2}\). For any real function \(\mathcal{M}_{y_{1},y_{2}}(\cdot)\) defined on \(\mathbb{R}\) such that \(\mathbf{E}[|\mathcal{M}_{y_{1},y_{2}}(Y)|]<\infty\), observe that

$$\mathbf{E}[\mathcal{M}_{y_{1},y_{2}}(Y)\Delta_{1}(x,\varrho_{1}h)\Delta_{1}(x,\varrho_{2}h)]$$
$${}=\int\limits_{0}^{1}\int\mathcal{M}_{y_{1},y_{2}}(v)K(\frac{\varrho}{\varrho_{1}}u)K(\frac{\varrho}{\varrho_{2}}u)\,d\mathbb{P}\left(\frac{d(x,X_{1})}{\varrho h}\leq u|Y=v\right)g(v)\,dv$$
$${}:=\int\mathcal{M}_{y_{1},y_{2}}(v)\mathcal{S}_{x,\varrho_{1},\varrho_{2}}(v,h)g(v)\,dv.$$

Similarly, we have

$$K\bigg{(}\frac{\varrho}{\varrho_{1}}u\bigg{)}K\bigg{(}\frac{\varrho}{\varrho_{2}}u\bigg{)}=K(0)^{2}+\int\limits_{0}^{u}\bigg{(}K\bigg{(}\frac{\varrho}{\varrho_{1}}t\bigg{)}K\bigg{(}\frac{\varrho}{\varrho_{2}}t\bigg{)}\bigg{)}^{\prime}\,dt,$$

which, by using the Condition (A2)(i), implies that

$$\mathcal{S}_{x,\varrho_{1},\varrho_{2}}(v,h)=\phi(\varrho h)\Bigg{\{}K\bigg{(}\frac{\varrho}{\varrho_{1}}\bigg{)}K\bigg{(}\frac{\varrho}{\varrho_{2}}\bigg{)}\Bigg{(}f_{v}(x)+\frac{g_{x,v}(\varrho h)}{\phi(\varrho h)}\Bigg{)}$$
$${}-\int\limits_{0}^{1}\bigg{(}K\bigg{(}\frac{\varrho}{\varrho_{1}}u\bigg{)}K\bigg{(}\frac{\varrho}{\varrho_{2}}u\bigg{)}\bigg{)}^{\prime}\frac{\phi(u\varrho h)}{\phi(\varrho h)}\Bigg{(}f_{v}(x)+\frac{g_{x,v}(u\varrho h)}{\phi(u\varrho h)}\Bigg{)}\,du\Bigg{\}}$$
$${}:=\phi(\varrho h)\mathcal{L}_{x,\varrho_{1},\varrho_{2}}(v,h).$$

Therefore, we have

$$\mathbf{E}[\mathcal{M}_{y_{1},y_{2}}(Y)\Delta_{1}(x,\varrho_{1}h)\Delta_{1}(x,\varrho_{2}h)]=\phi(\varrho h)\int\mathcal{L}_{x,\varrho_{1},\varrho_{2}}(v,h)\mathcal{M}_{y_{1},y_{2}}(v)g(v)\,dv.$$
(7.2)

For some \(\gamma\in\mathcal{H}_{0}\), set \(\beta_{n}=n\phi(\gamma h)/w_{n}\). For any tuple \((\theta_{1},\ldots,\theta_{m})\in\mathbb{R}^{m}\), the Laplace transform corresponding to \(\beta_{n}(W_{n}(x,z_{1}),\ldots,W_{n}(x,z_{m}))\) is explicitly defined as follows

$${\Phi^{x,z_{1},\ldots,z_{m}}_{n}(\theta_{1},\ldots,\theta_{m})}$$
$${}=\mathbf{E}\Bigg{[}\exp\bigg{\{}\Big{\langle}(\theta_{1},\ldots,\theta_{m}),\beta_{n}\big{(}W_{n}(x,z_{1}),\ldots,W_{n}(x,z_{m})\big{)}\Big{\rangle}\bigg{\}}\Bigg{]}$$
$${}=\mathbf{E}\left[\exp\left\{\sum_{i=1}^{n}\sum_{j=1}^{m}\frac{\beta_{n}\theta_{j}}{n\mathbf{E}\big{[}\Delta_{1}(x,\varrho_{j}h)]}\Big{(}\mathcal{M}_{x,l_{j}}(Y_{i})\Delta_{i}(x,\varrho_{j}h)-\mathbf{E}\big{[}\mathcal{M}_{x,l_{j}}(Y_{i})\Delta_{i}(x,\varrho_{j}h)\big{]}\Big{)}\right\}\right]$$
$${}=\left(\mathbf{E}\left[\exp\left\{\sum_{j=1}^{m}\frac{\beta_{n}\theta_{j}}{n\mathbf{E}\big{[}\Delta_{1}(x,\varrho_{j}h)]}\Big{(}\mathcal{M}_{x,l_{j}}(Y)\Delta_{1}(x,\varrho_{j}h)-\mathbf{E}\big{[}\mathcal{M}_{x,l_{j}}(Y)\Delta_{1}(x,\varrho_{j}h)\big{]}\Big{)}\right\}\right]\right)^{n}$$
$${}:=\Big{(}\varphi^{x,z_{1},\ldots,z_{m}}_{n}(\theta_{1},\ldots,\theta_{m})\Big{)}^{n}.$$
(7.3)

Let us now evaluate the quantity \(\varphi^{x,z_{1},\ldots,z_{m}}_{n}(\theta_{1},\ldots,\theta_{m})\). We remark that

$$\varphi^{x,z_{1},\ldots,z_{m}}_{n}(\theta_{1},\ldots,\theta_{m})=1+\frac{1}{2}\mathcal{T}_{1}+\mathcal{T}_{2},$$
(7.4)

where

$$\mathcal{T}_{1}=\mathbf{E}\left[\left\{\sum_{j=1}^{m}\frac{\theta_{j}\beta_{n}}{n\mathbf{E}[\Delta_{1}(x,\varrho_{j}h)]}\bigg{(}\mathcal{M}_{x,l_{j}}(Y)\Delta_{1}(x,\varrho_{j}h)-\mathbf{E}[\mathcal{M}_{x,l_{j}}(Y)\Delta_{1}(x,\varrho_{j}h)]\bigg{)}\right\}^{2}\right]$$

and

$$\mathcal{T}_{2}=\sum_{k=3}^{\infty}\frac{1}{k!}\bigg{(}\frac{\beta_{n}}{n}\bigg{)}^{k}\mathbf{E}\left[\left|\sum_{j=1}^{m}\frac{\theta_{j}}{\mathbf{E}[\Delta_{1}(x,\varrho_{j}h)]}\bigg{\{}\mathcal{M}_{x,l_{j}}(Y)\Delta_{1}(x,\varrho_{j}h)-\mathbf{E}\Big{[}\mathcal{M}_{x,l_{j}}(Y)\Delta_{1}(x,\varrho_{j}h)\Big{]}\bigg{\}}\right|^{k}\right].$$

Now, observe that

$$\mathcal{T}_{1}=\sum_{j=1}^{m}\sum_{p=1}^{m}\frac{\theta_{j}\theta_{p}\beta_{n}^{2}}{n^{2}\mathbf{E}[\Delta_{1}(x,\varrho_{j}h)]\mathbf{E}[\Delta_{1}(x,\varrho_{p}h)]}\mathbf{E}\Bigg{[}\mathcal{M}_{x,l_{j}}(Y)\mathcal{M}_{x,l_{p}}(Y)\Delta_{1}(x,\varrho_{j}h)\Delta_{1}(x,\varrho_{p}h)\Bigg{]}$$
$${}-\sum_{j=1}^{m}\sum_{p=1}^{m}\frac{\theta_{j}\theta_{p}\beta_{n}^{2}}{n^{2}\mathbf{E}[\Delta_{1}(x,\varrho_{j}h)]\mathbf{E}[\Delta_{1}(x,\varrho_{p}h)]}\mathbf{E}\Bigg{[}\mathcal{M}_{x,l_{j}}(Y)\Delta_{1}(x,\varrho_{j}h)\Bigg{]}\mathbf{E}\Bigg{[}\mathcal{M}_{x,l_{p}}(Y)\Delta_{1}(x,\varrho_{p}h)\Bigg{]}$$
$${}:=\frac{\beta_{n}^{2}}{n^{2}}(\mathcal{T}_{1,1}-\mathcal{T}_{1,2}).$$
(7.5)

Set \(\varrho=\min(\varrho_{j},\varrho_{p})\). Referring to (7.2), under the conditions (A1)–(A2) and (A4)(ii), we deduce that

$$\lim_{n\rightarrow\infty}\frac{1}{\phi(\varrho h)}\mathbf{E}\Bigg{[}\mathcal{M}_{x,l_{j}}(Y)\mathcal{M}_{x,l_{p}}(Y)\Delta_{1}(x,\varrho_{j}h)\Delta_{1}(x,\varrho_{p}h)\Bigg{]}$$
$${}=\alpha_{1}(\varrho_{j},\varrho_{p})\int\mathcal{M}_{x,l_{j}}(v)\mathcal{M}_{x,l_{p}}(v)f_{v}(x)g(v)\,dv,$$
(7.6)

where

$$\alpha_{1}(\varrho_{j},\varrho_{p})=K\bigg{(}\frac{\varrho}{\varrho_{j}}\bigg{)}K\bigg{(}\frac{\varrho}{\varrho_{p}}\bigg{)}-\int\limits_{0}^{1}\bigg{(}K\bigg{(}\frac{\varrho}{\varrho_{j}}u\bigg{)}K\bigg{(}\frac{\varrho}{\varrho_{p}}u\bigg{)}\bigg{)}^{\prime}\tau_{0}(u)\,du.$$

Using the equality (7.1) with \(M_{x,l}(v)=1\), in accordance with conditions (A1)–(A2), we find

$$\lim_{n\rightarrow\infty}\frac{1}{\phi(\varrho_{j}h)}\mathbf{E}[\Delta_{1}(x,\varrho_{j}h)]=\left(K(1)-\int\limits_{0}^{1}K^{\prime}(u)\tau_{0}(u)\,du\right)\int f_{v}(x)g(v)\,dv$$
$${}:=\alpha_{0}\int f_{v}(x)g(v)\,dv.$$
(7.7)

Now, by Condition (A2)(ii), we have

$$\lim_{n\rightarrow\infty}\frac{\phi(\gamma h)\phi(\varrho h)}{\phi(\varrho_{j}h)\phi(\varrho_{p}h)}=\frac{\tau(\varrho,\gamma)}{\tau(\varrho_{j},\gamma)\tau(\varrho_{p},\gamma)}.$$

Therefore, we obtain

$$\lim_{n\rightarrow\infty}\phi(\gamma h)\mathcal{T}_{1,1}=\sum_{j=1}^{m}\sum_{p=1}^{m}\theta_{j}\theta_{p}\frac{\alpha_{1}(\varrho_{j},\varrho_{p})\tau(\varrho,\gamma)}{\alpha_{0}^{2}\tau(\varrho_{j},\gamma)\tau(\varrho_{p},\gamma)}\frac{\displaystyle\int\mathcal{M}_{x,l_{j}}(v)\mathcal{M}_{x,l_{p}}(v)f_{v}(x)g(v)\,dv}{\displaystyle\Bigg{(}\int f_{v}(x)g(v)\,dv\Bigg{)}\Bigg{(}\int f_{v}(x)g(v)\,dv\Bigg{)}}.$$
(7.8)

Making use of the condition (A3), we infer that

$$\lim_{n\rightarrow\infty}\frac{\beta_{n}^{2}}{n^{2}}\mathcal{T}_{1,1}=0.$$
(7.9)

Likewise, by the condition (A3), we derive

$$\lim_{n\rightarrow\infty}\phi(\gamma h)\mathcal{T}_{1,2}=0\quad\textrm{and}\quad\lim_{n\rightarrow\infty}\frac{\beta_{n}^{2}}{n^{2}}\mathcal{T}_{1,2}=0.$$
(7.10)

Using the \(C_{r}\)-inequality and the boundedness of \(K(\cdot)\), we get, for some \(\kappa_{0}>0,\)

$$\mathcal{T}_{2}\leq\sum_{j=1}^{m}\sum_{k=3}^{\infty}\frac{1}{k!}\bigg{(}\frac{2^{m}\beta_{n}}{n}\bigg{)}^{k}\frac{\kappa_{0}^{k}|\theta_{j}|^{k}\mathbf{E}\left[|\mathcal{M}_{x,l_{j}}(Y)|^{k}\Delta_{1}(x,\varrho_{j}h)\right]}{\Big{(}\mathbf{E}[\Delta_{1}(x,\varrho_{j}h)]\Big{)}^{k}}.$$
(7.11)

By (7.1), and making use of the conditions (A1)–(A2), there exists a constant \(\kappa_{1}>0\) such that

$${\mathbf{E}\left[|\mathcal{M}_{x,l_{j}}(Y)|^{k}\Delta_{1}(x,\varrho_{j}h)\right]}$$
$${}<\kappa_{1}^{k}\phi(\varrho_{j}h)\left(\int|\mathcal{M}_{x,l_{j}}(v)|^{k}f_{v}(x)g(v)dv+\int|\mathcal{M}_{x,l_{j}}(v)|^{k}g(v)dv\right).$$
(7.12)

Again, by the equality (7.1) with \(M_{x,l}(\cdot)=1\), we have

$$\mathbf{E}[\Delta_{1}(x,\varrho_{j}h)]=\phi(\varrho_{j}h)\int L_{x,\varrho_{j}}(v,h)g(v)\,dv:=\phi(\varrho_{j}h)T(x,\varrho_{j}h).$$
(7.13)

For each \(j=1,\ldots,m\), under conditions (A1)–(A2) and considering the fact that \(f(x)>0\), we obtain

$$\lim_{h\rightarrow 0}T(x,\varrho_{j}h)>0.$$
(7.14)

For sufficiently large \(n\) and for some \(\gamma\in\mathcal{H}_{0}\), there exists \(t_{0}>0\) such that \(2^{m}\beta_{n}/(n\phi(\gamma h))=2^{m}/w_{n}\leq t_{0}\). Now, by employing (7.11)–(7.14), we derive

$$\mathcal{T}_{2}\leq\bigg{(}\frac{2^{m}\beta_{n}}{t_{0}n\phi(\gamma h)}\bigg{)}^{3}\phi(\gamma h)\sum_{j=1}^{m}\big{(}\tau(\varrho_{j},\gamma)+o(1)\big{)}\bigg{(}\int\bigg{[}\exp\bigg{\{}\frac{t_{0}\kappa_{0}|\theta_{j}\mathcal{M}_{x,l_{j}}(v)|}{\big{(}\tau(\varrho_{j},\gamma)+o(1)\big{)}T(x,\varrho_{j}h)}\bigg{\}}f_{v}(x)$$
$${}+\exp\bigg{\{}\frac{t_{0}C|\theta_{j}\mathcal{M}_{x,l_{j}}(v)|}{\big{(}\tau(\varrho_{j},\gamma)+o(1)\big{)}T(x,\varrho_{j}h)}\bigg{\}}\bigg{]}g(v)dv\bigg{)}.$$

Making use of (7.14), we readily infer

$$\mathcal{T}_{2}\leq\kappa_{2}\left(\frac{2^{m}\beta_{n}}{t_{0}n\phi(\gamma h)}\right)^{3}\phi(\gamma h)\sum_{j=1}^{m}\Bigg{(}\int\big{[}\exp\left(t_{0}\kappa_{3}|\theta_{j}\mathcal{M}_{x,l_{j}}(v)|\right)f_{v}(x)$$
$${}+\exp\left(t_{0}\kappa_{3}|\theta_{j}\mathcal{M}_{x,l_{j}}(v)|\right)\big{]}g(v)dv\bigg{)},$$

where \(\kappa_{2},\kappa_{3}>0\). By Conditions (A4)(ii)–(iii), we get

$$\mathcal{T}_{2}=O\Bigg{(}\bigg{(}\frac{\beta_{n}}{n\phi(\gamma h)}\bigg{)}^{3}\phi(\gamma h)\Bigg{)}.$$
(7.15)

Thus, based on the Condition (A3), we deduce

$$\lim_{n\rightarrow\infty}\mathcal{T}_{2}=0\quad\textrm{and}\quad\lim_{n\rightarrow\infty}\frac{n^{2}\phi(\gamma h)}{\beta_{n}^{2}}\mathcal{T}_{2}=0.$$
(7.16)

By combining (7.3)–(7.10) with (7.15), we readily obtain

$$\lim\limits_{n\rightarrow\infty}\frac{n\phi(\gamma h)}{\beta_{n}^{2}}\log\Phi^{x,z_{1},\ldots,z_{m}}_{n}(\theta_{1},\ldots,\theta_{m})=\frac{1}{2}\sum_{j=1}^{m}\sum_{p=1}^{m}\theta_{j}\theta_{p}R_{\gamma}(x,z_{j},z_{p})$$
$${}:=\Psi_{\gamma}^{x,z_{1},\ldots,z_{m}}(\theta_{1}\ldots,\theta_{m}),$$

where the function \(R_{\gamma}\) is defined in (3.1). Note that Condition (A4) implies that the function \(\Psi_{\gamma}^{x,z_{1},\ldots,z_{m}}\) is finite and differentiable everywhere. The Fenchel–Legendre transform of \(\Psi_{\gamma}^{x,z_{1},\ldots,z_{m}}\) is given by

$$\Gamma^{\gamma}_{x,z_{1}\ldots,z_{m}}(\lambda_{1},\ldots,\lambda_{m})=\sup_{(\theta_{1},\ldots,\theta_{m})}\left\{\sum_{j=1}^{m}\lambda_{j}\theta_{j}-\Psi_{\gamma}^{x,z_{1},\ldots,z_{m}}(\theta_{1},\ldots,\theta_{m})\right\}.$$

We need to establish the essential smoothness of the function \(\Psi_{\gamma}^{x,z_{1},\ldots,z_{m}}\) and utilize the Gärtner–Ellis theorem for the proof, as outlined in [49] on page 44. Under the assumption (A4), it is evident that the interior of the set

$$D=\{(\theta_{1},\ldots,\theta_{m}):\Psi_{\gamma}^{x,z_{1},\ldots,z_{m}}(\theta_{1},\ldots,\theta_{m})<\infty\}$$

is not empty. Furthermore, under these conditions the function \(\Psi_{\gamma}^{x,z_{1},\ldots,z_{m}}\) is steep, which establishes its essential smoothness; this ensures that the Gärtner–Ellis theorem is applicable in our proof. Moreover, since, for any \(\theta_{1},\ldots,\theta_{m}\in\mathbb{R}\),

$$\sum_{j,p}^{m}\theta_{j}\theta_{p}R_{\gamma}(x,z_{j},z_{p})\geq 0,$$

there exists a mean-zero Gaussian process \(\{\Xi_{\gamma}(x,z),z\in\mathcal{L}\times\mathcal{H}_{0}\}\) such that

$$\mathbf{E}[\Xi_{\gamma}(x,z_{1})\Xi_{\gamma}(x,z_{2})]=R_{\gamma}(x,z_{1},z_{2})\text{ for each }z_{1},z_{2}\in\mathcal{L}\times\mathcal{H}_{0}.$$

Considering Eq. (3.3) and employing Lemma \(4.1\) from the work of [7], one can express the rate function as follows

$$\Gamma^{\gamma}_{x,z_{1}\ldots,z_{m}}(\lambda_{1},\ldots,\lambda_{m})=\inf\Bigg{\{}2^{-1}\mathbf{E}[\xi^{2}]:\xi\in\mathcal{Z}_{\gamma},\ \mathbf{E}[\Xi_{\gamma}(x,z_{j})\xi]=\lambda_{j}\ \text{for each}\ 1\leq j\leq m\Bigg{\}},$$

where \(\mathcal{Z}_{\gamma}\) is defined as in [7, p. 6]. \(\Box\)

The proof of Theorem 3.2 requires several intermediary results, which we present in the following set of lemmas.

Lemma 7.1. Under the assumptions (A1)–(A2) and (B3) we have, for any nonnegative function \(M_{0}(\cdot)\) such that \(\mathbf{E}[M_{0}(Y)]<\infty\),

$$\lim_{n\rightarrow\infty}\sup_{(x,\varrho)\in\mathcal{J}\times\mathcal{H}_{0}}\bigg{|}\frac{\mathbf{E}[M_{0}(Y)\Delta_{1}(x,\varrho h)]}{\phi(\varrho h)}-\alpha_{0}\int M_{0}(v)f_{v}(x)g(v)\,dv\bigg{|}=0,$$

where \(\alpha_{0}\) is given in (3.2).

The proof of Lemma 7.1 is postponed until Section 8.

In the proof of Theorem 3.2 we shall apply Lemma 7.1 with \(M_{0}(\cdot)\) belonging to a finite class \(\mathcal{M}_{0}\) of nonnegative real-valued functions such that \(\mathbf{E}[M_{0}(Y)]<\infty\). Denote

$$\epsilon_{n}=\sup_{(x,\varrho)\in\mathcal{J}\times\mathcal{H}_{0}}\left|\frac{\mathbf{E}[M_{0}(Y)\Delta_{1}(x,\varrho h)]}{\phi(\varrho h)}-\alpha_{0}\int M_{0}(v)f_{v}(x)g(v)\,dv\right|.$$

By Lemma 7.1, we have

$$\epsilon_{n}\rightarrow 0\quad\textrm{as}\quad n\rightarrow\infty.$$

By condition (B3), we have \(\delta_{f}:=\inf_{v}\inf_{x\in\mathcal{J}}f_{v}(x)>0\) and therefore, for \(n\) large enough,

$$\epsilon_{n}\leq\frac{\delta_{f}}{2}\min\left\{\int M_{0}(v)g(v)\,dv:M_{0}\in\mathcal{M}_{0}\right\}.$$

Then for any \((x,\varrho)\in\mathcal{J}\times\mathcal{H}_{0}\) and \(M_{0}(\cdot)\in\mathcal{M}_{0}\), we have

$$\frac{\phi(\varrho h)\alpha_{0}}{2}J(M_{0})\leq\mathbf{E}[M_{0}(Y)\Delta_{1}(x,\varrho h)]\leq 2\phi(\varrho h)\alpha_{0}J(M_{0}),$$

where \(J(M_{0}):=\int M_{0}(v)f_{v}(x)g(v)\,dv\). By assumptions (A1) and (A2) we can see that \(\alpha_{0}>0\). Then by condition (B4)(i) and the fact that \(0<\tau_{0}\left(\displaystyle\frac{\vartheta_{1}}{\vartheta_{2}}\right)\leq\tau_{0}\left(\displaystyle\frac{\varrho}{\gamma}\right)\), we obtain, for any \((x,\varrho)\in\mathcal{J}\times\mathcal{H}_{0}\),

$$C_{1}\phi(\gamma h)\leq\mathbf{E}[M_{0}(Y)\Delta_{1}(x,\varrho h)]\leq C_{2}\phi(\gamma h),$$
(7.17)

where \(C_{1}\) and \(C_{2}\) are two strictly positive constants. For any \(l\in\mathcal{L}\), set

$$\bar{\mathcal{M}}_{x,l}(v)=c_{l}(x)l(v){\mathchoice{\rm 1\mskip-4.0mu l}{\rm 1\mskip-4.0mu l}{\rm 1\mskip-4.5mu l}{\rm 1\mskip-5.0mu l}}_{\{L(v)\leq\mathcal{M}^{\textrm{inv}}(n)\}}+d_{l}(x).$$
(7.18)

Lemma 7.2. Assume that the conditions (A1)–(A2), (B1), (B3), and (B4)(i) or (B\({}^{\prime}\)4)(i) are satisfied, along with \(\tau_{0}\left(\displaystyle\frac{\vartheta_{1}}{\vartheta_{2}}\right)>0\). Then, for any given \(\epsilon>0\), there exists a finite subclass \(\mathcal{L}_{\epsilon}\) of \(\mathcal{L}\) such that, for sufficiently large \(n\) and any \(l_{1}\in\mathcal{L}\), we have

$$\min_{l_{2}\in\mathcal{L}_{\epsilon}}\sup_{x\in\mathcal{J}}\sup_{\varrho\in\mathcal{H}_{0}}\mathbf{E}\Bigg{[}\bigg{\{}\bar{\mathcal{M}}_{x,l_{1}}(Y)-\bar{\mathcal{M}}_{x,l_{2}}(Y)\bigg{\}}^{2}\frac{\big{(}\Delta_{1}(x,\varrho h)\big{)}^{2}}{(\mathbf{E}[\Delta_{1}(x,\varrho h)])^{2}/\phi(\gamma h)}\Bigg{]}\leq\epsilon.$$
(7.19)

The proof of Lemma 7.2 is postponed until Section 8.

Set \(\epsilon>0\) and select \(n_{0}>0\) to be sufficiently large such that (7.19) holds for all \(n\geq n_{0}\). For any \(l_{1},l_{2}\in\mathcal{L}\), define

$$d_{\mathcal{L}}(l_{1},l_{2})=\sup_{n\geq n_{0}}\sup_{x\in\mathcal{J}}\sup_{\varrho\in\mathcal{H}_{0}}\mathbf{E}\Bigg{[}\bigg{\{}\bar{\mathcal{M}}_{x,l_{1}}(Y)-\bar{\mathcal{M}}_{x,l_{2}}(Y)\bigg{\}}^{2}\frac{\big{(}\Delta_{1}(x,\varrho h)\big{)}^{2}}{(\mathbf{E}[\Delta_{1}(x,\varrho h)])^{2}/\phi(\gamma h)}\Bigg{]}.$$

Consider any \(\zeta_{1},\zeta_{2}>0\) and define

$$\mathcal{F}_{0}(\zeta_{1},\zeta_{2})=\Big{\{}(z_{1},z_{2})\in\big{(}\mathcal{L}\times\mathcal{H}_{0}\big{)}^{2}:d_{\mathcal{L}}(l_{1},l_{2})\leq\zeta_{1}\ \textrm{and}\ |\varrho_{1}-\varrho_{2}|\leq\zeta_{2}\Big{\}},$$

where \(z_{i}=(l_{i},\varrho_{i})\), \(i=1,2\). For any \(0<\delta<1\), let \(l_{n}=\mathcal{N}(\delta\phi(\gamma h),\mathcal{J},d)\).

Proposition 7.3. Under assumptions of Theorem 3.2, for any \(\eta>0\), we have, for any \(x\in\mathcal{J}\)

$$\lim\limits_{(\delta,\zeta_{1},\zeta_{2})\rightarrow(0,0,0)}\limsup_{n\rightarrow\infty}\frac{w_{n}^{2}}{n\phi(\gamma h)}$$
$${}\times\log\mathbb{P}\Bigg{(}\sup_{y\in B(x,\delta\phi(\gamma h))}\sup_{(z_{1},z_{2})\in\mathcal{F}_{0}(\zeta_{1},\zeta_{2})}w_{n}|W_{n}(x,z_{1})-W_{n}(y,z_{2})|\geq\eta\Bigg{)}=-\infty,$$

where \(B(x,\delta\phi(\gamma h))\) is the open ball with center \(x\) and radius \(\delta\phi(\gamma h)\).

In order to prove Proposition 7.3, we need an exponential inequality for the empirical process. Let us first introduce some additional notation. Let \((\mathcal{X},\mathcal{A})\) be a measurable space on which we consider a uniformly bounded collection of measurable functions \(\mathcal{F}\). The class \(\mathcal{F}\) is said to be a bounded measurable VC class of functions if it satisfies Condition (B1). For any map \(T\) from \(\mathcal{F}\) into \(\mathbb{R}\), set

$$||T||_{\mathcal{F}}=\sup_{g\in\mathcal{F}}|T(g)|.$$

Let \(\mu\) be any probability measure on \((\mathcal{X},\mathcal{A})\) and \(Pr=\prod_{i\in\mathbb{N}}\mu_{i}\) the product probability measure, where \(\mu_{i}=\mu\) for each \(i\in\mathbb{N}\). Set \(\pi_{i}:\mathcal{X}^{\mathbb{N}}\mapsto\mathcal{X}\), \(i\in\mathbb{N}\), to be the coordinate functions. The following lemma is due to [71].

Lemma 7.4. Let \(\mathcal{F}\) be a uniformly bounded measurable VC class of functions, and let \(\sigma^{2}\) and \(U\) be any numbers such that \(\sigma^{2}\geq\sup_{f\in\mathcal{F}}{\textrm{Var}}_{Pr}(f)\), \(U\geq\sup_{f\in\mathcal{F}}||f||_{\infty},\) and \(0<\sigma^{2}\leq U/2\). Then there exist constants \(C\) and \(M\), depending only on the VC characteristics of the class \(\mathcal{F}\), such that the inequality

$$Pr\Bigg{(}\left|\left|\sum_{i=1}^{n}g(\pi_{i})-\mathbf{E}_{Pr}[g(\pi_{i})]\right|\right|_{\mathcal{F}}>t\Bigg{)}$$
$${}\leq M\exp\Bigg{\{}-\frac{t}{MU}\log\bigg{(}1+\frac{tU}{M(\sqrt{n}\sigma+U\sqrt{\log(U/\sigma)})^{2}}\bigg{)}\Bigg{\}},$$
(7.20)

whenever

$$t\geq C\left(U\log\Bigg{(}\frac{U}{\sigma}\Bigg{)}+\sqrt{n}\sigma\sqrt{\log(U/\sigma)}\right).$$

The proof of Proposition 7.3 is split into two cases: the unbounded case, where Assumption (B4) is assumed, and the bounded case, where Condition (B\({}^{\prime}\)4) is assumed.

The unbounded case will follow as a consequence of a sequence of lemmas. Set, for any \((x,z)=(x,l,\varrho)\in\mathcal{J}\times\mathcal{L}\times\mathcal{H}_{0}\)

$$\tilde{\mathbf{W}}_{n}(x,z)=\frac{1}{n\mathbf{E}[\Delta_{1}(x,\varrho h)]}\sum_{i=1}^{n}\Bigg{\{}\bar{\mathcal{M}}_{x,l}(Y_{i})\Delta_{i}(x,\varrho h)-\mathbf{E}\bigg{[}\bar{\mathcal{M}}_{x,l}(Y_{i})\Delta_{i}(x,\varrho h)\bigg{]}\Bigg{\}}.$$

We will first show that the processes

$$\Bigg{\{}w_{n}W_{n}(x,z):(x,z)\in\mathcal{J}\times\mathcal{L}\times\mathcal{H}_{0}\Bigg{\}}\quad\textrm{and}\quad\Bigg{\{}w_{n}\tilde{\mathbf{W}}_{n}(x,z):(x,z)\in\mathcal{J}\times\mathcal{L}\times\mathcal{H}_{0}\Bigg{\}}$$

are exponentially contiguous.

Lemma 7.5. Under the assumptions (A1)–(A3) and (B3)–(B4)(ii), for any \(\eta>0\), we have

$$\limsup_{n\rightarrow\infty}\frac{w_{n}^{2}}{n\phi(\gamma h)}\log\mathbb{P}\Bigg{(}\sup_{(x,z)\in\mathcal{J}\times\mathcal{L}\times\mathcal{H}_{0}}w_{n}|W_{n}(x,z)-\tilde{\mathbf{W}}_{n}(x,z)|\geq\eta\Bigg{)}=-\infty.$$

The proof of Lemma 7.5 is postponed until Section 8.

For any \(z=(l,\varrho)\in\mathcal{L}\times\mathcal{H}_{0}\) and \(x\in\mathcal{J}\), set

$${B}_{n,x,z}(u,v)=\phi(\gamma h)\frac{\bar{\mathcal{M}}_{x,l}(v)K(d(u,x)/\varrho h)}{\mathbf{E}[\Delta_{1}(x,\varrho h)]}.$$
(7.21)

Lemma 7.6. Assume that assumptions (A1)–(A2) and (B1)–(B3) hold true. Furthermore, consider the classes of continuous functions \(\mathcal{C}\) and \(\mathcal{D}\) given above. Then the class

$${\mathfrak{M}}:=\Bigg{\{}(u,v)\mapsto B_{n,x_{1},z_{1}}(u,v)-B_{n,x_{2},z_{2}}(u,v):(x_{1},x_{2})\in\mathcal{J}^{2},(z_{1},z_{2})\in\big{(}\mathcal{L}\times\mathcal{H}_{0}\big{)}^{2}\Bigg{\}}$$

is a pointwise measurable class of functions with the envelope function

$$G(u,v):={C_{0}}(C_{\mathcal{L}}L(v)+D_{\mathcal{L}}),$$

and satisfying the condition

$$N(\epsilon,\mathfrak{M})\leq C_{1}\epsilon^{-\nu},\quad 0<\epsilon<1,$$

where \(C_{0}\), \(C_{1}\), and \(\nu\) are suitable positive constants.

The proof of Lemma 7.6 is postponed until Section 8.

Lemma 7.7. Assume that the conditions (A1)–(A2), (B1), (B3), and (B4)(i) or (B\({}^{\prime}\)4)(i) are satisfied, along with \(\tau_{0}\left(\displaystyle\frac{\vartheta_{1}}{\vartheta_{2}}\right)>0\). Then, for \(n\) large enough and any \(\epsilon>0\), there exist \(\delta,\zeta_{1},\zeta_{2}>0\) such that, for any \((z_{1},z_{2})\in\mathcal{F}_{0}(\zeta_{1},\zeta_{2})\) and any \(x_{1},x_{2}\in\mathcal{J}\) with \(d(x_{1},x_{2})\leq\delta\phi(\gamma h)\), we have

$${\textrm{Var}}(B_{n,x_{1},z_{1}}(X,Y)-B_{n,x_{2},z_{2}}(X,Y))\leq C\,\epsilon\,\phi(\gamma h)$$

for a suitable constant \(C>0\).

The proof of Lemma 7.7 is postponed until Section 8.

Proof of Proposition 7.3

The proof uses Lemma 7.4 as a device. By Lemma 7.6 the following classes

$$\mathcal{F}_{n,x}(\delta,\zeta_{1},\zeta_{2})=\Bigg{\{}(u,v)\mapsto B_{n,x,z_{1}}(u,v)-B_{n,y,z_{2}}(u,v):y\in B(x,\delta\phi(\gamma h)),\,(z_{1},z_{2})\in\mathcal{F}_{0}(\zeta_{1},\zeta_{2})\Bigg{\}}$$

are measurable VC classes of functions. Now, for \(n\) large enough, take \(U=C_{0}(C_{\mathcal{L}}+D_{\mathcal{L}})\mathcal{M}^{\textrm{inv}}(n)\), where \(C_{0}\) is as in Lemma 7.6, \(\mathcal{F}=\mathcal{F}_{n,x}(\delta,\zeta_{1},\zeta_{2})\), and, by Lemma 7.7, \(\sigma^{2}=C\epsilon\phi(\gamma h)\). Then, for \(n\) large enough, we have \(\sigma\leq U/2\), and by Condition (B4)(iii)–(iv), we infer

$$\sqrt{n}\sigma\bar{C}\geq U\sqrt{\log(U/\sigma)}\quad\text{for a suitable constant }\bar{C}$$

and

$$\limsup_{n\rightarrow\infty}\frac{w_{n}\bigg{(}U\log(U/\sigma)+\sqrt{n}\sigma\sqrt{\log(U/\sigma)}\bigg{)}}{n\phi(\gamma h)}\leq\limsup_{n\rightarrow\infty}\frac{(1+\bar{C})w_{n}\sqrt{n}\sigma\sqrt{\log(U/\sigma)}}{n\phi(\gamma h)}<\infty.$$

Consequently, there exists a positive integer \(n_{0}\) such that, for any \(n\geq n_{0}\),

$$\frac{{n\phi(\gamma h)}}{w_{n}}\geq\bar{C}_{1}\bigg{(}U\log(U/\sigma)+\sqrt{n}\sigma\sqrt{\log(U/\sigma)}\bigg{)}$$

and

$$\sqrt{n}\sigma+U\sqrt{\log(U/\sigma)}\leq(1+\bar{C})\sqrt{nC\epsilon\phi(\gamma h)}.$$

Applying Lemma 7.4, for any \(n\geq n_{0}\), we obtain

$${\mathbb{P}\Bigg{(}\sup_{(y,z_{1},z_{2})\in B(x,\delta\phi(\gamma h))\times\mathcal{F}_{0}(\zeta_{1},\zeta_{2})}w_{n}|\tilde{\mathbf{W}}_{n}(x,z_{1})-\tilde{\mathbf{W}}_{n}(y,z_{2})|\geq\eta\Bigg{)}}$$
$${}\leq M\exp\Bigg{\{}-\frac{n\phi(\gamma h)\eta}{w_{n}MC_{0}(C_{\mathcal{L}}+D_{\mathcal{L}})\mathcal{M}^{\textrm{inv}}(n)}\log\Bigg{(}1+\frac{\eta C_{0}(C_{\mathcal{L}}+D_{\mathcal{L}})\mathcal{M}^{\textrm{inv}}(n)}{w_{n}M(1+\bar{C})^{2}C\epsilon}\Bigg{)}\Bigg{\}}.$$

Therefore, by (B4)(iii)–(iv), we infer

$$\limsup_{n\rightarrow\infty}\frac{w_{n}^{2}}{n\phi(\gamma h)}\log\Bigg{(}\mathbb{P}\bigg{(}\sup_{(y,z_{1},z_{2})\in B(x,\delta\phi(\gamma h))\times\mathcal{F}_{0}(\zeta_{1},\zeta_{2})}w_{n}|\tilde{\mathbf{W}}_{n}(x,z_{1})-\tilde{\mathbf{W}}_{n}(y,z_{2})|\geq\eta\bigg{)}\Bigg{)}$$
$${}\leq-\frac{\eta^{2}}{M^{2}(1+\bar{C})^{2}C\epsilon}.$$
(7.22)

Letting \(\epsilon\) go to \(0\), we prove that the process

$$\bigg{\{}\tilde{\mathbf{W}}_{n}(x,z):(x,z)\in\mathcal{J}\times\mathcal{L}\times\mathcal{H}_{0}\bigg{\}}$$

fulfills the results of Proposition 7.3. Finally, the same conclusion holds for the process \(\{{W}_{n}(x,z):(x,z)\in\mathcal{J}\times\mathcal{L}\times\mathcal{H}_{0}\}\) in the unbounded case, in view of Lemma 7.5. In the bounded case, under Condition (B\({}^{\prime}\)4), we take

$$U=C_{0}(\mathcal{C}_{\mathcal{L}}L_{0}+\mathcal{D}_{\mathcal{L}})\quad\textrm{and}\quad\sigma^{2}=C\epsilon\phi(\gamma h).$$

The same arguments as above yield, under Condition (B\({}^{\prime}\)4), the same inequality as in (7.22). This completes the proof. \(\Box\)

Proof of Theorem 3.2

By combining the findings from Theorem 3.1, Proposition 7.3, and Lemma 7.2, and by applying Theorem \(3.1\) in [6], we can establish that the process \(\{w_{n}W_{n}(x,z):z\in\mathcal{L}\times\mathcal{H}_{0}\}\) satisfies the large deviation principle in \(l_{\infty}(\mathcal{L}\times\mathcal{H}_{0})\), with speed \(n\phi(\gamma h)/w_{n}^{2}\) and the corresponding good rate function

$$I_{x}^{\gamma}(\psi)=\sup\Bigg{\{}\Gamma_{x,z_{1}\ldots,z_{m}}^{\gamma}(\psi(z_{1}),\ldots,\psi(z_{m})):z_{i}\in\mathcal{L}\times\mathcal{H}_{0},i=1,\ldots,m,m\geq 1\Bigg{\}}.$$

Finally, by Theorem \(4.2\) in [7] this rate function can be expressed as

$$I_{x}^{\gamma}(\psi)=\inf\Bigg{\{}2^{-1}\mathbf{E}[\xi^{2}]:\xi\in\mathcal{Z}_{x,\gamma},\ \varphi(\xi)=\psi\Bigg{\}}.$$

Hence the proof is complete. \(\Box\)

By condition (J), there exist \(x_{n,1},\ldots,x_{n,l_{n}}\) in \(\mathcal{J}\) such that

$$\mathcal{J}\subset\bigcup_{k=1}^{l_{n}}B_{n,k},$$

and there exists \(\nu\geq 0\) such that \(l_{n}\leq C(\delta\phi(\gamma h))^{-\nu}\), for some suitable positive constant \(C\). Here \(B_{n,k}\) denotes the open ball with center \(x_{n,k}\) and radius \(\delta\phi(\gamma h)\).

Proof of Theorem 3.3

The lower bound is easy. In fact, for any \(x\in\mathcal{J}\), by Theorem 3.2 we have

$$\liminf_{n\rightarrow\infty}\frac{w_{n}^{2}}{n\phi(\gamma h)}\log\Bigg{(}\mathbb{P}\bigg{\{}w_{n}||W_{n}||_{\infty}>\lambda\bigg{\}}\Bigg{)}\geq\liminf_{n\rightarrow\infty}\frac{w_{n}^{2}}{n\phi(\gamma h)}\log\Bigg{(}\mathbb{P}\bigg{\{}w_{n}\sup_{z\in\mathcal{L}\times\mathcal{H}_{0}}|W_{n}(x,z)|>\lambda\bigg{\}}\Bigg{)}$$
$${}\geq-I_{x}^{\gamma}(\lambda).$$

Hence

$$\liminf_{n\rightarrow\infty}\frac{w_{n}^{2}}{n\phi(\gamma h)}\log\Bigg{(}\mathbb{P}\bigg{\{}w_{n}||W_{n}||_{\infty}>\lambda\bigg{\}}\Bigg{)}\geq-I^{\gamma}(\lambda).$$

Now we show the upper bound. By the condition (J), Proposition 7.3, Condition (B4)(iv), and the inequality \(\log(a+b)\leq\log 2+\max(\log a,\log b)\), \(a\geq 0\), \(b\geq 0\), we obtain, for any \(\epsilon<\lambda\),

$$\limsup_{n\rightarrow\infty}\frac{w_{n}^{2}}{n\phi(\gamma h)}\log\Bigg{(}\mathbb{P}\bigg{\{}w_{n}||W_{n}||_{\infty}>\lambda\bigg{\}}\Bigg{)}$$
$${}\leq\limsup_{\delta\rightarrow 0}\limsup_{n\rightarrow\infty}\frac{w_{n}^{2}}{n\phi(\gamma h)}\log\Bigg{(}\mathbb{P}\bigg{\{}w_{n}\max_{1\leq k\leq l_{n}}\sup_{x\in B_{n,k}}\sup_{z\in\mathcal{L}\times\mathcal{H}_{0}}|W_{n}(x,z)-W_{n}(x_{n,k},z)|>\epsilon\bigg{\}}$$
$${}+\mathbb{P}\bigg{\{}w_{n}\max_{1\leq k\leq l_{n}}\sup_{z\in\mathcal{L}\times\mathcal{H}_{0}}|W_{n}(x_{n,k},z)|>\lambda-\epsilon\bigg{\}}\Bigg{)}$$
$${}\leq\limsup_{\delta\rightarrow 0}\limsup_{n\rightarrow\infty}\frac{w_{n}^{2}}{n\phi(\gamma h)}\log\Bigg{(}\mathbb{P}\bigg{\{}w_{n}\max_{1\leq k\leq l_{n}}\sup_{z\in\mathcal{L}\times\mathcal{H}_{0}}|W_{n}(x_{n,k},z)|>\lambda-\epsilon\bigg{\}}\Bigg{)}.$$

On the other hand

$$\limsup_{n\rightarrow\infty}\frac{w_{n}^{2}}{n\phi(\gamma h)}\log\Bigg{(}\mathbb{P}\bigg{\{}w_{n}\max_{1\leq k\leq l_{n}}\sup_{z\in\mathcal{L}\times\mathcal{H}_{0}}|W_{n}(x_{n,k},z)|>\lambda-\epsilon\bigg{\}}\Bigg{)}$$
$${}\leq\limsup_{n\rightarrow\infty}\frac{w_{n}^{2}}{n\phi(\gamma h)}\log(l_{n})+\sup_{x\in\mathcal{J}}\limsup_{n\rightarrow\infty}\frac{w_{n}^{2}}{n\phi(\gamma h)}\log\Bigg{(}\mathbb{P}\bigg{\{}w_{n}\sup_{z\in\mathcal{L}\times\mathcal{H}_{0}}|W_{n}(x,z)|>\lambda-\epsilon\bigg{\}}\Bigg{)}.$$

It follows, by Condition (B4)(iv), that

$$\limsup_{n\rightarrow\infty}\frac{w_{n}^{2}}{n\phi(\gamma h)}\log\Bigg{(}\mathbb{P}\bigg{\{}w_{n}||W_{n}||_{\infty}>\lambda\bigg{\}}\Bigg{)}\leq-I^{\gamma}(\lambda-\epsilon).$$

The proof of Theorem 3.3 is completed by letting \(\epsilon\) tend to zero, since the function \(I^{\gamma}(\cdot)\) is continuous. \(\Box\)

To prove Corollary 4.1, we need some intermediate results. For any \(x\in\mathcal{J}\) and any \((l,\varrho)\in\mathcal{L}\times\mathcal{H}_{0}\), set

$$\mathfrak{B}_{n}(x,l,\varrho)=c_{l}(x)\bigg{(}\widehat{r}_{n,2}^{l}(x,\varrho h)-r^{l}(x)\bigg{)}+d_{l}(x)\bigg{(}\widehat{r}_{n,1}^{l}(x,\varrho h)-1\bigg{)}.$$
(7.23)

Lemma 7.8. Under the assumptions of Theorem \(3.2\), suppose further that the condition (C1) holds. Then the process

$$\bigg{\{}w_{n}\mathfrak{B}_{n}(x,l,\varrho):(l,\varrho)\in\mathcal{L}\times\mathcal{H}_{0}\bigg{\}}$$

satisfies a large deviation principle in \(l_{\infty}(\mathcal{L}\times\mathcal{H}_{0})\) with the speed \((n\phi(\gamma h)/w_{n}^{2})\) and the good rate function \(I_{1,x}^{\gamma}(\cdot)\).

The proof of Lemma 7.8 is postponed until Section 8.

Proof of Corollary 4.1

By choosing \(c_{l}(x)=1\) and \(d_{l}(x)=-r^{l}(x)\), the sequence \(\mathfrak{B}_{n}(x,l,\varrho h)\) in (7.23) may then be written as

$$\mathfrak{B}_{n}(x,l,\varrho h)=\bigg{(}\widehat{r}_{n,2}^{l}(x,\varrho h)-r^{l}(x)\bigg{)}-r^{l}(x)\bigg{(}\widehat{r}_{n,1}^{l}(x,\varrho h)-1\bigg{)}.$$

The subsequent statements show that the processes

$$\Bigg{\{}w_{n}(\widehat{r}_{n}^{l}(x,\varrho h)-{r}^{l}(x,\varrho h)):(l,\varrho)\in\mathcal{L}\times\mathcal{H}_{0}\Bigg{\}}$$

and

$$\Bigg{\{}w_{n}\mathfrak{B}_{n}(x,l,\varrho h):(l,\varrho)\in\mathcal{L}\times\mathcal{H}_{0}\Bigg{\}}$$

are exponentially contiguous. Indeed, we have

$$\mathbb{P}\Bigg{(}w_{n}\sup_{(l,\varrho)\in\mathcal{L}\times\mathcal{H}_{0}}\bigg{|}\widehat{r}_{n}^{l}(x,\varrho h)-{r}^{l}(x,\varrho h)-\mathfrak{B}_{n}(x,l,\varrho h)\bigg{|}>\eta\Bigg{)}$$
$${}\leq\mathbb{P}\Bigg{(}w_{n}\sup_{(l,\varrho)\in\mathcal{L}\times\mathcal{H}_{0}}\bigg{|}\mathfrak{B}_{n}(x,l,\varrho h)\bigg{(}\frac{1}{\widehat{r}_{n,1}^{l}(x,\varrho h)}-1\bigg{)}\bigg{|}>\eta,\inf_{\varrho\in\mathcal{H}_{0}}\widehat{r}_{n,1}^{l}(x,\varrho h)\geq 1/2\Bigg{)}$$
$${}+\mathbb{P}\Bigg{(}\inf_{\varrho\in\mathcal{H}_{0}}\widehat{r}_{n,1}^{l}(x,\varrho h)<1/2\Bigg{)}$$
$${}\leq\mathbb{P}\Bigg{(}\sqrt{w_{n}}\sup_{(l,\varrho)\in\mathcal{L}\times\mathcal{H}_{0}}|\mathfrak{B}_{n}(x,l,\varrho h)|>\sqrt{\eta}\Bigg{)}+\mathbb{P}\Bigg{(}\sqrt{w_{n}}\sup_{\varrho\in\mathcal{H}_{0}}|\widehat{r}_{n,1}^{l}(x,\varrho h)-1|>\sqrt{\eta}/2\Bigg{)}$$
$${}+\mathbb{P}\Bigg{(}\sup_{\varrho\in\mathcal{H}_{0}}|\widehat{r}_{n,1}^{l}(x,\varrho h)-1|>\frac{1}{2}\Bigg{)}.$$

Since \(\lim\limits_{n\rightarrow\infty}w_{n}=\infty\), for \(n\) large enough, it follows that

$$\mathbb{P}\Bigg{(}w_{n}\sup_{(l,\varrho)\in\mathcal{L}\times\mathcal{H}_{0}}\bigg{|}\widehat{r}_{n}^{l}(x,\varrho h)-{r}^{l}(x,\varrho h)-\mathfrak{B}_{n}(x,l,\varrho h)\bigg{|}>\eta\Bigg{)}$$
$${}\leq 3\max\Bigg{\{}\mathbb{P}\Big{(}\sqrt{w_{n}}\sup_{(l,\varrho)\in\mathcal{L}\times\mathcal{H}_{0}}|\mathfrak{B}_{n}(x,l,\varrho h)|>\sqrt{\eta}\Big{)},\mathbb{P}\Big{(}\sqrt{w_{n}}\sup_{\varrho\in\mathcal{H}_{0}}|\widehat{r}_{n,1}^{l}(x,\varrho h)-1|>\sqrt{\eta}/2\Big{)}\Bigg{\}}.$$

Now, since by Lemma 7.8 the sequence

$$\Bigg{\{}w_{n}\mathfrak{B}_{n}(x,l,\varrho h):(l,\varrho)\in\mathcal{L}\times\mathcal{H}_{0}\Bigg{\}}$$

satisfies a large deviation principle with the speed \((n\phi(\gamma h)/w_{n}^{2})\) and the good rate function \(I^{\gamma}_{1,x}(\cdot)\), there exists a constant \(c_{1}>0\) such that

$$\limsup_{n\rightarrow\infty}\frac{w_{n}}{n\phi(\gamma h)}\log\mathbb{P}\Bigg{(}\sqrt{w_{n}}\sup_{(l,\varrho)\in\mathcal{L}\times\mathcal{H}_{0}}|\mathfrak{B}_{n}(x,l,\varrho h)|>\sqrt{\eta}\Bigg{)}<-c_{1}.$$

Moreover, an application of Theorem 3.2 guarantees the existence of a real \(c_{2}>0\) such that

$$\limsup_{n\rightarrow\infty}\frac{w_{n}}{n\phi(\gamma h)}\log\mathbb{P}\Bigg{(}\sqrt{w_{n}}\sup_{\varrho\in\mathcal{H}_{0}}|\widehat{r}_{n,1}^{l}(x,\varrho h)-1|>\sqrt{\eta}/2\Bigg{)}<-c_{2}.$$

We then deduce that

$$\limsup_{n\rightarrow\infty}\frac{w_{n}^{2}}{n\phi(\gamma h)}\log\mathbb{P}\Bigg{(}w_{n}\sup_{(l,\varrho)\in\mathcal{L}\times\mathcal{H}_{0}}\bigg{|}\widehat{r}_{n}^{l}(x,\varrho h)-{r}^{l}(x,\varrho h)-\mathfrak{B}_{n}(x,l,\varrho h)\bigg{|}>\eta\Bigg{)}=-\infty,$$

which means that the processes

$$\Bigg{\{}w_{n}(\widehat{r}_{n}^{l}(x,\varrho h)-{r}^{l}(x,\varrho h)):(l,\varrho)\in\mathcal{L}\times\mathcal{H}_{0}\Bigg{\}}$$

and

$$\Bigg{\{}w_{n}\mathfrak{B}_{n}(x,l,\varrho h):(l,\varrho)\in\mathcal{L}\times\mathcal{H}_{0}\Bigg{\}}$$

are exponentially contiguous. Thus, Corollary 4.1 follows by making use of Lemma 7.8. \(\Box\)

Proof of Corollary 4.2

To see how Corollary 4.2 follows from Theorem 3.2, take in this case

$$\mathcal{L}=\Bigg{\{}{\mathchoice{\rm 1\mskip-4.0mu l}{\rm 1\mskip-4.0mu l}{\rm 1\mskip-4.5mu l}{\rm 1\mskip-5.0mu l}}_{\{y\leq t\}}:t\in\mathbb{R}\Bigg{\}},\quad\mathcal{D}=\Bigg{\{}-F(t|\cdot):t\in\mathbb{R}\Bigg{\}}\quad\textrm{and}\quad c_{l}(x)=1,$$

for each \(x\in\mathcal{J}\). Under Condition (C2), the set of functions \(\mathcal{D}\) constitutes a collection of uniformly equicontinuous functions on \((\mathcal{J},d)\); consequently, \(\mathcal{D}\) forms a set of uniformly bounded functions. Applying the Arzelà–Ascoli theorem (see, for instance, Theorem IV.6.7 in [56]), it follows that \(\mathcal{D}\) is a compact subset of \(l_{\infty}(\mathcal{J})\). It is worth noting that this theorem can be employed even when \((\mathcal{J},d)\) is a totally bounded pseudo-metric space, not necessarily a compact one. Now, employing the same line of reasoning as in the proof of Corollary 4.1, the proof is complete. \(\Box\)

Proof of Corollary 4.3

By Lemma 3.9.23 of [131], it follows that the inverse map \(\Phi:G\mapsto G^{-1}\), as a map from \(D_{1}[F^{-1}(p|x)-\varepsilon,F^{-1}(q|x)+\varepsilon]\) to \(l_{\infty}[p,q]\), is Hadamard differentiable at \(F(\cdot|x)\) tangentially to \(C[F^{-1}(p|x)-\varepsilon,F^{-1}(q|x)+\varepsilon]\), and the derivative is the map

$$\psi\mapsto-\psi\left(F^{-1}(\cdot|x)\right)/f\left(F^{-1}(\cdot|x)|x\right).$$

Therefore, by Theorem 3.1 of [67] and Corollary 4.2, we conclude that

$$\left\{w_{n}\left(\widehat{q}_{n,\alpha}(x,\varrho)-q_{\alpha}(x)\right):(\alpha,\varrho)\in[p,q]\times\mathcal{H}_{0}\right\}$$

satisfies the LDP in \(l_{\infty}([p,q]\times\mathcal{H}_{0})\) with speed \(n\phi(\gamma h)/w_{n}^{2}\) and the rate function \(I_{\gamma,x}^{EQ}(\cdot).\) \(\Box\)

PROOF OF THE TECHNICAL LEMMAS

Proof of Lemma 7.1

Note that conditions (A1) and (A2)(i) imply the equality (7.1), which, with \(M_{x,l}(v)=M_{0}(v)\), gives that for any \((x,\varrho)\in\mathcal{J}\times\mathcal{H}_{0}\),

$$\frac{\mathbf{E}[M_{0}(Y_{1})\Delta_{1}(x,\varrho h)]}{\phi(\varrho h)}=\int L_{x,\varrho}(v,h)M_{0}(v)g(v)\,dv,$$

where

$$L_{x,\varrho}(v,h)=K(1)\Big{(}f_{v}(x)+\frac{g_{x,v}(\varrho h)}{\phi(\varrho h)}\Big{)}-\int\limits_{0}^{1}K^{\prime}(u)\frac{\phi(u\varrho h)}{\phi(\varrho h)}\Big{(}f_{v}(x)+\frac{g_{x,v}(u\varrho h)}{\phi(u\varrho h)}\Big{)}\,du.$$

Now, observe that, for any \((x,\varrho)\in\mathcal{J}\times\mathcal{H}_{0}\),

$$\bigg{|}\int L_{x,\varrho}(v,h)M_{0}(v)g(v)\,dv-\alpha_{0}\int M_{0}(v)f_{v}(x)g(v)\,dv\bigg{|}$$
$${}\leq K(1)\int\frac{g_{x,v}(\varrho h)}{\phi(\varrho h)}M_{0}(v)g(v)\,dv$$
$${}+\bigg{(}\int M_{0}(v)f_{v}(x)g(v)\,dv\bigg{)}\int\limits_{0}^{1}|K^{\prime}(u)|\Big{|}\frac{\phi(u\varrho h)}{\phi(\varrho h)}-\tau_{0}(u)\Big{|}\,du$$
$${}+\int\int\limits_{0}^{1}|K^{\prime}(u)|\frac{g_{x,v}(u\varrho h)}{\phi(u\varrho h)}M_{0}(v)g(v)\,du\,dv.$$

Making use of the conditions (A1)–(A2) and (B3), we derive

$$\lim_{n\rightarrow\infty}\sup_{(x,\varrho)\in\mathcal{J}\times\mathcal{H}_{0}}\bigg{|}\frac{\mathbf{E}[M_{0}(Y_{1})\Delta_{1}(x,\varrho h)]}{\phi(\varrho h)}-\alpha_{0}\int M_{0}(v)f_{v}(x)g(v)\,dv\bigg{|}=0.$$

This completes the proof of Lemma 7.1. \(\Box\)

Proof of Lemma 7.2

Making use of the conditions (A1)–(A2) and (B3)–(B4)(i), and applying Lemma 7.1, there exists a positive constant \(C_{1}\) such that, for every pair \(l,l^{\prime}\in\mathcal{L}\), every \(x\) in \(\mathcal{J}\), every \(\varrho\) in \(\mathcal{H}_{0}\), and for sufficiently large \(n\), the following holds:

$$\mathbf{E}\left[\bigg{\{}(c_{l}(x)l(Y)-c_{l^{\prime}}(x)l^{\prime}(Y)){\mathchoice{\rm 1\mskip-4.0mu l}{\rm 1\mskip-4.0mu l}{\rm 1\mskip-4.5mu l}{\rm 1\mskip-5.0mu l}}_{\{L(Y)\leq\mathcal{M}^{\textrm{inv}}(n)\}}+(d_{l}(x)-d_{l^{\prime}}(x))\bigg{\}}^{2}\frac{\big{(}\Delta_{1}(x,\varrho h)\big{)}^{2}}{(\mathbf{E}[\Delta_{1}(x,\varrho h)])^{2}/\phi(\gamma h)}\right]$$
$${}\leq C_{1}\frac{\phi(\gamma h)}{\phi(\varrho h)}\int\Bigg{\{}(c_{l}(x)l(v)-c_{l^{\prime}}(x)l^{\prime}(v)){\mathchoice{\rm 1\mskip-4.0mu l}{\rm 1\mskip-4.0mu l}{\rm 1\mskip-4.5mu l}{\rm 1\mskip-5.0mu l}}_{\{L(v)\leq\mathcal{M}^{\textrm{inv}}(n)\}}+(d_{l}(x)-d_{l^{\prime}}(x))\Bigg{\}}^{2}g(v)dv.$$

According to (A2)(ii), we obtain, uniformly for \(\varrho\) in \(\mathcal{H}_{0}\),

$$\frac{\phi(\varrho h)}{\phi(\gamma h)}=\tau(\varrho,\gamma)+o(1).$$

By combining the last equation with the observation that \(0<\tau_{0}\left(\displaystyle\frac{\vartheta_{1}}{\vartheta_{2}}\right)\leq\tau_{0}\left(\displaystyle\frac{\varrho}{\gamma}\right)\), we can deduce the existence of a positive constant \(C_{2}\) such that, for sufficiently large \(n,\)

$$C_{2}\phi(\gamma h)\leq\inf_{\varrho\in\mathcal{H}_{0}}\phi(\varrho h).$$

Hence, there exists a positive constant \(C>0\) such that, for every pair \(l,l^{\prime}\in\mathcal{L}\), every \(\varrho\) in \(\mathcal{H}_{0}\), and for sufficiently large \(n\),

$$\mathbf{E}\left[\bigg{\{}(c_{l}(x)l(Y)-c_{l^{\prime}}(x)l^{\prime}(Y)){\mathchoice{\rm 1\mskip-4.0mu l}{\rm 1\mskip-4.0mu l}{\rm 1\mskip-4.5mu l}{\rm 1\mskip-5.0mu l}}_{\{L(Y)\leq\mathcal{M}^{\textrm{inv}}(n)\}}+(d_{l}(x)-d_{l^{\prime}}(x))\bigg{\}}^{2}\frac{\big{(}\Delta_{1}(x,\varrho h)\big{)}^{2}}{(\mathbf{E}[\Delta_{1}(x,\varrho h)])^{2}/\phi(\gamma h)}\right]$$
$${}\leq C\int\Bigg{\{}(c_{l}(x)l(v)-c_{l^{\prime}}(x)l^{\prime}(v)){\mathchoice{\rm 1\mskip-4.0mu l}{\rm 1\mskip-4.0mu l}{\rm 1\mskip-4.5mu l}{\rm 1\mskip-5.0mu l}}_{\{L(v)\leq\mathcal{M}^{\textrm{inv}}(n)\}}+(d_{l}(x)-d_{l^{\prime}}(x))\Bigg{\}}^{2}g(v)dv.$$
(8.1)

Given the Condition (B1) on the class \(\mathcal{L}\), which states that it is totally bounded with respect to the distance \(d_{Q}(\cdot,\cdot)\), where \(Q(\cdot)\) is a distribution with density \(g(\cdot)\), it follows that for any \(\delta>0\), there exists a finite subclass \(\mathcal{L}_{1}\subset\mathcal{L}\) such that

$$\sup_{l\in\mathcal{L}}\min_{l^{\prime}\in\mathcal{L}_{1}}\int(l(v)-l^{\prime}(v))^{2}g(v)dv<\delta.$$

Furthermore, due to the compactness of the function classes \(\mathcal{C}\) and \(\mathcal{D}\), we can identify finite subclasses \(\mathcal{L}_{2}\subset\mathcal{L}\) and \(\mathcal{L}_{3}\subset\mathcal{L}\) such that

$$\sup_{l\in\mathcal{L}}\min_{l^{\prime}\in\mathcal{L}_{2}}||c_{l}-c_{l^{\prime}}||_{\mathcal{J}}\vee\sup_{l\in\mathcal{L}}\min_{l^{\prime}\in\mathcal{L}_{3}}||d_{l}-d_{l^{\prime}}||_{\mathcal{J}}<\delta.$$

Therefore, employing the observation that both

$$\sup_{l\in\mathcal{L}}||c_{l}||_{\mathcal{J}}<\infty\quad\textrm{and}\quad\sup_{l\in\mathcal{L}}\int(l(v))^{2}g(v)dv<\infty,$$

after a brief arithmetic manipulation and selecting a sufficiently small \(\delta>0\), we obtain

$$\sup_{l\in\mathcal{L}}\min_{l_{1}^{\prime},l_{2}^{\prime},l_{3}^{\prime}}\sup_{x\in\mathcal{J}}\int\Bigg{\{}(c_{l}(x)l(v)-c_{l_{2}^{\prime}}(x)l_{1}^{\prime}(v)){\mathchoice{\rm 1\mskip-4.0mu l}{\rm 1\mskip-4.0mu l}{\rm 1\mskip-4.5mu l}{\rm 1\mskip-5.0mu l}}_{\{L(v)\leq\mathcal{M}^{\textrm{inv}}(n)\}}+(d_{l}(x)-d_{l_{3}^{\prime}}(x))\Bigg{\}}^{2}g(v)dv\leq\epsilon/2C,$$

where the minimum is taken over \(\mathcal{L}_{1}\times\mathcal{L}_{2}\times\mathcal{L}_{3}\). Now, consider any triple \((l_{1}^{\prime},l_{2}^{\prime},l_{3}^{\prime})\in\mathcal{L}_{1}\times\mathcal{L}_{2}\times\mathcal{L}_{3}\) for which there exists \(l^{\prime}\in\mathcal{L}\) such that

$$\sup_{x\in\mathcal{J}}\int\Bigg{\{}(c_{l^{\prime}}(x)l^{\prime}(v)-c_{l_{2}^{\prime}}(x)l_{1}^{\prime}(v)){\mathchoice{\rm 1\mskip-4.0mu l}{\rm 1\mskip-4.0mu l}{\rm 1\mskip-4.5mu l}{\rm 1\mskip-5.0mu l}}_{\{L(v)\leq\mathcal{M}^{\textrm{inv}}(n)\}}+(d_{l^{\prime}}(x)-d_{l_{3}^{\prime}}(x))\Bigg{\}}^{2}g(v)dv\leq\epsilon/2C,$$

select one of them to form the desired subclass \(\mathcal{L}_{\epsilon}\). To conclude the proof, it is enough to apply the triangle inequality, as sketched below.
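For completeness, a minimal sketch of this last step, writing (our shorthand) \(A_{l,l^{\prime}}(x,v)=(c_{l}(x)l(v)-c_{l^{\prime}}(x)l^{\prime}(v))\mathbf{1}_{\{L(v)\leq\mathcal{M}^{\textrm{inv}}(n)\}}+(d_{l}(x)-d_{l^{\prime}}(x))\): given \(l\in\mathcal{L}\), pick a triple \((l_{1}^{\prime},l_{2}^{\prime},l_{3}^{\prime})\) achieving the penultimate display and let \(l^{\prime}\in\mathcal{L}_{\epsilon}\) be the element selected for that triple. Since \(A_{l,l^{\prime}}\) is the difference of the two integrands bounded in the last two displays, the inequality \((a-b)^{2}\leq 2a^{2}+2b^{2}\) gives

$$\sup_{x\in\mathcal{J}}\int A_{l,l^{\prime}}(x,v)^{2}g(v)\,dv\leq 2\cdot\frac{\epsilon}{2C}+2\cdot\frac{\epsilon}{2C}=\frac{2\epsilon}{C},$$

which, combined with (8.1) and a relabeling of \(2\epsilon\) as \(\epsilon\), yields the desired covering of \(\mathcal{L}\) by \(\mathcal{L}_{\epsilon}\). \(\Box\)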

Proof of Lemma 7.5

Set, for \(z=(x,l,\varrho)\in\mathcal{J}\times\mathcal{L}\times\mathcal{H}_{0}\),

$$\mathbf{W}_{n}^{\prime\prime}(z)=\frac{1}{n\mathbf{E}[\Delta_{1}(x,\varrho h)]}\sum_{i=1}^{n}c_{l}(x)l(Y_{i})\mathbf{1}_{\{L(Y_{i})>\mathcal{M}^{\textrm{inv}}(n)\}}\Delta_{i}(x,\varrho h).$$

Observe first that \(s/\mathcal{M}(s)\) is a decreasing function. In turn, this implies, for any \(v\) such that \(L(v)\geq\mathcal{M}^{\textrm{inv}}(n)\), that

$$|l(v)|=\frac{|l(v)|}{L(v)}\frac{L(v)}{\mathcal{M}(L(v))}\mathcal{M}(L(v))\leq\Bigg{\{}\frac{\mathcal{M}^{\textrm{inv}}(n)}{n}\Bigg{\}}\mathcal{M}(L(v)).$$
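The final bound uses two facts: \(|l(v)|\leq L(v)\), assuming (as is standard here) that \(L\) is the envelope of the class \(\mathcal{L}\), and, since \(s\mapsto s/\mathcal{M}(s)\) is decreasing while \(L(v)\geq\mathcal{M}^{\textrm{inv}}(n)\),

$$\frac{L(v)}{\mathcal{M}(L(v))}\leq\frac{\mathcal{M}^{\textrm{inv}}(n)}{\mathcal{M}(\mathcal{M}^{\textrm{inv}}(n))}=\frac{\mathcal{M}^{\textrm{inv}}(n)}{n},$$

the last equality holding provided \(\mathcal{M}^{\textrm{inv}}\) is the (generalized) inverse of \(\mathcal{M}\), so that \(\mathcal{M}(\mathcal{M}^{\textrm{inv}}(n))=n\).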

Thus, for \(n\) large enough, uniformly in \((x,l,\varrho)\in\mathcal{J}\times\mathcal{L}\times\mathcal{H}_{0}\), we have

$$\mathbf{E}\Big{[}\big{|}\mathbf{W}_{n}^{\prime\prime}(z)\big{|}\Big{]}\leq\Bigg{\{}\frac{\mathcal{M}^{\textrm{inv}}(n)}{n}\Bigg{\}}C_{\mathcal{L}}\mathbf{E}\Bigg{\{}\mathcal{M}(L(Y))\frac{\Delta_{1}(x,\varrho h)}{\mathbf{E}[\Delta_{1}(x,\varrho h)]}\Bigg{\}}.$$

Thus, by Conditions (A1)–(A2) and (B3)–(B4)(i) and using Lemma 7.1, there exist two strictly positive constants \(C_{1},C_{2}\) such that, for any \(x\in\mathcal{J}\) and \(\varrho\in\mathcal{H}_{0}\), we have

$$C_{1}\phi(\varrho h)\leq\mathbf{E}[\Delta_{1}(x,\varrho h)]$$
(8.2)

and

$$\mathbf{E}[\mathcal{M}(L(Y))\Delta_{1}(x,\varrho h)]\leq C_{2}\phi(\varrho h).$$
(8.3)

Hence, we readily infer

$$\sup_{z\in\mathcal{J}\times\mathcal{L}\times\mathcal{H}_{0}}w_{n}\mathbf{E}\Big{[}\big{|}\mathbf{W}_{n}^{\prime\prime}(z)\big{|}\Big{]}\leq\frac{w_{n}\mathcal{M}^{\textrm{inv}}(n)}{n}\frac{C_{2}C_{\mathcal{L}}}{C_{1}},$$

which by (A3) converges to \(0\) as \(n\rightarrow\infty\). Considering now the inequality (8.2) and the boundedness of the kernel \(K(\cdot)\) in (A1), for any \(z\in\mathcal{J}\times\mathcal{L}\times\mathcal{H}_{0}\), we have

$$\big|\mathbf{W}_{n}^{\prime\prime}(z)\big|\leq\frac{C_{\mathcal{L}}}{C_{1}}\frac{1}{n\phi(\varrho h)}\sum_{i=1}^{n}L(Y_{i})\mathbf{1}_{\{L(Y_{i})>\mathcal{M}^{\textrm{inv}}(n)\}}.$$

By (A2)(ii), we have, uniformly in \(\varrho\in\mathcal{H}_{0}\),

$$\frac{\phi(\varrho h)}{\phi(\gamma h)}=\tau(\varrho,\gamma)+o(1).$$

Combining this with the fact that \(0<\tau_{0}\left(\displaystyle\frac{\vartheta_{1}}{\vartheta_{2}}\right)\leq\tau_{0}\left(\displaystyle\frac{\varrho}{\gamma}\right)\), we deduce that there exists a constant \(C_{2}>0\) such that, for \(n\) large enough,

$$C_{2}\phi(\gamma h)\leq\inf_{\varrho\in\mathcal{H}_{0}}\phi(\varrho h).$$

Therefore, for any \(\eta>0\) and \(n\) large enough, we have

$$\mathbb{P}\Bigg(\sup_{z\in\mathcal{J}\times\mathcal{L}\times\mathcal{H}_{0}}w_{n}|\mathbf{W}_{n}(z)-\tilde{\mathbf{W}}_{n}(z)|\geq\eta\Bigg)\leq\mathbb{P}\Bigg(\sup_{z\in\mathcal{J}\times\mathcal{L}\times\mathcal{H}_{0}}w_{n}|\mathbf{W}^{\prime\prime}_{n}(z)|\geq\eta/2\Bigg)$$
$${}\leq\mathbb{P}\Bigg(\frac{C_{\mathcal{L}}}{C_{1}}\frac{w_{n}}{nC_{2}\phi(\gamma h)}\sum_{i=1}^{n}L(Y_{i})\mathbf{1}_{\{L(Y_{i})>\mathcal{M}^{\textrm{inv}}(n)\}}\geq\eta/2\Bigg)$$
$${}\leq\mathbb{P}\Big(\max_{1\leq i\leq n}L(Y_{i})>\mathcal{M}^{\textrm{inv}}(n)\Big)$$
$${}\leq n\,\mathbb{P}\Big(L(Y)>\mathcal{M}^{\textrm{inv}}(n)\Big).$$
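The third inequality can be seen as follows: on the complement of \(\{\max_{1\leq i\leq n}L(Y_{i})>\mathcal{M}^{\textrm{inv}}(n)\}\) every indicator in the sum vanishes, so that

$$\Bigg\{\frac{C_{\mathcal{L}}}{C_{1}}\frac{w_{n}}{nC_{2}\phi(\gamma h)}\sum_{i=1}^{n}L(Y_{i})\mathbf{1}_{\{L(Y_{i})>\mathcal{M}^{\textrm{inv}}(n)\}}\geq\eta/2\Bigg\}\subseteq\Big\{\max_{1\leq i\leq n}L(Y_{i})>\mathcal{M}^{\textrm{inv}}(n)\Big\};$$

the fourth is the union bound over the \(n\) identically distributed terms.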

The application of the exponential Tchebychev inequality with \(t\) chosen as in Condition (B4)(i) yields

$${\lim\limits_{n\rightarrow\infty}\frac{w_{n}^{2}}{n\phi(\gamma h)}\log\bigg{(}n\mathbb{P}\Big{(}L(Y)>\mathcal{M}^{\textrm{inv}}(n)\Big{)}\bigg{)}}$$
$${}\leq\lim\limits_{n\rightarrow\infty}\Bigg{(}\frac{w_{n}^{2}}{n\phi(\gamma h)}\log n-t\frac{w_{n}^{2}}{n\phi(\gamma h)}\mathcal{M}^{\textrm{inv}}(n)+\frac{w_{n}^{2}}{n\phi(\gamma h)}\log\Big{(}\mathbf{E}\big{[}e^{tL(Y)}\big{]}\Big{)}\Bigg{)},$$
$${}\leq\lim\limits_{n\rightarrow\infty}\Bigg{(}(1-t)\frac{w_{n}^{2}}{n\phi(\gamma h)}\max\big{(}\mathcal{M}^{\textrm{inv}}(n),\log n\big{)}+\frac{w_{n}^{2}}{n\phi(\gamma h)}\log\Big{(}\mathbf{E}\big{[}e^{tL(Y)}\big{]}\Big{)}\Bigg{)},$$

which by Assumptions (A3) and (B4)(ii) converges to \(-\infty\) as \(n\rightarrow\infty\). \(\Box\)

Proof of Lemma 7.6

Arguing as in the proof of Lemma 5 in [58], it follows that the class

$$\tilde{\mathfrak{M}_{1}}=\Bigg\{(u,v)\mapsto\mathbf{1}_{\{L(v)\leq t\}}K(d(x,u)/\varrho h):\varrho\in\mathcal{H}_{0},\,x\in\mathcal{J},\,t>0\Bigg\}$$

satisfies, for any probability measure \(Q\) on the Borel subsets of \(\mathcal{J}\times\mathbb{R}\), the condition

$$N(\epsilon\kappa,\tilde{\mathfrak{M}_{1}},d_{Q})\leq\tilde{C}\epsilon^{-\tilde{\nu}},\quad 0<\epsilon<1,$$

where \(\tilde{C}\) and \(\tilde{\nu}\) are suitable positive constants. Next, consider the function class

$$\tilde{\mathfrak{M}_{2}}=\Bigg{\{}(u,v)\mapsto\phi(\gamma h)\frac{c_{l}(x)l(v)}{\mathbf{E}[\Delta_{1}(x,\varrho h)]}:\varrho\in\mathcal{H}_{0},l\in\mathcal{L},x\in\mathcal{J}\Bigg{\}}.$$

Applying the same reasoning as in the proof of Lemma 5 in [58] and leveraging the Vapnik–Červonenkis property of \(\mathcal{L}\), along with the inequality (8.2) and the bounded nature of the class \(\mathcal{C}\), it follows that the class \(\tilde{\mathfrak{M}_{2}}\) possesses a polynomial covering number. Consequently, by Lemma A.1 in [58], the product class \(\tilde{\mathfrak{M}_{1}}\cdot\tilde{\mathfrak{M}_{2}}\) also exhibits a polynomial covering number. Now consider the class

$$\tilde{\mathfrak{M}_{3}}=\Bigg{\{}(u,v)\mapsto\phi(\gamma h)\frac{d_{l}(x)}{\mathbf{E}[\Delta_{1}(x,\varrho h)]}:\varrho\in\mathcal{H}_{0},l\in\mathcal{L},x\in\mathcal{J}\Bigg{\}}.$$

Utilizing inequality (8.2), the bounded nature of the class \(\mathcal{D}\), and invoking Lemma A.1 from [58], it can be established that the product class \(\mathcal{K}\cdot\tilde{\mathfrak{M}_{3}}\) possesses a polynomial covering number. Consequently, the class resulting from the summation of \(\tilde{\mathfrak{M}_{1}}\cdot\tilde{\mathfrak{M}_{2}}\) and \(\mathcal{K}\cdot\tilde{\mathfrak{M}_{3}}\) also exhibits a polynomial covering number. Finally, it can be concluded that the class \(\mathfrak{M}\) satisfies this covering property as well. The measurability follows directly from the continuity of the kernel function, the separability of \((\mathcal{J},d)\), and the fact that the functions belong to the classes \(\mathcal{C}\) and \(\mathcal{D}\). \(\Box\)

Proof of Lemma 7.7

Recall Eq. (7.21). Note that, for any \((x_{1},x_{2})\in\mathcal{J}^{2}\) and any \((z_{1},z_{2})\in\big{(}\mathcal{L}\times\mathcal{H}_{0}\big{)}^{2}\),

$${{\textrm{Var}}(B_{n,x_{1},z_{1}}(X,Y)-B_{n,x_{2},z_{2}}(X,Y))}$$
$${}\leq 2\mathbb{E}\Bigg{[}\phi(\gamma h)\Big{\{}\bar{\mathcal{M}}_{x_{1},l_{1}}(Y)-\bar{\mathcal{M}}_{x_{1},l_{2}}(Y)\Big{\}}\frac{\Delta_{1}(x_{1},\varrho_{1}h)}{\mathbf{E}[\Delta_{1}(x_{1},\varrho_{1}h)]}\Bigg{]}^{2}$$
$${}+2\mathbb{E}\Bigg{[}\phi(\gamma h)\bigg{\{}\frac{\bar{\mathcal{M}}_{x_{1},l_{2}}(Y)\Delta_{1}(x_{1},\varrho_{1}h)}{\mathbf{E}[\Delta_{1}(x_{1},\varrho_{1}h)]}-\frac{\bar{\mathcal{M}}_{x_{2},l_{2}}(Y)\Delta_{1}(x_{2},\varrho_{2}h)}{\mathbf{E}[\Delta_{1}(x_{2},\varrho_{2}h)]}\bigg{\}}\Bigg{]}^{2}$$
$${}:=I+II.$$

According to Lemma 7.2, for sufficiently large \(n\), there exists a constant \(C>0\) such that

$$I\leq 2Cd_{\mathcal{L}}(l_{1},l_{2})\phi(\gamma h).$$

Now, observe that

$$II\leq 4\phi(\gamma h)^{2}\mathbb{E}\Bigg[\bigg([c_{l_{2}}(x_{1})-c_{l_{2}}(x_{2})]l_{2}(Y)\mathbf{1}_{\{L(Y)\leq\mathcal{M}^{\textrm{inv}}(n)\}}+[d_{l_{2}}(x_{1})-d_{l_{2}}(x_{2})]\bigg)\frac{\Delta_{1}(x_{1},\varrho_{1}h)}{\mathbf{E}[\Delta_{1}(x_{1},\varrho_{1}h)]}\Bigg]^{2}$$
$${}+4\phi(\gamma h)^{2}\mathbb{E}\Bigg[\bigg(c_{l_{2}}(x_{2})l_{2}(Y)\mathbf{1}_{\{L(Y)\leq\mathcal{M}^{\textrm{inv}}(n)\}}+d_{l_{2}}(x_{2})\bigg)\bigg[\frac{\Delta_{1}(x_{2},\varrho_{2}h)}{\mathbf{E}[\Delta_{1}(x_{2},\varrho_{2}h)]}-\frac{\Delta_{1}(x_{1},\varrho_{1}h)}{\mathbf{E}[\Delta_{1}(x_{1},\varrho_{1}h)]}\bigg]\Bigg]^{2}$$
$${}:=III+IV.$$
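Both this splitting and the preceding one rest on the elementary inequality

$$(a+b)^{2}\leq 2a^{2}+2b^{2},$$

applied under the expectation sign; iterating it is what produces the factors \(4\) above and \(8\) below.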

Using the fact that the classes \(\mathcal{C}\) and \(\mathcal{D}\) are uniformly equicontinuous, for any \(\epsilon>0\) there exists \(\delta>0\) such that, for any \(x_{1},x_{2}\in\mathcal{J}\) with \(d(x_{1},x_{2})\leq\delta\),

$$\sup_{l\in\mathcal{L}}|c_{l}(x_{1})-c_{l}(x_{2})|\vee\sup_{l\in\mathcal{L}}|d_{l}(x_{1})-d_{l}(x_{2})|\leq\epsilon.$$

Hence, we have

$$III\leq 4(\epsilon\phi(\gamma h))^{2}\mathbb{E}\Bigg{[}\bigg{(}L(Y)+1\bigg{)}\frac{\Delta_{1}(x_{1},\varrho_{1}h)}{\mathbf{E}[\Delta_{1}(x_{1},\varrho_{1}h)]}\Bigg{]}^{2}.$$

Using the same arguments as in (8.1), by Conditions (A1)–(A2) and (B3)–(B4)(i), and applying Lemma 7.1 together with the observation that \(0<\tau_{0}\left(\displaystyle\frac{\vartheta_{1}}{\vartheta_{2}}\right)\leq\tau_{0}\left(\displaystyle\frac{\varrho_{1}}{\gamma}\right)\), there exists a positive constant \(C_{1}\) such that

$$III\leq C_{1}\epsilon^{2}\phi(\gamma h).$$

Similarly, inserting the intermediate term \(\Delta_{1}(x_{1},\varrho_{1}h)/\mathbf{E}[\Delta_{1}(x_{2},\varrho_{2}h)]\) and splitting, we have

$$IV\leq 8\phi(\gamma h)^{2}\mathbb{E}\Bigg[\bigg(c_{l_{2}}(x_{2})l_{2}(Y)\mathbf{1}_{\{L(Y)\leq\mathcal{M}^{\textrm{inv}}(n)\}}+d_{l_{2}}(x_{2})\bigg)\bigg[\frac{\Delta_{1}(x_{2},\varrho_{2}h)}{\mathbf{E}[\Delta_{1}(x_{2},\varrho_{2}h)]}-\frac{\Delta_{1}(x_{1},\varrho_{1}h)}{\mathbf{E}[\Delta_{1}(x_{2},\varrho_{2}h)]}\bigg]\Bigg]^{2}$$
$${}+8\phi(\gamma h)^{2}\mathbb{E}\Bigg[\bigg(c_{l_{2}}(x_{2})l_{2}(Y)\mathbf{1}_{\{L(Y)\leq\mathcal{M}^{\textrm{inv}}(n)\}}+d_{l_{2}}(x_{2})\bigg)\bigg[\frac{\Delta_{1}(x_{1},\varrho_{1}h)}{\mathbf{E}[\Delta_{1}(x_{2},\varrho_{2}h)]}-\frac{\Delta_{1}(x_{1},\varrho_{1}h)}{\mathbf{E}[\Delta_{1}(x_{1},\varrho_{1}h)]}\bigg]\Bigg]^{2}.$$

By (7.17) and (B4)(i), and using the fact that the classes \(\mathcal{C}\) and \(\mathcal{D}\) are uniformly bounded, we have, for suitable finite constants \(C_{1}\) and \(C_{2}\),

$${IV}$$
$${}\leq C_{1}\mathbf{E}\Bigg{[}\bigg{(}L(Y)+1\bigg{)}\bigg{[}\Delta_{1}(x_{2},\varrho_{2}h)-\Delta_{1}(x_{1},\varrho_{1}h)\bigg{]}\Bigg{]}^{2}+\frac{C_{2}}{\phi(\gamma h)}\Bigg{[}\mathbf{E}\Big{[}\Delta_{1}(x_{2},\varrho_{2}h)-\Delta_{1}(x_{1},\varrho_{1}h)\Big{]}\Bigg{]}^{2}$$
$${}\leq 2C_{1}\left\{\mathbf{E}\Bigg{[}\bigg{(}L(Y)+1\bigg{)}\bigg{[}\Delta_{1}(x_{2},\varrho_{2}h)-\Delta_{1}(x_{2},\varrho_{1}h)\bigg{]}\Bigg{]}^{2}+\mathbf{E}\Bigg{[}\bigg{(}L(Y)+1\bigg{)}\bigg{[}\Delta_{1}(x_{2},\varrho_{1}h)-\Delta_{1}(x_{1},\varrho_{1}h)\bigg{]}\Bigg{]}^{2}\right\}$$
$${}+2\frac{C_{2}}{\phi(\gamma h)}\left\{\Bigg{[}\mathbf{E}\Big{[}\Delta_{1}(x_{2},\varrho_{2}h)-\Delta_{1}(x_{2},\varrho_{1}h)\Big{]}\Bigg{]}^{2}+\Bigg{[}\mathbf{E}\Big{[}\Delta_{1}(x_{2},\varrho_{1}h)-\Delta_{1}(x_{1},\varrho_{1}h)\Big{]}\Bigg{]}^{2}\right\}.$$

Now, by (A1)(i), we obtain, for some constant \(C\),

$$\mathbf{E}\Bigg{[}\bigg{(}L(Y)+1\bigg{)}\bigg{[}\Delta_{1}(x_{2},\varrho_{2}h)-\Delta_{1}(x_{2},\varrho_{1}h)\bigg{]}\Bigg{]}^{2}$$
$${}=\int\limits_{0}^{1}\int\left(L(v)+1\right)^{2}\left(K\left(\frac{u}{\varrho_{2}}\right)-K\left(\frac{u}{\varrho_{1}}\right)\right)^{2}\,d\mathbb{P}\left(\frac{d(x_{2},X_{1})}{h}\leq u,Y\leq v\right)$$
$${}\leq C\frac{(\varrho_{2}-\varrho_{1})^{2}}{\vartheta_{1}^{2}}\int\limits_{0}^{1}\int\left(L(v)+1\right)^{2}u^{2}\,d\mathbb{P}\left(\frac{d(x_{2},X_{1})}{h}\leq u|Y=v\right)g(v)\,dv.$$
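The last step is the Lipschitz bound on the kernel. A minimal sketch, writing \(C_{K}\) for the Lipschitz constant of \(K\) provided by (A1)(i) (our notation) and recalling that \(\varrho_{1},\varrho_{2}\geq\vartheta_{1}\): for \(u\in[0,1]\),

$$\left|K\left(\frac{u}{\varrho_{2}}\right)-K\left(\frac{u}{\varrho_{1}}\right)\right|\leq C_{K}\,u\left|\frac{1}{\varrho_{2}}-\frac{1}{\varrho_{1}}\right|=C_{K}\,u\,\frac{|\varrho_{2}-\varrho_{1}|}{\varrho_{1}\varrho_{2}}\leq C_{K}\,u\,\frac{|\varrho_{2}-\varrho_{1}|}{\vartheta_{1}^{2}}.$$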

Using the same arguments as in (7.1), we get

$$\mathbf{E}\Bigg{[}\bigg{(}L(Y)+1\bigg{)}\bigg{[}\Delta_{1}(x_{2},\varrho_{2}h)-\Delta_{1}(x_{2},\varrho_{1}h)\bigg{]}\Bigg{]}^{2}\leq C\,(\varrho_{2}-\varrho_{1})^{2}\phi(h)\int\limits_{0}^{1}\Gamma_{x_{2}}(v,h)\left(L(v)+1\right)^{2}g(v)\,dv,$$

where

$$\Gamma_{x}(v,h)=f_{v}(x)+\frac{g_{x,v}(h)}{\phi(h)}-2\int\limits_{0}^{1}u\frac{\phi(uh)}{\phi(h)}\Big{(}f_{v}(x)+\frac{g_{x,v}(uh)}{\phi(uh)}\Big{)}\,du.$$

By Conditions (A1)–(A2) and (B4)(i), it follows that

$$\mathbf{E}\Bigg{[}\bigg{(}L(Y)+1\bigg{)}\bigg{[}\Delta_{1}(x_{2},\varrho_{2}h)-\Delta_{1}(x_{2},\varrho_{1}h)\bigg{]}\Bigg{]}^{2}\leq C\,|\varrho_{2}-\varrho_{1}|\phi(\gamma h).$$
(8.4)
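The passage from the squared increment to the first power in (8.4) can be sketched as follows: assuming \(\mathcal{H}_{0}\subseteq[\vartheta_{1},\vartheta_{2}]\), the integral factor in the previous display stays bounded under (A2) and (B4)(i), and

$$(\varrho_{2}-\varrho_{1})^{2}\,\phi(h)\leq(\vartheta_{2}-\vartheta_{1})\,|\varrho_{2}-\varrho_{1}|\,\phi(h)\leq C\,|\varrho_{2}-\varrho_{1}|\,\phi(\gamma h),$$

the last inequality using that \(\phi(h)/\phi(\gamma h)\) remains bounded by (A2)(ii).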

By using the same arguments as above, it follows that

$$\mathbf{E}\Bigg{[}\bigg{(}L(Y)+1\bigg{)}\bigg{[}\Delta_{1}(x_{2},\varrho_{1}h)-\Delta_{1}(x_{1},\varrho_{1}h)\bigg{]}\Bigg{]}^{2}\leq C\,\left(\frac{d(x_{1},x_{2})}{h}\right)^{2}\phi(\gamma h),$$

which implies, by using the fact that \(d(x_{1},x_{2})\leq\delta\phi(\gamma h)\) and \(\phi(\gamma h)/h=O(1)\),

$$\mathbf{E}\Bigg{[}\bigg{(}L(Y)+1\bigg{)}\bigg{[}\Delta_{1}(x_{2},\varrho_{1}h)-\Delta_{1}(x_{1},\varrho_{1}h)\bigg{]}\Bigg{]}^{2}\leq C\,\delta\phi(\gamma h).$$
(8.5)

Similarly,

$$\mathbf{E}\Big{|}\Delta_{1}(x,\varrho_{2}h)-\Delta_{1}(x,\varrho_{1}h)\Big{|}\leq C\,|\varrho_{2}-\varrho_{1}|\phi(\gamma h)$$
(8.6)

and

$$\mathbf{E}\Big{|}\Delta_{1}(x_{2},\varrho_{1}h)-\Delta_{1}(x_{1},\varrho_{1}h)\Big{|}\leq C\,\delta\phi(\gamma h).$$
(8.7)

Finally, from (8.4)–(8.7), we deduce

$$IV\leq C\,\phi(\gamma h)\left(\delta+|\varrho_{2}-\varrho_{1}|\right),$$

which completes the proof of Lemma 7.7. \(\Box\)

Proof of Lemma 7.8

Observe, for any \((x,z)\in\mathcal{J}\times\mathcal{L}\times\mathcal{H}_{0}\), that

$$\mathfrak{B}_{n}(x,l,\varrho h)=W_{n}(x,z)+c_{l}(x)(\mathbf{E}[\widehat{r}_{n,2}^{l}(x,\varrho h)]-r^{l}(x))$$
$${}=:W_{n}(x,z)+V_{n}(x,z),$$

where \(z=(l,\varrho)\). To prove Lemma 7.8, we have to show that

$$\lim\limits_{n\rightarrow\infty}w_{n}\sup_{z\in\mathcal{L}\times\mathcal{H}_{0}}V_{n}(x,z)=0.$$

Now, for any \((x,l,\varrho)\in\mathcal{J}\times\mathcal{L}\times\mathcal{H}_{0}\), observe that

$${\mathbf{E}[\widehat{r}_{n,2}^{l}(x,\varrho h)-r^{l}(x)]}$$
$${}=\sum_{i=1}^{n}\frac{\mathbf{E}[l(Y_{i})\Delta_{i}(x,\varrho h)]}{n\mathbf{E}[\Delta_{1}(x,\varrho h)]}-{r}^{l}(x)=\frac{\mathbf{E}[l(Y_{1})\Delta_{1}(x,\varrho h)]}{\mathbf{E}[\Delta_{1}(x,\varrho h)]}-{r}^{l}(x)$$
$${}=\frac{\mathbf{E}[\mathbf{E}[l(Y_{1})\Delta_{1}(x,\varrho h)|X_{1}]]}{\mathbf{E}[\Delta_{1}(x,\varrho h)]}-{r}^{l}(x)=\frac{\mathbf{E}[{r}^{l}(X_{1})\Delta_{1}(x,\varrho h)]}{\mathbf{E}[\Delta_{1}(x,\varrho h)]}-{r}^{l}(x)$$
$${}=\frac{\mathbf{E}\Bigg{[}\bigg{(}{r}^{l}(X_{1})-{r}^{l}(x)\bigg{)}\Delta_{1}(x,\varrho h)\Bigg{]}}{\mathbf{E}[\Delta_{1}(x,\varrho h)]}.$$

Since the kernel function \(K(\cdot)\) is supported on \([0,1]\) by Condition (A1), it follows that

$$\left|{r}^{l}(X_{1})-{r}^{l}(x)\right|\Delta_{1}(x,\varrho h)\leq\sup_{\{x^{\prime}:d(x^{\prime},x)\leq\varrho h\}}\left|{r}^{l}(x^{\prime})-{r}^{l}(x)\right|\Delta_{1}(x,\varrho h).$$
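Taking expectations in the last display shows that \(|V_{n}(x,z)|\) is bounded by \(\sup_{l\in\mathcal{L}}\|c_{l}\|_{\mathcal{J}}\) times the oscillation of \(r^{l}\) over the ball \(\{x^{\prime}:d(x^{\prime},x)\leq\varrho h\}\). For illustration only, under the hypothetical reading of (C1) as granting a uniform modulus of continuity \(\omega(\cdot)\) for the family \(\{r^{l}:l\in\mathcal{L}\}\) with \(w_{n}\,\omega(\vartheta_{2}h)\rightarrow 0\), this would give

$$w_{n}\sup_{z\in\mathcal{L}\times\mathcal{H}_{0}}|V_{n}(x,z)|\leq w_{n}\,\sup_{l\in\mathcal{L}}\|c_{l}\|_{\mathcal{J}}\,\omega(\vartheta_{2}h)\longrightarrow 0.$$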

An application of Condition (C1) then yields the claimed result. \(\Box\)