1 Introduction

Motivated by numerous applications, the theory of U-statistics (introduced in the seminal work of Hoeffding (1948)) and U-processes has received considerable attention in the past decades. U-processes are useful for solving complex statistical problems; examples are density estimation, nonparametric regression tests and goodness-of-fit tests. More precisely, U-processes appear in statistics in many instances, e.g., as the components of higher-order terms in von Mises expansions. In particular, U-statistics play a role in the analysis of estimators (including function estimators) with varying degrees of smoothness. For example, Stute (1993) applies a.s. uniform bounds for \(\mathbb {P}\)-canonical U-processes to the analysis of the product limit estimator for truncated data. Arcones and Wang (2006) present two new tests for normality based on U-processes. Making use of the results of Giné and Mason (2007a, 2007b), Schick et al. (2011) introduced new tests for normality which use as test statistics weighted L1-distances between the standard normal density and local U-statistics based on standardized observations. Joly and Lugosi (2016) discussed the estimation of the mean of multivariate functions in the case of possibly heavy-tailed distributions and introduced the median-of-means estimator, which is based on U-statistics. U-processes are important tools for a broad range of statistical applications such as testing for qualitative features of functions in nonparametric statistics (Lee et al. 2009; Ghosal et al. 2000; Abrevaya and Jiang 2005) and establishing limiting distributions of M-estimators (see, e.g., Arcones and Giné 1993; Sherman 1993, 1994; de la Peña and Giné 1999). The first asymptotic results, for the case where the underlying random variables are independent and identically distributed, were provided by Halmos (1946), von Mises (1947) and Hoeffding (1948). Under weak dependence assumptions, asymptotic results are shown, for instance, in Borovkova et al. (2001) and Denker and Keller (1983), more recently in Leucht (2012), and in a more general setting in Leucht and Neumann (2013) and Bouzebda and Nemouchi (2019, 2022). For an excellent collection of references on U-statistics and U-processes, the interested reader may refer to Borovskikh (1996), Koroljuk and Borovskich (1994), Lee (1990), Arcones and Giné (1995), Arcones et al. (1994) and Arcones and Giné (1993). A profound insight into the theory of U-processes is given by de la Peña and Giné (1999). In this paper, we consider the so-called conditional U-statistics introduced by Stute (1991). These statistics may be viewed as generalizations of the Nadaraya-Watson (Nadaraja, 1964; Watson, 1964) estimates of a regression function.

To be more precise, consider a sequence of independent and identically distributed random vectors \(\{(\mathbf{X}_{i},\mathbf{Y}_{i}), i\in \mathbb{N}^{*}\}\) with \(\mathbf{X}_{i} \in \mathbb{R}^{d}\) and \(\mathbf{Y}_{i} \in \mathbb{R}^{q}\), d,q ≥ 1. Let \(\mathbb{P}_{\mathbf{X}}=\mathbb{P}\) denote the (unknown) marginal Borel probability distribution of \(\mathbf{X}_{1}\) on \(\mathbb{R}^{d}\). Let \(\varphi : \mathbb{R}^{qm}\rightarrow \mathbb{R}\) be a measurable function. In this paper, we are primarily concerned with the estimation of the conditional expectation, or regression function, of φ(Y1,…,Ym) evaluated at (X1,…,Xm) = t, given by

$$ r^{(m)}(\varphi,\mathbf{t}) = \mathbb{E}\left( \varphi(\mathbf{Y}_{1},\ldots,\mathbf{Y}_{m})\mid (\mathbf{X}_{1},\ldots,\mathbf{X}_{m}) = \mathbf{t}\right), ~~\text{for}~~\mathbf{t} \in \mathbb{R}^{dm}, $$
(1.1)

whenever it exists, i.e., \(\mathbb{E}\left(\left|\varphi(\mathbf{Y}_{1},\ldots,\mathbf{Y}_{m})\right|\right)<\infty \). We now introduce a kernel function \(K:\mathbb{R}^{d}\rightarrow \mathbb{R}\). Stute (1991) introduced a class of estimators of r(m)(φ,t), called conditional U-statistics, defined for each \(\mathbf{t}=(\mathbf{t}_{1},\ldots,\mathbf{t}_{m})\in \mathbb{R}^{dm}\) by:

$$ \widehat{r}_{n}^{(m)}(\varphi,\mathbf{t};h_{n}) = \frac{\displaystyle\sum\limits_{(i_{1},\ldots,i_{m})\in I(m,n)}\varphi(\mathbf{Y}_{i_{1}},\ldots,\mathbf{Y}_{i_{m}})K\left( \frac{\mathbf{t}_{1} - \mathbf{X}_{i_{1}}}{h_{n}}\right){\cdots} K\left( \frac{\mathbf{t}_{m} - \mathbf{X}_{i_{m}}}{h_{n}}\right)}{\displaystyle\sum\limits_{(i_{1},\ldots,i_{m})\in I(m,n)}K\left( \frac{\mathbf{t}_{1}-\mathbf{X}_{i_{1}}}{h_{n}}\right){\cdots} K\left( \frac{\mathbf{t}_{m}-\mathbf{X}_{i_{m}}}{h_{n}}\right)}, $$
(1.2)

where:

$$ I(m,n)=\left\{\mathbf{ i}=(i_{1},\ldots,i_{m}): 1\leq i_{j}\leq n ~~\text{ and }~~i_{j}\neq i_{r} ~~\text{ if }~~ j\neq r \right\}, $$
(1.3)

is the set of all m-tuples of distinct integers between 1 and n, and {hn}n≥ 1 denotes a sequence of positive constants converging to zero with \(n{h^{m}_{n}} \rightarrow \infty \). For notational simplicity, we write h = hn. In the particular case m = 1, r(m)(φ,t) reduces to \(r^{(1)}(\varphi,\mathbf{t})=\mathbb{E}(\varphi(\mathbf{Y})\mid\mathbf{X}=\mathbf{t})\) and Stute’s estimator becomes the Nadaraya-Watson estimator of r(1)(φ,t), given by:

$$ \widehat{r}_{n}^{(1)}(\varphi, \mathbf{t}; h)= \sum\limits_{i=1}^{n}\varphi(\mathbf{Y}_{i})K\left( \frac{\mathbf{X}_{i}-\mathbf{t}}{h}\right) \Big/ \sum\limits_{i=1}^{n}K\left( \frac{\mathbf{X}_{i}-\mathbf{t}}{h}\right). $$

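For illustration purposes, the estimator \(\widehat{r}_{n}^{(1)}(\varphi,\mathbf{t};h)\) can be transcribed numerically as follows; this is a minimal sketch in which the Gaussian kernel, the simulated regression model and all numerical values are assumptions made only for the example, not prescriptions of the theory.

```python
import numpy as np

def nadaraya_watson(t, X, Y, h, phi=lambda y: y,
                    kernel=lambda u: np.exp(-0.5 * u**2)):
    """Kernel estimate of r^(1)(phi, t) = E[phi(Y) | X = t] (case m = 1)."""
    # Pairwise scaled differences, shape (k, n, d); a product kernel over
    # the d coordinates is assumed here.
    U = (t[:, None, :] - X[None, :, :]) / h
    W = np.prod(kernel(U), axis=2)            # K((t - X_i)/h), shape (k, n)
    num = W @ phi(Y)                          # sum_i phi(Y_i) K((t - X_i)/h)
    den = W.sum(axis=1)                       # sum_i K((t - X_i)/h)
    # Fall back to the unweighted mean when the denominator vanishes.
    return np.where(den > 0, num / np.maximum(den, 1e-300), np.mean(phi(Y)))

# Simulated illustration: r(t) = sin(2 pi t) plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(500, 1))
Y = np.sin(2 * np.pi * X[:, 0]) + 0.2 * rng.standard_normal(500)
t = np.linspace(0.1, 0.9, 5)[:, None]
print(nadaraya_watson(t, X, Y, h=0.05))      # approximately sin(2 pi t)
```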
The work of Sen (1994) was devoted to estimating the rate of uniform convergence in t of \(\widehat {r}_{n}^{(m)}(\varphi,\mathbf{t};h)\) to r(m)(φ,t). Prakasa Rao and Sen (1995) discussed the limit distributions of \(\widehat {r}_{n}^{(m)}(\varphi,\mathbf{t};h)\) and compared them with those obtained by Stute. Harel and Puri (1996) extended the results of Stute (1991), under appropriate mixing conditions, to weakly dependent data and applied their findings to verify the Bayes risk consistency of the corresponding discrimination rules. Stute (1996) proposed symmetrized nearest neighbour conditional U-statistics as alternatives to the usual kernel-type estimators. An important contribution is the paper of Dony and Mason (2008), where a much stronger form of consistency is shown to hold, namely, consistency of \(\widehat {r}_{n}^{(m)}(\varphi,\mathbf{t};h)\) uniformly in t and in bandwidth (i.e., h ∈ [an,bn], where \(a_{n}<b_{n}\rightarrow 0\) at some specific rate). In addition, uniform consistency is also established over \(\varphi \in {\mathscr{F}}\) for a suitably restricted class \({\mathscr{F}}\). The main tool in their result is the local conditional U-process investigated in Giné and Mason (2007a). In the last decades, empirical process theory has provided very useful and powerful tools to analyze the large sample properties of several nonparametric estimators of functionals of the distribution, such as the regression function and the density function; refer to van der Vaart and Wellner (1996) and Kosorok (2008). Nolan and Pollard (1987) were the first to introduce the notion of uniform in bandwidth consistency for kernel density estimators, applying empirical process methods in their study. In a series of papers, Deheuvels (2000), Deheuvels and Mason (2004), Einmahl and Mason (2005), Dony and Mason (2008), Maillot and Viallon (2009), Mason and Swanepoel (2011), Bouzebda and Elhattab (2009, 2010), Bouzebda (2012), Bouzebda et al. (2018, 2021), Bouzebda and Nemouchi (2020), Bouzebda and El-hadjali (2020) and Bouzebda and Nezzal (2022) established uniform consistency results for such kernel estimators, where h varies within suitably chosen intervals indexed by n. More precisely, we will consider one of the most commonly used classes of estimators, the so-called kernel-type estimators. There are basically no restrictions on the choice of the kernel function in our setup, apart from the mild conditions stated below. The selection of the bandwidth, however, is more problematic. The choice of the bandwidth is crucial for obtaining a good rate of consistency; for example, it has a strong influence on the size of the estimator's bias. In general, one seeks a bandwidth producing an estimator with a good balance between bias and variance. It is then more appropriate to let the bandwidth vary according to the criterion applied, the available data and the location, which cannot be achieved by classical methods; the interested reader may refer to Mason (2012) for more details and discussion on the subject. In the present paper, we develop methods that permit the study of kernel-type estimators under nonrestrictive conditions. It is worth noticing that high-dimensional data sets have several unfortunate properties that make them hard to analyze. The phenomenon that the computational and statistical efficiency of statistical techniques degrades rapidly with the dimension is often referred to as the “curse of dimensionality”. Density and regression estimation on manifolds have received much less attention than their “full-dimensional” counterparts. However, understanding density estimation in situations where the intrinsic dimension can be much lower than the ambient dimension is becoming ever more important: modern systems can capture data at an increasing resolution while the number of degrees of freedom stays relatively constant. One of the limiting aspects of density (regression)-based approaches is their performance in high dimensions.

The notion of intrinsic dimension, say dM, has been studied in the statistical machine learning literature in order to establish fast estimation rates in high-dimensional kernel regression settings; there are numerous known techniques for estimating it, e.g., Kégl (2002), Levina and Bickel (2004), Hein and Audibert (2005) and Farahmand et al. (2007). We first introduce a concept proposed by Kim et al. (2018, 2019), the so-called volume dimension, to characterize the intrinsic dimension of the underlying distribution. More specifically, the volume dimension dvol is the decay rate of the probability of vanishing Euclidean balls. Let ∥⋅∥ be the Euclidean 2-norm. For \(\mathbf{x}\in \mathbb{R}^{d}\) and r > 0, we use the notation \(\mathbb{B}_{\mathbb{R}^{d}}(\mathbf{x},r)\) for the open Euclidean ball centered at x with radius r, i.e.,

$$\mathbb{B}_{\mathbb{R}^{d}}(\mathbf{ x},r)=\left\{ \mathbf{y}\in \mathbb{R}^{d}:\|\mathbf{ y}-\mathbf{ x}\|<r\right\}.$$

When a probability distribution \(\mathbb {P}\) has a bounded density fX(⋅) supported on a well-behaved manifold M of dimension dM, it is known that, for any point x ∈ M, the measure of the ball \(\mathbb {B}_{\mathbb {R}^{d}}(\mathbf{x},r)\) centered at x with radius r decays as

$$ \mathbb{P}\left( \mathbb{B}_{\mathbb{R}^{d}}(\mathbf{ x},r)\right)\sim r^{d_{M}}, $$

when r is small enough. Motivated by this, Kim et al. (2018) define the volume dimension of a probability distribution \(\mathbb {P}\) as the maximal exponent that can dominate the probability volume decay on balls: for a fixed subset \(\mathbb {X}\subset \mathbb {R}^{d}\), set

$$ d_{\text{vol}}(\mathbb{P}):=\sup\left\{ \nu\geq 0: \limsup_{r\to 0}\sup_{\mathbf{x}\in\mathbb{X}}\frac{\mathbb{P}(\mathbb{B}_{\mathbb{R}^{d}}(\mathbf{x},r))}{r^{\nu}}<\infty\right\}. $$
(1.4)

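As an informal numerical illustration (not part of the formal development), the exponent in Eq. 1.4 can be probed by estimating \(\mathbb{P}(\mathbb{B}_{\mathbb{R}^{d}}(\mathbf{x},r))\) empirically at several radii and fitting a log-log slope. In the sketch below, the data are sampled from the unit circle embedded in \(\mathbb{R}^{2}\), and the fitted exponent is close to the intrinsic dimension 1 rather than the ambient dimension 2; all numerical choices are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000
theta = rng.uniform(0, 2 * np.pi, n)
pts = np.column_stack([np.cos(theta), np.sin(theta)])  # support: unit circle in R^2

x = np.array([1.0, 0.0])                               # query point on the support
radii = np.array([0.4, 0.2, 0.1, 0.05, 0.025])
dist = np.linalg.norm(pts - x, axis=1)

# Empirical P(B(x, r)) at each radius, then a log-log slope estimate.
p_hat = np.array([(dist < r).mean() for r in radii])
slope = np.polyfit(np.log(radii), np.log(p_hat), 1)[0]
print(f"estimated decay exponent: {slope:.2f}")  # close to 1, the intrinsic dimension
```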
The primary purpose of the present work is to extend the work of Kim et al. (2018) to more general estimators, including the kernel density estimator (studied in Kim et al. 2018) as a particular case. This generalization is far from trivial, since it requires controlling some complex classes of functions, a problem that was essentially open in the literature. We aim to fill this gap by combining results of Kim et al. (2018) with techniques developed in Einmahl and Mason (2005) and Dony and Mason (2008). The present paper extends our previous work in Bouzebda and El-hadjali (2020) in several directions; the main one is that we consider here conditional U-statistics, which include the regression setting treated in the last-mentioned paper. More precisely, uniform in bandwidth consistency results for some general kernel-type estimators are established. However, as will be seen later, the problem requires much more than “simply” combining ideas from existing results: delicate mathematical derivations are required to cope with the empirical processes that arise in this extended setting. In addition, we consider the nonparametric Inverse Probability of Censoring Weighted (I.P.C.W.) estimators of the multivariate regression function under random censorship and obtain uniform in bandwidth consistency results that are of independent interest.

An outline of the remainder of our paper is as follows. In the forthcoming section, we introduce the mathematical framework and provide our main results concerning the uniform in bandwidth consistency of conditional U-statistics adaptive to the intrinsic dimension, extending the setting of Dony and Mason (2008). In Section 3, we consider conditional U-statistics in the right-censored data framework. Examples of U-statistic kernels are provided in Section 3.1. In Section 4, we present how to select the bandwidth through cross-validation procedures. An application to the nonparametric discrimination problem is discussed in Section 4.1. Some concluding remarks are given in Section 5. To prevent interrupting the flow of the presentation, all proofs are gathered in Section 6. A few relevant technical results are given in the Appendix for easy reference.

2 Main Results

For a fixed integer m ≤ n, consider a class \({\mathscr{F}}_{q}\) of measurable functions \(g : \mathbb {R}^{qm} \rightarrow \mathbb {R}\) defined on \(\mathbb {R}^{qm}\) such that \(\mathbb {E}g^{2}(\mathbf {Y}_{1},\ldots, \mathbf {Y}_{m}) <\infty \), satisfying conditions (F.i)–(F.iii) given below. First, to avoid measurability problems, we assume that

$$ \mathscr{F}_{q} ~\text{is~a~pointwise~measurable~class,} $$
(F.i)

that is, there exists a countable subclass \({\mathscr{F}}_{0}\) of \({\mathscr{F}}_{q}\) such that we can find, for any function \(g \in {\mathscr{F}}_{q}\), a sequence of functions \(g_{m}\in {\mathscr{F}}_{0}\) for which

$$g_{m}(z) \rightarrow g(z),~~ z \in \mathbb{R}^{qm}.$$

This condition is discussed in van der Vaart and Wellner (1996). We also assume that \({\mathscr{F}}_{q}\) has a measurable envelope function

$$ F(\mathbf{ y}) \geq \sup_{g\in \mathscr{F}_{q}}|g(\mathbf{ y})|~~\text{for}~~\mathbf{ y}\in\mathbb{R}^{qm}. $$
(F.ii)

Notice that condition (F.i) implies that the supremum in Eq. F.ii is measurable. Finally, we assume that \({\mathscr{F}}_{q}\) is of VC-type, with characteristics A and v1 (“VC” for Vapnik and Červonenkis), meaning that for some A ≥ 3 and v1 ≥ 1,

$$ \mathcal{N}(\mathscr{F}_{q},L_{2}(Q),\varepsilon)\leq \left( \frac{A\|F\|_{L_{2}(Q)}}{\varepsilon}\right)^{v_{1}},~~\text{for}~~0<\varepsilon\leq 2\|F\|_{L_{2}(Q)}, $$
(F.iii)

where Q is any probability measure on \((\mathbb {R}^{qm},{\mathscr{B}})\), \({\mathscr{B}}\) being the σ-field of Borel sets of \(\mathbb {R}^{qm}\), such that \(\|F\|_{L_{2}(Q)}< \infty \), and where for ε > 0, \(\mathcal {N}({\mathscr{F}}_{q},L_{2}(Q),\varepsilon )\) is defined as the smallest number of L2(Q)-open balls of radius ε required to cover \({\mathscr{F}}_{q}\). (If Eq. F.iii holds for \({\mathscr{F}}_{q}\), we say that the VC-type class \({\mathscr{F}}_{q}\) admits the characteristics A and v1.) In this section, we follow Kim et al. (2018) in weakening the conditions on the kernel, making the results adaptive to the intrinsic dimension of the underlying distribution without assumptions on the distribution itself. It is worth noticing that for general distributions, such as those whose support is a lower-dimensional manifold, the usual change of variables argument is no longer directly applicable. However, we can provide a bound based on the volume dimension under an integrability condition on the kernel, given below. Let K(⋅) be a kernel function defined on \(\mathbb {R}^{d}\), that is, a measurable function satisfying

$$ {\int}_{\mathbb{R}^{d}}K(\mathbf{t})d\mathbf{t}=1. $$

Assumption 1.

(Integrability condition) Let \(\|K\|_{\infty }=\sup _{\mathbf {x}\in \mathbb {R}^{d}}|K(\mathbf {x})|=:\kappa <\infty ,\) and fix k > 0. Assume that either dvol = 0 or

$$ {\int}_{0}^{\infty}x^{d_{\text{vol}}-1}\sup_{\|\mathbf{t}\|\geq x}|K|^{k}(\mathbf{t}) dx < \infty. $$
(K.ii)

Remark 2.1.

(Kim et al. 2018) It is important to emphasize that Assumption 1 is weak, as it is satisfied by commonly used kernels. For instance, if the kernel function K(x) decays at a polynomial rate strictly faster than dvol/k (which is at most d/k) as \(\|\mathbf{x}\|\to \infty \), that is, if

$$ \limsup_{\|\mathbf{x}\|\to\infty} \left\Vert \mathbf{x}\right\Vert^{d_{\text{vol}}/k+\epsilon}|K(\mathbf{x})|<\infty $$

for some 𝜖 > 0, then the integrability condition Eq. K.ii is satisfied. Also, if the kernel function K(x) is spherically symmetric, that is, if there exists \(\widetilde {K}:[0,\infty )\to \mathbb {R}\) with \(K(\mathbf{x})=\widetilde {K}(\left \Vert \mathbf{x}\right \Vert _{2})\), then the integrability condition Eq. K.ii is satisfied provided \(\left \Vert K\right \Vert _{k} <\infty \). Kernels with bounded support also satisfy condition Eq. K.ii. Thus, most commonly used kernels, including the uniform, Epanechnikov and Gaussian kernels, satisfy the above integrability condition.

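As a quick numerical sanity check of the condition (an illustration under the assumed values dvol = 2 and k = 2, not part of the formal argument), one can evaluate the integral in Eq. K.ii for the Gaussian kernel; since this kernel is spherically symmetric and decreasing, \(\sup_{\|\mathbf{t}\|\geq x}|K|^{k}(\mathbf{t})\) reduces to \(\widetilde{K}^{k}(x)\).

```python
import numpy as np

d_vol, k = 2.0, 2.0                        # assumed illustrative values
x = np.linspace(1e-6, 30.0, 200001)

# For the (unnormalized) Gaussian K(t) = exp(-||t||^2 / 2), which is radially
# decreasing, sup_{||t|| >= x} |K|^k(t) = exp(-k x^2 / 2).
integrand = x ** (d_vol - 1.0) * np.exp(-k * x**2 / 2.0)
value = float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(x)))
print(f"integral in (K.ii): {value:.6f}")  # finite; analytic value is 0.5 here
```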
Now, we consider the class of functions

$$ \mathscr{K }:=\left\{(\mathbf{t},h)\mapsto K\left( \frac{\mathbf{t}-\cdot}{h}\right): \mathbf{t}\in\mathbb{X}, h\geq l_{n}\right\}, $$

with \(\|K\|_{2}<\infty \).

Assumption 2.

Assume that \({\mathscr{K}}\) is a bounded VC-class with envelope κ and dimension ν2, i.e., there exist positive numbers A2 ≥ 1 and ν2 ≥ 1 such that, for every probability measure \(\mathbb {Q}\) on \(\mathbb {R}^{d}\) and every \(\varepsilon \in (0,\|K\|_{\infty })\), the covering number \(\mathcal {N}(\mathcal {K },L_{2}(\mathbb {Q}),\varepsilon )\) satisfies

$$ \mathcal{N}(\mathcal{K },L_{2}(\mathbb{Q}),\varepsilon)\leq \left( \frac{A_{2}\|K\|_{\infty}}{\varepsilon}\right)^{\nu_{2}}. $$
(K.iii)

Furthermore, let

$$ \widetilde{\mathbf{K}}(\mathbf{t}):=\prod\limits_{j=1}^{m}K(\mathbf{t}_{j}), ~~~~\mathbf{t}= (\mathbf{t}_{1},\ldots,\mathbf{t}_{m}) $$
(K.iv)

denote the product kernel. Next, if \((S, \mathcal {S})\) is a measurable space, define the general U-statistic with kernel \(H:S^{k}\rightarrow \mathbb {R}\) based on S-valued random variables Z1,⋯ ,Zn as

$$ U_{n}^{(k)}(H):= \frac{(n-k)!}{n!}\sum\limits_{i\in I(k,n)}H(Z_{i_{1}},\ldots, Z_{i_{k}}),~~~~~~~1\leq k\leq n, $$
(2.1)

where I(k,n) is defined as in Eq. 1.3 with m = k. Note that we do not require H to be symmetric here. For a bandwidth h > 0 and \(g\in {\mathscr{F}}_{q}\), consider the U-kernel

$$G_{g,h,\mathbf{t}}(\mathbf{x}, \mathbf{y}):=g(\mathbf{y})\widetilde{\mathbf{K}}_{h}(\mathbf{t}-\mathbf{x})~~~~~~~~~~~~~~~\mathbf{x}, \mathbf{t} \in \mathbb{R}^{dm} ~~~~\text{and}~~~~ \mathbf{y} \in \mathbb{R}^{qm},$$

where, as usual, \(K_{h}(z) = h^{-d}K(z/h)\) for \(z\in \mathbb {R}^{d}\), so that \(\widetilde{\mathbf{K}}_{h}(\mathbf{t})=\prod_{j=1}^{m}K_{h}(\mathbf{t}_{j})\), and, for the sample (X1,Y1), …,(Xn,Yn), define

$$U_{n}(g,h,\mathbf{t}):= U_{n}^{(m)}(G_{g,h,\mathbf{t}})=\frac{(n-m)!}{n!}\sum\limits_{i\in {I_{n}^{m}}}G_{g,h,\mathbf{t}}(\mathbf{X}_{i}, \mathbf{Y}_{i}),$$

where, throughout this paper, we shall use the notation

$$ \begin{array}{@{}rcl@{}} &&{}\mathbf{X} = (\mathbf{X}_{1},\ldots, \mathbf{X}_{m})\in \mathbb{R}^{dm}~~~~\text{and}~~~~~~\mathbf{X}_{\mathbf{i}}=(\mathbf{X}_{i_{1}},\ldots, \mathbf{X}_{i_{k}})\in \mathbb{R}^{dk}~~~~\mathbf{i} \in \mathbf{I}_{n}^{k},\\ &&{}\mathbf{Y} = (\mathbf{Y}_{1},\ldots, \mathbf{Y}_{m})\in \mathbb{R}^{qm}~~~~\text{and}~~~~~~\mathbf{Y}_{\mathbf{i}}=(\mathbf{Y}_{i_{1}},\ldots, \mathbf{Y}_{i_{k}})\in \mathbb{R}^{qk}~~~~\mathbf{i} \in \mathbf{I}_{n}^{k}. \end{array} $$

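For concreteness, the U-statistic of Eq. 2.1 can be computed by direct enumeration of I(k,n); the brute-force sketch below is only feasible for small n and k and is meant purely as an illustration.

```python
import numpy as np
from itertools import permutations

def u_statistic(H, Z, k):
    """General U-statistic U_n^(k)(H) of Eq. 2.1 by enumeration of I(k, n).

    H need not be symmetric, so all ordered k-tuples of distinct indices
    are visited; the normalizing factor (n-k)!/n! equals 1/|I(k, n)|.
    """
    n = len(Z)
    tuples = list(permutations(range(n), k))
    return sum(H(*(Z[i] for i in idx)) for idx in tuples) / len(tuples)

# Example: H(z1, z2) = (z1 - z2)^2 / 2 yields the unbiased sample variance.
rng = np.random.default_rng(2)
Z = rng.standard_normal(30)
print(u_statistic(lambda a, b: 0.5 * (a - b) ** 2, Z, k=2))
print(np.var(Z, ddof=1))   # identical up to rounding
```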
Now, introduce the U-statistic process

$$ u_{n}(g,h,\mathbf{t}):=\sqrt{n}\{U_{n}(g,h,\mathbf{t})-\mathbb{E}U_{n}(g,h,\mathbf{t})\}. $$
(2.2)

We denote by I and J two fixed subsets of \(\mathbb {R}^{d}\) such that

$$ \mathbf{I}=\prod\limits_{j=1}^{d}\left[a_{j}, b_{j}\right] \subset \mathbf{J}=\prod\limits_{j=1}^{d}\left[c_{j}, d_{j}\right] \subset \mathbb{R}^{d}, $$

where

$$ -\infty<c_{j}<a_{j}<b_{j}<d_{j}<\infty \text { for } j=1, \ldots, d. $$

Introduce the class of functions defined on the compact subset Jm of \(\mathbb {R}^{dm}\),

$$ \mathcal{M}=\left\{r^{(m)}(\varphi,\cdot) \widetilde{f}(\cdot): \varphi \in \mathscr{F}_{q}\right\}, $$

where r(m)(φ,⋅) is defined in Eq. 1.1 and the function \(\widetilde {f}: \mathbb {R}^{dm} \rightarrow \mathbb {R}\) is defined as

$$ \widetilde{f}(\mathbf{t}):={\int}_{\mathbb{R}^{qm}} f\left( \mathbf t_{1}, \mathbf y_{1}\right) {\cdots} f\left( \mathbf t_{m}, \mathbf y_{m}\right) \mathrm{d}\mathbf y_{1} {\ldots} \mathrm{d} \mathbf y_{m}=f_{\mathbf X}\left( \mathbf t_{1}\right) {\ldots} f_{\mathbf X}\left( \mathbf t_{m}\right) ,$$

where \(f\left(\cdot, \cdot\right)\) denotes the joint density of (X,Y). We fix a subset \(\mathbb {X}\subset \mathbb {R}^{d}\) on which we consider the uniform convergence of the kernel regression estimator. Following Kim et al. (2018), we characterize the intrinsic dimension of the distribution \(\mathbb {P}\) by its rate of probability volume growth on balls. If a probability distribution has a positive measure on a manifold with positive reach, then the volume dimension is always between 0 and the manifold's dimension. In particular, the volume dimension of any probability distribution is between 0 and the ambient dimension d.

Lemma 2.2.

(Kim et al. 2018)Let \(\mathbb {P}\) be a probability distribution on \(\mathbb {R}^{d}\), and dvol be its volume dimension. Then for any ν ∈ [0,dvol), there exists a constant \(C_{\nu , \mathbb {P}}\) depending only on \(\mathbb {P}\) and ν such that for all \(\mathbf { x}\in \mathbb {X}\) and r > 0,

$$ \frac{\mathbb{P}(\mathbb{B}_{\mathbb{R}^{d}}(\mathbf{ x},r))}{r^{\nu}}\leq C_{\nu,\mathbb{P}}. $$
(2.3)

For the exact optimal rate, we impose conditions on how the probability volume decays in Eq. 2.3.

Assumption 3.

Let \(\mathbb {P}\) be a probability distribution on \(\mathbb {R}^{d}\), and dvol be its volume dimension. For ν ∈ [0,dvol), we assume that

$$ \limsup\limits_{r\rightarrow 0}\sup_{\mathbf{x}\in\mathbb{X}} \frac{\mathbb{P}(\mathbb{B}_{\mathbb{R}^{d}}(\mathbf{ x},r))}{r^{\nu}} <\infty . $$
(2.4)

Assumption 4.

Let \(\mathbb {P}\) be a probability distribution on \(\mathbb {R}^{d}\), and dvol be its volume dimension. For ν ∈ [0,dvol), we assume that

$$ \sup_{\mathbf{x}\in\mathbb{X}} \liminf\limits_{r\rightarrow 0} \frac{\mathbb{P}(\mathbb{B}_{\mathbb{R}^{d}}(\mathbf{ x},r))}{r^{\nu}} >0. $$
(2.5)

These assumptions are in fact weak and hold for common probability distributions. In particular, Assumptions 3 and 4 hold when the probability distribution has a bounded density with respect to the d-dimensional Lebesgue measure. By combining Assumption 1 and Lemma 2.0.2 of Kim et al. (2018), we can bound the moments \(\mathbb {E}_{\mathbb {P}}\left [|K|^{k}\right ]\) of the kernel in terms of the volume dimension dvol.

Lemma 2.3.

Let \((\mathbb {R}^{d},\mathbb {P})\) be a probability space and let \(X \sim \mathbb {P}\). For any kernel K(⋅) satisfying Assumption 1 with k > 0, the k-th absolute moment of the kernel is upper bounded as

$$ \mathbb{E}_{\mathbb{P}}\left[\left| K\left( \frac{\mathbf{t}-\mathbf{X}}{h}\right)\right|^{k}\right]\leq C_{k,\mathbb{P},K,\varepsilon}h^{d_{\text{vol}}-\varepsilon} $$
(2.6)

for any ε ∈ (0,dvol), where \(C_{k,\mathbb {P},K,\varepsilon }\) is a constant depending only on \(k,\mathbb {P},K\) and ε. Further, if dvol = 0 or under Assumption 1 in Kim et al. (2018), ε can be 0 in Eq. 2.6.

We now give an example, from Kim et al. (2018), of an unbounded density. In this case, the volume dimension is strictly smaller than the dimension of the support, which illustrates why the dimension of the support is not sufficient to characterize the dimensionality of a distribution.

Example 2.4.

(Kim et al. 2018) Let \(\mathbb {P}\) be a distribution on \(\mathbb {R}^{d}\) having a density p with respect to the d-dimensional Lebesgue measure. Fix 0 < β < d, and suppose \(p:\mathbb {R}^{d}\to \mathbb {R}\) is defined as

$$ p(\mathbf{x})=\frac{d-\beta}{\omega_{d-1}}\left\Vert \mathbf{x}\right\Vert^{-\beta}\mathbf{1}\left\{\left\Vert \mathbf{x}\right\Vert\leq 1\right\}, $$

where \(\omega_{d-1}\) denotes the surface area of the unit sphere in \(\mathbb{R}^{d}\). Then, for each fixed r ∈ (0,1],

$$ \begin{array}{@{}rcl@{}} \sup_{\mathbf{x}\in\mathbb{R}^{d}}\mathbb{P}(\mathbb{B}_{\mathbb{R}^{d}}(\mathbf{ x},r)) & =\mathbb{P}(\mathbb{B}_{\mathbb{R}^{d}}(\mathbf{ 0},r))=r^{d-\beta}. \end{array} $$

Hence from definition in Eq. 1.4, the volume dimension is

$$ d_{\text{vol}}(\mathbb{P})=d-\beta. $$

In our setting, we will use

$$ \begin{array}{@{}rcl@{}} &&\widehat{r}_{n}^{(m)}(\varphi,\mathbf{t};h)\\&&= \left\{ \begin{array}{lc} \frac{\displaystyle\sum\limits_{(i_{1},\ldots,i_{m})\in I(m,n)}\varphi({ Y}_{i_{1}},\ldots,{ Y}_{i_{m}})K\left( \frac{\mathbf{ t}_{1}- \mathbf{ X}_{i_{1}}}{h}\right){\cdots} K\left( \frac{\mathbf{ t}_{m}-\mathbf{ X}_{i_{m}}}{h}\right)}{\displaystyle\sum\limits_{(i_{1},\ldots,i_{m})\in I(m,n)}K\left( \frac{ \mathbf{ t}_{1}-\mathbf{ X}_{i_{1}}}{h}\right){\cdots} K\left( \frac{\mathbf{ t}_{m}-\mathbf{ X}_{i_{m}}}{h}\right)},&\\\text{if}~~\displaystyle \sum\limits_{(i_{1},\ldots,i_{m})\in I(m,n)}K\left( \frac{ \mathbf{ t}_{1}-\mathbf{ X}_{i_{1}}}{h}\right){\cdots} K\left( \frac{\mathbf{ t}_{m}-\mathbf{ X}_{i_{m}}}{h}\right)\neq 0,\\ \displaystyle\frac{(n-m)!}{n!}\sum\limits_{(i_{1},\ldots,i_{m})\in I(m,n)}\varphi({ Y}_{i_{1}},\ldots,{ Y}_{i_{m}})\\ \text{if}~~\displaystyle \sum\limits_{(i_{1},\ldots,i_{m})\in I(m,n)}K\left( \frac{ \mathbf{ t}_{1}-\mathbf{ X}_{i_{1}}}{h}\right){\cdots} K\left( \frac{\mathbf{ t}_{m}-\mathbf{ X}_{i_{m}}}{h}\right)= 0. \end{array} \right. \end{array} $$

It is clear that \(\widehat {r}_{n}^{(m)}(\varphi ,\mathbf {t};h)\) can be rewritten, for all \(\varphi \in {\mathscr{F}}\), as

$$\widehat{r}_{n}^{(m)}(\varphi,\mathbf{t};h)=\frac{{\sum}_{\mathbf{i} \in {I_{n}^{m}}} \varphi(\mathbf{Y}_{\mathbf{i}})\widetilde{K}_{h}(\mathbf{t}- \mathbf{X}_{\mathbf{i}})}{{\sum}_{\mathbf{i} \in {I_{n}^{m}}}\widetilde{K}_{h}(\mathbf{t}- \mathbf{X}_{\mathbf{i}})}=\frac{U_{n}(\varphi,\mathbf{t};h)}{U_{n}(1,\mathbf{t};h)},$$

where we denote by Un(1,t;h) the U-statistic Un(φ,t;h) with φ ≡ 1. To prove the uniform consistency of \(\widehat {r}_{n}^{(m)}(\varphi,\mathbf{t};h)\) to r(m)(φ,t), we shall consider another, more appropriate, centering factor than the expectation \(\mathbb {E} \widehat {r}_{n}^{(m)}(\varphi,\mathbf{t};h)\), which may not exist or may be difficult to compute. Define the centering factor

$$ \widehat{\mathbb{E}} \widehat{r}_{n}^{(m)}(\varphi,\mathbf{ t};h):=\frac{\mathbb{E}U_{n}(\varphi,\mathbf{t};h)}{\mathbb{E}U_{n}(1,\mathbf{t};h)}. $$
(2.7)

This centering permits us to derive results on the convergence rates of the process

$$\widehat{r}_{n}^{(m)}(\varphi,\mathbf{ t};h)-\widehat{\mathbb{E}} \widehat{r}_{n}^{(m)}(\varphi,\mathbf{ t};h)$$

to zero and the consistency of \( \widehat {r}_{n}^{(m)}(\varphi ,\mathbf { t};h)\) uniform in t and in bandwidth.

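A direct (brute-force) transcription of \(\widehat{r}_{n}^{(m)}(\varphi,\mathbf{t};h)\), following the ratio form \(U_{n}(\varphi,\mathbf{t};h)/U_{n}(1,\mathbf{t};h)\) above, is sketched below for small n; the Gaussian kernel and the simulated data are assumptions made for the example, and the fallback to the unweighted mean when the denominator vanishes follows the convention displayed earlier.

```python
import numpy as np
from itertools import permutations

def cond_u_stat(t, X, Y, h, phi, kernel=lambda u: np.exp(-0.5 * u**2)):
    """Conditional U-statistic of Eq. 1.2; the order m is inferred from t.

    t : (m, d) evaluation point, X : (n, d) covariates, Y : (n,) responses,
    phi : function of m responses.
    """
    m, n = len(t), len(X)
    num = den = 0.0
    for idx in permutations(range(n), m):          # the index set I(m, n)
        w = np.prod([kernel((t[j] - X[idx[j]]) / h).prod() for j in range(m)])
        num += phi(*(Y[i] for i in idx)) * w
        den += w
    if den == 0.0:                                 # convention from the display above
        return np.mean([phi(*(Y[i] for i in idx))
                        for idx in permutations(range(n), m)])
    return num / den

# phi(y1, y2) = (y1 - y2)^2 / 2 targets the conditional variance at
# t1 = t2 = t (cf. Example 3.7 below): here Var(Y | X = 0) = 0.25.
rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, size=(60, 1))
Y = X[:, 0] + 0.5 * rng.standard_normal(60)
t = np.array([[0.0], [0.0]])
print(cond_u_stat(t, X, Y, h=0.3, phi=lambda a, b: 0.5 * (a - b) ** 2))
```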
Theorem 2.5.

Let \(l_{n} = c(\log n/n)^{1/dm}\) for some c > 0. Assume that the class of functions \({\mathscr{F}}_{q}\) is bounded, in the sense that for some \(0 < M < \infty \),

$$ F(\mathbf{ y})<M,~~\text{for}~~\mathbf{y} \in \mathbb{R}^{qm}. $$
(2.8)

Fix ε ∈ (0,dvol); moreover, if dvol = 0 or under Assumption 3, ε can be taken equal to 0. Suppose that

$$\limsup_{n}\frac{\displaystyle\left( \log\left( \frac{1}{l_{n}}\right)\right)_{+}+\log\left( \frac{2}{\delta}\right)}{ \displaystyle nl_{n}^{d_{\text{vol}}-\varepsilon}}< \infty.$$

Then we can infer, under the above-mentioned assumptions on \({\mathscr{F}}_{q}\) and Assumptions 1, 2 and 4, that for all δ > 0 there exists a constant \(0<\mathfrak {C}_{1}<\infty \) such that, with probability at least 1 − δ,

$$ \sup_{h\geq l_{n}}\sup_{g\in \mathscr{F}_{q}}\sup_{\mathbf{ t}\in\mathbb{R}^{dm}}\frac{\sqrt{nl_{n}^{m(2d-d_{\text{vol}}+\varepsilon)}}| U_{n}(g,\mathbf{t};h)-\widehat{\mathbb{E}} U_{n}(g,\mathbf{ t};h)|}{\sqrt{|\log l_{n}|\vee \log (2/\delta)}}\leq \mathfrak{C}_{1}. $$
(2.9)

Theorem 2.6.

Let \(a_{n} = c((\log n/n)^{1-2/p})^{1/dm}\) for some c > 0. If \({\mathscr{F}}_{q}\) is unbounded but satisfies, for some p > 2,

$$ \mu_{p}:=\sup_{\mathbf{x}\in\mathbb{R}^{dm}}\mathbb{E}(F^{p}(\mathbf{Y})\mid \mathbf{ X}=\mathbf{x})<\infty, $$
(2.10)

then we can infer, under the above-mentioned assumptions on \({\mathscr{F}}_{q}\) and Assumptions 1, 2 and 4, that for all c > 0 and 0 < b0 < 1, there exists a constant \(0<\mathfrak {C}_{2}<\infty \) such that

$$ \limsup_{n\rightarrow \infty}\sup_{a_{n}\leq h\leq b_{0}}\sup_{g\in \mathscr{F}_{q}}\sup_{\mathbf{ t}\in\mathbb{R}^{dm}}\frac{\sqrt{nh^{{m(2d-d_{\text{vol}}+\varepsilon)}}}| U_{n}(g,\mathbf{t};h)-\widehat{\mathbb{E}} U_{n}(g,\mathbf{ t};h)|}{\sqrt{|\log h|\vee \log\log n}}\leq \mathfrak{C}_{2}, $$

for any ε ∈ (0,dvol).

We mention that Kim et al. (2018) do not need continuity of the density for their results (of course, continuity of the density is crucial for controlling the bias). Some related results on uniform convergence over compact subsets have been obtained by Bouzebda and El-hadjali (2020) for a much larger class of estimators, including kernel estimators of regression functions among others. In this general setting, however, it is often not possible to obtain convergence uniformly over \(\mathbb {R}^{d}\); density estimators are in that sense somewhat exceptional.

Theorem 2.7.

Besides being bounded, suppose that the marginal density function fX of X is continuous and strictly positive on I. Assume that the class of functions \({\mathscr{M}}\) is uniformly equicontinuous. It then follows that for all sequences 0 < bn < 1 with \(b_{n} \rightarrow 0\),

$$ \sup_{0<h \leq b_{n}} \sup_{\varphi \in \mathscr{F}_{q}} \sup_{\mathbf{t} \in \mathbf{I}^{m}}\left|\widehat{\mathbb{E}}\widehat{r}^{(m)}_{n}(\varphi,\mathbf{t};h)-r^{(m)}(\varphi,\mathbf{ t})\right|=o(1) ,$$

where Im = I ×… ×I.

Corollary 2.8.

Besides being bounded, suppose that the marginal density function fX of X is continuous and strictly positive on I. It then follows, under the above-mentioned assumptions on \({\mathscr{F}}_{q}\) and Assumptions 1, 2 and 4, that for all c > 0 and all sequences 0 < bn < 1 with \(a_{n}^{\prime }\leq b_{n} \rightarrow 0\), there exists a constant \(0<\mathfrak {C}_{3}<\infty \) such that

$$ \limsup_{n\rightarrow \infty}\sup_{a^{\prime}_{n}\leq h\leq b_{n}}\sup_{\varphi\in \mathscr{F}_{q}}\sup_{\mathbf{t}\in\mathbf{I}^{m}}\frac{\sqrt{nl_{n}^{m(2d-d_{\text{vol}}+\varepsilon)}}| \widehat{r}_{n}^{(m)}(\varphi,\mathbf{t};h)-\widehat{\mathbb{E}} \widehat{r}_{n}^{(m)}(\varphi,\mathbf{t};h)|}{\sqrt{|\log h|\vee \log \log n}}\leq \mathfrak{C}_{3}, $$

where ε ∈ (0,dvol) is arbitrary and \(a_{n}^{\prime }\) is either \(a_{n}\) or \(l_{n}\), according as the class \({\mathscr{F}}_{q}\) is unbounded or bounded.

We can now state the main result of this section which follows easily from Theorems 2.5 and 2.6.

Corollary 2.9.

Under the conditions of Theorems 2.5, 2.6 and 2.7 on fX and the class of functions \({\mathscr{F}}_{q}\), and under Assumptions 1, 2 and 4, it follows that for all sequences \(0<a_{n}^{\prime }\leq \widetilde a_{n}\leq b_{n}<1\) satisfying \(b_{n}\rightarrow 0\) and \(n\widetilde a_{n} /\log n\rightarrow \infty \),

$$ \sup_{\widetilde a_{n} \leq h \leq b_{n}}\sup_{\varphi\in \mathscr{F}_{q}}\sup_{\mathbf{t} \in \mathbf I^{m}}|\widehat{r}_{n}^{(m)}(\varphi,\mathbf{t};h)-r^{(m)}(\varphi,\mathbf{ t})|\rightarrow 0, a.s. $$

Remark 2.10.

Under additional weak regularity conditions on \(\mathbb {P}\), the value of ε in Eq. 2.9 can be taken equal to 0. Under the assumption that the distribution has a bounded Lebesgue density, dvol = d, so our result recovers existing results in the literature in terms of rates of convergence, in particular those presented in Dony and Mason (2008). Our results complement Dony and Mason (2008) by relaxing the conditions on the kernel functions, as was done by Kim et al. (2018). At this point, we mention that our results are stated in the multivariate setting and, more importantly, are adaptive to the volume dimension. This alleviates the curse of dimensionality. To be more precise, it is well known that the estimation of a regression function is especially hard when the dimension of the explanatory variable X is large. One consequence is that the optimal minimax rate of convergence n− 2k/(2k+d) for the estimation of a k-times differentiable regression function converges to zero rather slowly if the dimension d of X is large compared to k. To circumvent the curse of dimensionality, the only way is to impose additional assumptions on the regression function. The simplest way is to consider linear models, but this rather restrictive parametric assumption can be extended in several ways. One idea is to consider additive models, which simplify the regression estimation problem by fitting only functions of the data that have the same additive structure. In projection pursuit, one generalizes this further by assuming that the regression function is a sum of univariate functions applied to projections of x in various directions; this includes single index models as particular cases. The interested reader may refer to Györfi et al. (2002, Chapter 22) for more rigorous developments of such techniques. Other avenues to be investigated are semiparametric models, intermediary between linear and nonparametric ones, which aim to combine the flexibility of nonparametric approaches with the interpretability of parametric ones; for details on these methods for functional data, one can refer to Ling and Vieu (2018, Section 4.2) and the references therein.

Remark 2.11.

We note that the main issue in using an estimator such as that of Eq. 1.2 is the proper choice of the smoothing parameter h. The uniform in bandwidth consistency result given in Corollary 2.9 shows that any choice of h between an and bn ensures the consistency of \(\widehat {r}_{n}^{(m)}(\varphi,\mathbf{t};h)\). Namely, the fluctuation of the bandwidth within a small interval does not affect the consistency of the nonparametric estimator \(\widehat {r}_{n}^{(m)}(\varphi,\mathbf{t};h)\) of r(m)(φ,t).

Remark 2.12.

For notational convenience, we have chosen the same bandwidth sequence for each margin. This assumption can easily be dropped if one wants to make use of vector bandwidths (see, in particular, Chapter 12 of Devroye and Lugosi (2001)). With obvious changes of notation, our results and their proofs remain true when h is replaced by a vector bandwidth \(\mathbf{h}_{n} = (h^{(1)}_{n}, \ldots, h^{(d)}_{n})\), where \({\min \limits } h^{(i)}_{n} > 0\). In this situation we set \(h={\prod }_{i=1}^{d} h^{(i)}\), and for any vector v = (v1,…,vd) we replace v/h by (v1/h(1),…,vd/h(d)). For ease of presentation, we chose to use real-valued bandwidths throughout.

Remark 2.13.

In the sequel, we will need to symmetrize the functions Gg,h,t(x,y). To this end, we set

$$\bar{G}_{g,h,\mathbf{t}}(\mathbf{x}, \mathbf{y}):= \frac{1}{m!}\sum\limits_{\sigma \in {\mathbf{I}_{m}^{m}}}G_{g,h,\mathbf{t}}(\mathbf{x}_{\sigma}, \mathbf{y}_{\sigma})= \frac{1}{m!}\sum\limits_{\sigma \in \mathbf{I}_{m}^{m}}g(\mathbf{y}_{\sigma})\widetilde{\mathbf{K}}_{h}(\mathbf{t}-\mathbf{x}_{\sigma}),$$

where \(\mathbf {x}_{\sigma }:=(x_{\sigma _{1}}, \ldots, x_{\sigma _{m}})\) and \(\mathbf {y}_{\sigma }:=(y_{\sigma _{1}}, \ldots, y_{\sigma _{m}})\). Obviously, after symmetrization we have

$$\mathbb{E}(G_{g,h,\mathbf{t}}(\mathbf{x}, \mathbf{y}))= \mathbb{E}(\bar{G}_{g,h,\mathbf{t}}(\mathbf{x}, \mathbf{y}))~~~~~\text{and}~~~~~~U_{n}^{(m)}(\bar{G}_{g,h,\mathbf{t}}(\cdot, \cdot))=U_{n}(g,\mathbf{t};h).$$

So the U-statistic process in Eq. 2.2 may be redefined using the symmetrized kernels; hence we consider

$$ u_{n}^{(m)}(g,h,\mathbf{t}):=\sqrt{n}\{U_{n}^{(m)}(\bar{G}_{g,h,\mathbf{t}})-\mathbb{E}U_{n}^{(m)}(\bar{G}_{g,h,\mathbf{t}} )\}. $$
(2.11)

For more details, consult for instance the book of de la Peña and Giné (1999).

3 Extension to the Censored Case

Consider a triple (Y,C,X) of random variables defined on \(\mathbb {R} \times \mathbb {R} \times \mathbb {R}^{d}\), where Y is the variable of interest, C a censoring variable and X a concomitant variable. Throughout, we use the notation of Maillot and Viallon (2009) and work with a sample {(Yi,Ci,Xi)1≤i≤n} of independent and identically distributed replications of (Y,C,X), n ≥ 1. In the right censorship model, the pairs (Yi,Ci), 1 ≤ i ≤ n, are not directly observed, and the corresponding information is given by \(Z_{i} := \min \limits \{Y_{i},C_{i}\}\) and \(\delta_{i} := \mathbf{1}\{Y_{i}\leq C_{i}\}\), 1 ≤ i ≤ n. Accordingly, the observed sample is

$$\mathcal{D}_{n} = \{(Z_{i}, \delta_{i}, \mathbf{ X}_{i}), i = 1,\ldots,n\}.$$

Survival data in clinical trials or failure time data in reliability studies, for example, are often subject to such censoring. To be more specific, many statistical experiments result in incomplete samples, even under well-controlled conditions; for example, clinical data on survival for most types of disease are usually censored by other competing risks to life which result in death. In the sequel, we impose the following assumptions on the distribution of (X,Y). Denote by \(\mathcal{I}\) a given compact set in \(\mathbb {R}^{d}\) with nonempty interior and set, for any α > 0,

$$\mathcal I_{\alpha}=\left\{\mathbf{ x}: \inf_{\mathbf{ u}\in \mathcal{I}}\|\mathbf{ x}-\mathbf{ u}\| \leq \alpha\right\}.$$

We will assume that, for a given α > 0, (X,Y ) [resp. X] has a density function fX,Y [resp. fX] with respect to the Lebesgue measure on \(\mathcal I_{\alpha } \times \mathbb {R}\) [resp. \(\mathcal I_{\alpha }\)]. For \(-\infty < t < \infty \), set

$$F_{Y}(t) = \mathbb{P}(Y \leq t), ~~G(t) = \mathbb{P}(C \leq t), ~~\text{and}~~H(t) = \mathbb{P}(Z \leq t),$$

the right-continuous distribution functions of Y, C and Z respectively. For any right-continuous distribution function L defined on \(\mathbb {R}\), denote by

$$T_{L} = \sup\{t \in \mathbb{R} : L(t) < 1\}$$

the upper point of the corresponding distribution. Now consider a pointwise measurable class \({\mathscr{F}}\) of real measurable functions defined on \(\mathbb {R}\), and assume that \({\mathscr{F}}\) is of VC-type. We recall the regression function of ψ(Y ) evaluated at X = x, for \(\psi \in {\mathscr{F}}\) and \(\mathbf { x} \in \mathcal I_{\alpha }\), given by

$$ r^{(1)}(\psi,\mathbf{ x})=\mathbb{E}(\psi(Y)\mid \mathbf{ X}=\mathbf{ x}), $$

when Y is right-censored. To estimate r(1)(ψ,⋅), we make use of the Inverse Probability of Censoring Weighted (I.P.C.W.) estimators, which have recently gained popularity in the censored data literature (see Kohler et al. (2002), Carbonez et al. (1995) and Brunel and Comte (2006)). The key idea of I.P.C.W. estimation is as follows. Introduce the real-valued function Φψ(⋅,⋅) defined on \(\mathbb {R}^{2}\) by

$$ {\Phi}_{\psi}(y,c):=\frac{\mathbf{1}\{y\leq c\}\,\psi(y\wedge c)}{1-G(y\wedge c)},~~~~(y,c)\in\mathbb{R}^{2}. $$
(3.1)

Assuming the function G(⋅) to be known, first note that Φψ(Yi,Ci) = δiψ(Zi)/(1 − G(Zi)) is observed for every 1 ≤ i ≤ n. Moreover, under Assumption (\({\mathscr{I}}\)) below,

(\({\mathscr{I}}\)):

C and (Y,X) are independent.

We have

$$ r^{(1)}({\Phi}_{\psi},\mathbf{x}):=\mathbb{E}\left({\Phi}_{\psi}(Y,C)\mid \mathbf{X}=\mathbf{x}\right)=\mathbb{E}\left(\psi(Y)\mid \mathbf{X}=\mathbf{x}\right)=r^{(1)}(\psi,\mathbf{x}). $$
(3.2)

Therefore, any estimate of \(r^{(1)}({\Phi }_{\psi },\cdot )\), which can be built on fully observed data, turns out to be an estimate of r(1)(ψ,⋅) too. Thanks to this property, most statistical procedures known to provide estimates of the regression function in the uncensored case can be naturally extended to the censored case. For instance, kernel-type estimates are particularly easy to construct. Set, for \(\mathbf{x}\in \mathcal{I}\), h ≥ ln and 1 ≤ i ≤ n,

$$ \begin{array}{@{}rcl@{}} \overline{\omega}_{n,K,h,i}^{(1)}(\mathbf{x}):=K\left( \frac{\mathbf{ x}-\mathbf{ X}_{i}}{h}\right)\Big/\sum\limits_{j=1}^{n}K\left( \frac{\mathbf{ x}-\mathbf{ X}_{j}}{h}\right). \end{array} $$
(3.3)

We assume that h satisfies (H.1). In view of Eqs. 3.1, 3.2 and 3.3, whenever G(⋅) is known, a kernel estimator of r(1)(ψ,⋅) is given by

$$ \begin{array}{@{}rcl@{}} \breve{r}_{n}^{(1)}(\psi,\mathbf{x};h)=\sum\limits_{i=1}^{n}\overline{\omega}_{n,K,h,i}^{(1)}(\mathbf{x})\frac{\delta_{i}\psi(Z_{i})}{1-G(Z_{i})}. \end{array} $$
(3.4)

The function G(⋅) is generally unknown and has to be estimated. We will denote by \(G^{*}_{n}(\cdot )\) the Kaplan-Meier estimator of the function G(⋅) (Kaplan and Meier, 1958). Namely, adopting the conventions

$$\prod\limits_{\emptyset} = 1$$

and \(0^{0} = 1\), and setting

$$ N_{n}(u)=\sum\limits_{i=1}^{n}\mathbf{1}\left\{Z_{i}\geq u\right\},~~u\in\mathbb{R}, $$

we have

$$ G_{n}^{*}(u)=1-\prod\limits_{i:Z_{i}\leq u}\left\{\frac{N_{n}(Z_{i})-1}{N_{n}(Z_{i})}\right\}^{(1-\delta_{i})},~~\text{for}~~ u \in \mathbb{R}. $$

Given this notation, we will investigate the following estimator of r(1)(ψ,⋅)

$$ \begin{array}{@{}rcl@{}} \breve{r}_{n}^{(1)*}(\psi,\mathbf{x};h)=\sum\limits_{i=1}^{n}\overline{\omega}_{n,K,h,i}^{(1)}(\mathbf{x})\frac{\delta_{i}\psi(Z_{i})}{1-G_{n}^{*}(Z_{i})}, \end{array} $$
(3.5)

refer to Kohler et al. (2002) and Maillot and Viallon (2009). Adopting the convention 0/0 = 0, this quantity is well defined, since \(G_{n}^{*}(Z_{i})=1\) if and only if Zi = Z(n) and δ(n) = 0, where Z(k) is the k-th order statistic associated with the sample (Z1,…,Zn) for k = 1,…,n and δ(k) is the δj corresponding to Z(k) = Zj. When the variable of interest is right-censored, functionals of the (conditional) law can generally not be estimated on the complete support (see Brunel and Comte 2006).
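For illustration purposes, the following sketch computes \(G_{n}^{*}\) by the product formula above and plugs it into the I.P.C.W. estimator of Eq. 3.5; the Gaussian kernel and the simulated exponential censoring are assumptions made for the example only.

```python
import numpy as np

def kaplan_meier_censoring(Z, delta):
    """Kaplan-Meier estimate G_n^*(Z_i) of the censoring distribution G:
    product over {j : Z_j <= Z_i} with exponent (1 - delta_j), using 0^0 = 1."""
    n = len(Z)
    order = np.argsort(Z)
    ds = delta[order]
    N = n - np.arange(n)                     # N_n(Z_(i)) = #{j : Z_j >= Z_(i)}
    factors = ((N - 1.0) / N) ** (1.0 - ds)  # only censored observations contribute
    surv = np.cumprod(factors)               # 1 - G_n^*(Z_(i))
    G = np.empty(n)
    G[order] = 1.0 - surv
    return G

def ipcw_regression(x, X, Z, delta, h, psi=lambda z: z,
                    kernel=lambda u: np.exp(-0.5 * u**2)):
    """I.P.C.W. kernel estimate (3.5) of r^(1)(psi, x) under right censoring."""
    G = kaplan_meier_censoring(Z, delta)
    one_minus_G = np.maximum(1.0 - G, 1e-12)  # floor realizes the 0/0 = 0 convention
    W = np.prod(kernel((x[None, :] - X) / h), axis=1)
    resp = delta * psi(Z) / one_minus_G
    return np.sum(W * resp) / np.sum(W)

# Simulated right-censored data: Y | X = x ~ N(x, 0.25), C ~ Exp(mean 3).
rng = np.random.default_rng(4)
X = rng.uniform(0, 1, size=(400, 1))
Y = X[:, 0] + 0.5 * rng.standard_normal(400)
C = rng.exponential(3.0, 400)
Z, delta = np.minimum(Y, C), (Y <= C).astype(float)
print(ipcw_regression(np.array([0.5]), X, Z, delta, h=0.1))  # approximately 0.5
```

To obtain our results, we will work under the following assumptions.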

(A.1):

\({\mathscr{F}}=\left\{\psi(\cdot)\,\mathbf{1}\{\cdot\leq \tau\}:\psi\in{\mathscr{F}}_{1}\right\}\), where τ < TH and \({\mathscr{F}}_{1}\) is a pointwise measurable class of real measurable functions defined on \(\mathbb {R}\) and of VC type.

(A.2):

The class of functions \({\mathscr{F}}\) has a measurable and uniformly bounded envelope function Υ satisfying

$$ {\Upsilon}(y_{1},\ldots,y_{m})\geq \sup_{\psi \in \mathscr{F}}\mid\psi (y_{1},\ldots,y_{m})\mid , y_{i}\leq T_{H}.$$
(A.3):

The class of functions \({\mathscr{M}}\) is relatively compact with respect to the sup-norm topology on \(\mathcal{I}_{\alpha}^{m}\).

In what follows, we will study the uniform convergence of \(\breve{r}_{n}^{(1)*}(\psi,\mathbf{x};h)\) centered by the following centering factor

$$\widehat{\mathbb{E}} \breve{r}_{n}^{(1)*}(\psi,\mathbf{x};h)=\frac{\displaystyle\mathbb{E}\left( \psi(Y)K\left( \frac{\mathbf{x}-\mathbf{X}}{h}\right)\right)}{\displaystyle\mathbb{E}\left( K\left( \frac{\mathbf{x}-\mathbf{X}}{h}\right)\right)}.$$

This choice is justified by the fact that, under hypothesis (\({\mathscr{I}}\)), we have

$$ \mathbb{E}\left({\Phi}_{\psi}(Y,C)K\left(\frac{\mathbf{x}-\mathbf{X}}{h}\right)\right)=\mathbb{E}\left(\psi(Y)K\left(\frac{\mathbf{x}-\mathbf{X}}{h}\right)\right). $$
(3.6)

Let us assume the following conditions.

(H.1):

\(h \downarrow 0\), 0 < h < 1, and \(n h^{d} \uparrow \infty \);

(H.2):

\( n h^{d} / \log n \rightarrow \infty \) as \(n \rightarrow \infty ;\)

(H.3):

\( \log \left (1 / h\right ) / \log \log n \rightarrow \infty \) as \(n \rightarrow \infty ;\)

We now have all the ingredients to state the result corresponding to the censored case. Let \(\left \{h_{n}^{\prime }\right \}_{n \geq 1}\) and \(\left \{h_{n}^{\prime \prime }\right \}_{n \geq 1}\) be two sequences of positive constants fulfilling Assumptions (H.1-H.3) with

$$ 0<h_{n}^{\prime} \leq h_{n}^{\prime \prime}<1. $$

Bouzebda and El-hadjali (2020) showed, under assumptions (A.1)–(A.3) with m = 1, (\({\mathscr{I}}\)) and Assumption 3, that for any kernel K(⋅) satisfying Assumptions 1 and 2, with probability at least 1 − δ,

$$ \sup\limits_{h_{n}^{\prime}\leq h \leq h_{n}^{\prime\prime}}\sup\limits_{\mathbf{ x}\in\mathcal{I}}\left| \breve{r}_{n}^{(1)*}(\psi,\mathbf{x};h)-\widehat{\mathbb{E}}(\breve{r}_{n}^{(1)*}(\psi,\mathbf{x};h))\right| \leq \mathfrak{C}_{4}\sqrt{\frac{\log\left( 1 / l_{n}\right)_{+}+\log\left( 2 / \delta\right)}{nl_{n}^{2d-d_{\text{vol}}+\varepsilon}}}, $$
(3.7)

for some positive constant \(\mathfrak {C}_{4}\). A right-censored version of an unconditional U-statistic with a kernel of degree m ≥ 1 was introduced via the principle of a mean preserving reweighting scheme in Datta et al. (2010). Stute and Wang (1993) proved almost sure convergence of multi-sample U-statistics under random censorship and provided an application by considering the consistency of a new class of tests designed for testing equality in distribution. To overcome potential biases arising from right-censoring of the outcomes and the presence of confounding covariates, Chen and Datta (2019) proposed adjustments to the classical U-statistics. Yuan et al. (2017) proposed a different estimation procedure for the U-statistic, using a substitution estimator of the conditional kernel given the observed data. To the best of our knowledge, the problem of the estimation of conditional U-statistics under censoring remained open up to the present, and this constitutes the main motivation for the study of this section. A natural extension of the function defined in Eq. 3.1 is given by

$$ {\Phi}_{\psi}(y_{1},\ldots,y_{m},c_{1},\ldots,c_{m}):=\frac{\displaystyle\prod\limits_{j=1}^{m}\mathbf{1}\{y_{j}\leq c_{j}\}\,\psi(y_{1}\wedge c_{1},\ldots,y_{m}\wedge c_{m})}{\displaystyle\prod\limits_{j=1}^{m}\left(1-G(y_{j}\wedge c_{j})\right)}. $$

From this, we have an analogous relation to Eq. 3.2, given by

$$ r^{(m)}({\Phi}_{\psi},\mathbf{t}):=\mathbb{E}\left({\Phi}_{\psi}(Y_{1},\ldots,Y_{m},C_{1},\ldots,C_{m})\mid (\mathbf{X}_{1},\ldots,\mathbf{X}_{m})=\mathbf{t}\right)=r^{(m)}(\psi,\mathbf{t}). $$

An analogue of the estimator in Eq. 1.2 in the censored case is given by

$$ \begin{array}{@{}rcl@{}} \breve{r}_{n}^{(m)}(\psi,\mathbf{t};h)=\sum\limits_{(i_{1},\ldots,i_{m})\in I(m,n)}\frac{\displaystyle \delta_{i_{1}}\cdots\delta_{i_{m}}\psi({Z}_{i_{1}},\ldots,{Z}_{i_{m}})}{\displaystyle (1-G(Z_{i_{1}}))\cdots(1-G(Z_{i_{m}}))}\overline{\omega}_{n,K,h,\mathbf{i}}^{(m)}(\mathbf{t}), \end{array} $$
(3.8)

where, for i = (i1,…,im) ∈ I(m,n),

$$ \begin{array}{@{}rcl@{}} \overline{\omega}_{n,K,h,\mathbf{ i}}^{(m)}(\mathbf{t}):=\frac{\displaystyle K\left( \frac{\mathbf{ t}_{1}- \mathbf{ X}_{i_{1}}}{h}\right){\cdots} K\left( \frac{\mathbf{ t}_{m}-\mathbf{ X}_{i_{m}}}{h}\right)}{\displaystyle\sum\limits_{(i_{1},\ldots,i_{m})\in I(m,n)}K\left( \frac{ \mathbf{ t}_{1}-\mathbf{ X}_{i_{1}}}{h}\right){\cdots} K\left( \frac{\mathbf{ t}_{m}-\mathbf{ X}_{i_{m}}}{h}\right)}. \end{array} $$
(3.9)

The estimator that we will investigate is given by

$$ \breve{r}_{n}^{(m)*}(\psi,\mathbf{t};h)=\sum\limits_{(i_{1},\ldots,i_{m})\in I(m,n)}\frac{\displaystyle \delta_{i_{1}}\cdots\delta_{i_{m}}\psi({Z}_{i_{1}},\ldots,{Z}_{i_{m}})}{\displaystyle (1-G_{n}^{*}(Z_{i_{1}}))\cdots(1-G_{n}^{*}(Z_{i_{m}}))}\overline{\omega}_{n,K,h,\mathbf{i}}^{(m)}(\mathbf{t}). $$
(3.10)

Theorem 3.1.

Let \(l_{n} = c(\log n/n)^{1/dm}\) for some c > 0, and assume that the class of functions \({\mathscr{F}}\) is bounded in the sense of (A.2). Fix ε ∈ (0,dvol); if dvol = 0 or under Assumption 3, ε can be taken equal to 0. Suppose that

$$\limsup_{n}\frac{\displaystyle\left( \log\left( \frac{1}{l_{n}}\right)\right)_{+}+\log\left( \frac{2}{\delta}\right)}{ \displaystyle nl_{n}^{d_{\text{vol}}-\varepsilon}}< \infty.$$

Then we can infer, under the above-mentioned assumptions on \({\mathscr{K}}\), (A.1)–(A.2) and (\({\mathscr{I}}\)), that for all δ > 0 there exists a constant \(0<\mathfrak {C}_{5}<\infty \) such that, with probability at least 1 − δ,

$$ \sup_{h\geq l_{n}}\sup_{\varphi\in \mathscr{F}}\sup_{\mathbf{ t}\in\mathcal{I}^{m}}\frac{\sqrt{nl_{n}^{(2d-d_{\text{vol}}+\varepsilon)m}}| \breve{r}_{n}^{(m)*}(\psi,\mathbf{t};h)-\widehat{\mathbb{E}}\breve{r}_{n}^{(m)*}(\psi,\mathbf{t};h)|}{\sqrt{|\log l_{n}|\vee \log (2/\delta)}}\leq \mathfrak{C}_{5}. $$

Proposition 3.2.

Assume (A.1)–(A.3) and (\({\mathscr{I}}\)), and let the kernel K(⋅) satisfy Assumptions 1, 2 and 3. Let \(\left \{h_{n}^{\prime }\right \}_{n \geq 1}\) and \(\left \{h_{n}^{\prime \prime }\right \}_{n \geq 1}\) be two sequences of positive constants fulfilling Assumptions (H.1)–(H.3) with

$$ 0<h_{n}^{\prime} \leq h_{n}^{\prime \prime}<1. $$

Then, with probability at least 1 − δ, there exists a constant \(0<\mathfrak {C}_{6}<\infty \) such that

$$ \begin{array}{@{}rcl@{}} &&\sup\limits_{h_{n}^{\prime}\leq h \leq h_{n}^{\prime\prime}}\sup\limits_{\mathbf{t}\in\mathcal{I}^{m}}\sup\limits_{\psi\in\mathscr{F}}\left|\breve{r}_{n}^{(m)*}(\psi,\mathbf{t};h)-\widehat{\mathbb{E}}\breve{r}_{n}^{(m)*}(\psi,\mathbf{t};h)\right| \\&&\leq \mathfrak{C}_{6}\sqrt{\frac{\log\left(1/l_{n}\right)_{+}+\log\left(2/\delta\right)}{nl_{n}^{(2d-d_{\text{vol}}+\varepsilon)m}}}. \end{array} $$
(3.11)

3.1 Examples of U-statistics

Example 3.3.

Let \(\widehat {{Y}_{1}{Y}_{2}}\) denote the oriented angle between Y1,Y2 ∈ T, where T is the circle of radius 1 centered at 0 in \(\mathbb {R}^{2}\), and let φ(Y1,Y2) be an indicator-type kernel based on this angle. Silverman (1978) used such a kernel to propose a U-process for testing uniformity on the circle.

Example 3.4.

Hoeffding (1948) introduced the parameter

$$ \triangle={\int}_{-\infty}^{\infty} {\int}_{-\infty}^{\infty} D^{2}(y_{1}, y_{2}) d F(y_{1}, y_{2}), $$

where \(D(y_{1}, y_{2})=F(y_{1}, y_{2})-F(y_{1}, \infty ) F(\infty , y_{2})\) and F(⋅,⋅) is the joint distribution function of (Y1,Y2). The parameter △ has the property that △ = 0 if and only if Y1 and Y2 are independent. Following Lee (1990), an alternative expression for △ can be developed by introducing the functions

$$ \psi\left( y_{1}, y_{2}, y_{3}\right)=\left\{\begin{array}{ll} 1 & \text { if } y_{2} \leq y_{1}<y_{3} \\ 0 & \text { if } y_{1}<y_{2}, y_{3} \text { or } y_{1} \geq y_{2}, y_{3} \\ -1 & \text { if } y_{3} \leq y_{1}<y_{2} \end{array}\right. $$

and

$$ \begin{array}{@{}rcl@{}} \varphi\left( y_{1,1}, y_{1,2}, {\ldots}, y_{5,1}, y_{5,2}\right)&=&\frac{1}{4} \psi\left( y_{1,1}, y_{2,1}, y_{3,1}\right) \psi\left( y_{1,1}, y_{4,1}, y_{5,1}\right)\\&& \psi\left( y_{1,2}, y_{2,2}, y_{3,2}\right) \psi\left( y_{1,2}, y_{4,2}, y_{5,2}\right). \end{array} $$

We have

$$ \triangle=\int {\ldots} \int \varphi\left( y_{1,1}, y_{1,2}, {\ldots}, y_{5,1}, y_{5,2}\right)d F\left( y_{1,1}, y_{1,2}\right) {\ldots} d F\left( y_{5,1}, y_{5,2}\right). $$

In the conditional setting, we have

$$ \begin{array}{@{}rcl@{}} \lefteqn{ r^{(5)}\left( \varphi,t_{1},t_{2},t_{3},t_{4},{t}_{5}\right)}\\&=&\mathbb{E}\left( \varphi((Y_{1,1},Y_{1,2}),\ldots,(Y_{5,1},Y_{5,2}))\mid X_{1}=t_{1},\ldots,X_{5}=t_{5}\right). \end{array} $$

The corresponding conditional U-statistics may be used to test conditional independence.
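To make this example concrete, the following brute-force sketch (illustrative only, on a small sample with independent coordinates, for which △ = 0) evaluates the degree-5 U-statistic with the kernel φ above.

```python
import numpy as np
from itertools import permutations

def psi(y1, y2, y3):
    """The kernel psi above: +1 if y2 <= y1 < y3, -1 if y3 <= y1 < y2, else 0."""
    if y2 <= y1 < y3:
        return 1.0
    if y3 <= y1 < y2:
        return -1.0
    return 0.0

def phi(z1, z2, z3, z4, z5):
    """Degree-5 kernel whose expectation is Hoeffding's Delta; each z is a pair."""
    return 0.25 * (psi(z1[0], z2[0], z3[0]) * psi(z1[0], z4[0], z5[0])
                   * psi(z1[1], z2[1], z3[1]) * psi(z1[1], z4[1], z5[1]))

# Monte Carlo check: with independent coordinates, Delta = 0, so the
# U-statistic should be close to 0.
rng = np.random.default_rng(5)
Z = list(zip(rng.standard_normal(10), rng.standard_normal(10)))
vals = [phi(*(Z[i] for i in idx)) for idx in permutations(range(10), 5)]
print(np.mean(vals))
```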

Example 3.5.

For m = 3, let \(\varphi(y_{1},y_{2},y_{3})=\mathbf{1}\left\{y_{1}>y_{2}+y_{3}\right\}\); the corresponding U-statistic is the Hollander-Proschan test statistic (Hollander and Proschan, 1972).

Example 3.6.

For

$$\varphi({Y}_{1},{Y}_{2})=\frac{1}{2}({Y}_{1}-{Y}_{2})^{2},$$

we obtain the variance of Y.

Example 3.7.

For

$$\varphi({Y}_{1},{Y}_{2})=\frac{1}{2}({Y}_{1}-{Y}_{2})^{2},$$

we obtain, for t1 = t2 = t:

$$ \begin{array}{@{}rcl@{}} r^{(2)}(\varphi,({t},{t}))={\text{Var}}({Y}_{1}\mid{X}_{1}={t}). \end{array} $$

Example 3.8.

Let Y = (Y1,Y2) be such that Y2 is a smooth curve, Y2 ∈ L2([0,1]), and \(Y_{1} \in \mathbb {R}\) has a continuous distribution. An indicator-type kernel φ of Kendall's form, based on the concordance of pairs of such observations, can be used to treat the problem of testing for conditional association between a functional variable belonging to a Hilbert space and a scalar variable. More precisely, this gives conditional Kendall's tau type statistics.

4 The Bandwidth Selection Criterion

Many methods have been established and developed to construct, in asymptotically optimal ways, bandwidth selection rules for nonparametric kernel estimators, especially for the Nadaraya-Watson regression estimator; we quote among them Hall (1984) and Härdle and Marron (1985). This parameter must be selected suitably, either in the standard finite-dimensional case or in the infinite-dimensional framework, to ensure good practical performance. Following Dony and Mason (2008), the leave-one-out cross-validation procedure leads one to define, for any fixed i = (i1,…,im) ∈ I(m,n):

$$ \begin{array}{@{}rcl@{}} &&{}\widehat{r}_{n,\mathbf{i}}^{(m)}(\varphi,\mathbf{t};h)\\&&{}=\frac{\displaystyle\sum\limits_{(j_{1},\ldots,j_{m})\in I(m,n)(\mathbf{i})}\varphi({Y}_{j_{1}},\ldots,{Y}_{j_{m}})K\left( \frac{\mathbf{t}_{1} - \mathbf{X}_{j_{1}}}{h}\right){\cdots} K\left( \frac{\mathbf{t}_{m} - \mathbf{X}_{j_{m}}}{h}\right)}{\displaystyle\sum\limits_{(j_{1},\ldots,j_{m})\in I(m,n)(\mathbf{i})}K\left( \frac{\mathbf{t}_{1}-\mathbf{X}_{j_{1}}}{h}\right){\cdots} K\left( \frac{\mathbf{t}_{m}-\mathbf{X}_{j_{m}}}{h}\right)}, \end{array} $$
(4.1)

where

$$I(m,n)(\mathbf{i}):=\left\{\mathbf{j}\in I(m,n) : \mathbf{j}\neq \mathbf{i}\right\}= I(m,n)\backslash \{\mathbf{i}\}.$$

In order to minimize the quadratic loss, we introduce the following criterion, based on some (known) non-negative weight function \(\mathcal{W}(\cdot)\):

$$ CV\left( \varphi, h\right):=\frac{(n-m)!}{n!}\sum\limits_{\mathbf{i}\in I(m,n)}\left( \varphi\left( \mathbf{Y}_{\mathbf{i}}\right)- \widehat{r}_{n,\mathbf{i}}^{(m)}(\varphi,\mathbf{X}_{\mathbf{i}};h)\right)^{2}\widetilde{\mathcal{W}}\left( \mathbf{X}_{\mathbf{i}}\right), $$
(4.2)

where

$$\widetilde{\mathcal{W}}\left( \mathbf{t}\right):=\prod\limits_{i=1}^{m}\mathcal{W}(t_{i}). $$

A natural way of choosing the bandwidth is to minimize the preceding criterion: let \(\widehat {h}_{n} \in [a_{n},b_{n}]\) be the minimizer, over h ∈ [an,bn], of

$$\sup_{\varphi\in \mathscr{F}}CV\left( \varphi, h\right),$$

Then we can conclude, by Corollary 2.9, that:

$$ \sup_{\varphi\in \mathscr{F}}\sup_{\mathbf{t} \in \mathbf{I}^{m}} \left|\widehat{r}_{n}^{(m)}(\varphi,\mathbf{t};\widehat{h}_{n})-r^{(m)}(\varphi,\mathbf{t}) \right|\longrightarrow 0 \qquad \text{a.s.} $$

The main interest of our results is the possibility of deriving the asymptotic properties of our estimators even when the bandwidth is a random variable, as in the last display. One can replace Eq. 4.2 by

$$ CV\left( \varphi, h\right):=\frac{(n-m)!}{n!}\sum\limits_{\mathbf{i}\in I(m,n)}\left( \varphi\left( \mathbf{Y}_{\mathbf{i}}\right)- \widehat{r}_{n,\mathbf{i}}^{(m)}(\varphi,\mathbf{X}_{\mathbf{i}};h)\right)^{2}\widehat{\mathcal{W}}\left( \mathbf{X}_{\mathbf{i}}, \mathbf{t}\right), $$
(4.3)

where

$$\widehat{\mathcal{W}}\left( \mathbf{ s},\mathbf{t}\right):=\prod\limits_{i=1}^{m}\widehat{W}(s_{i}, t_{i}).$$

In practice, one takes, for i ∈ I(m,n), the uniform global weights \(\widetilde {\mathcal {W}}\left (\mathbf {X}_{\mathbf {i}}\right )= 1\) and the local weights

$$ \widehat{W}(\mathbf{X}_{\mathbf{i}}, \mathbf t)=\left\{ \begin{array}{ccl} 1 & \text{if}& \|\mathbf{X}_{\mathbf i}- \mathbf t\| \leq h, \\ 0 & & \text{otherwise}. \end{array}\right. $$

For the sake of brevity, we have considered only the most popular method, namely the cross-validated bandwidth. This may be extended to any other bandwidth selector, such as bandwidths based on Bayesian ideas (Shang, 2014).
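A minimal sketch of the selection rule for the case m = 1 (where \(\widehat{r}_{n,i}^{(1)}\) simply drops observation i), with uniform global weights and an assumed Gaussian kernel, is as follows; the grid and the simulated data are illustrative choices only.

```python
import numpy as np

def cv_bandwidth(X, Y, grid, phi=lambda y: y,
                 kernel=lambda u: np.exp(-0.5 * u**2)):
    """Leave-one-out criterion (4.2) for m = 1 with uniform weights; returns
    the bandwidth in `grid` minimizing CV(phi, h)."""
    U = X[:, None, :] - X[None, :, :]              # pairwise differences, (n, n, d)
    best_h, best_cv = None, np.inf
    for h in grid:
        W = np.prod(kernel(U / h), axis=2)
        np.fill_diagonal(W, 0.0)                   # leave observation i out
        den = W.sum(axis=1)
        pred = np.where(den > 0, W @ phi(Y) / np.maximum(den, 1e-300),
                        np.mean(phi(Y)))
        cv = np.mean((phi(Y) - pred) ** 2)
        if cv < best_cv:
            best_h, best_cv = h, cv
    return best_h

rng = np.random.default_rng(6)
X = rng.uniform(0, 1, size=(300, 1))
Y = np.sin(2 * np.pi * X[:, 0]) + 0.3 * rng.standard_normal(300)
print(cv_bandwidth(X, Y, grid=np.geomspace(0.01, 0.5, 20)))
```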

4.1 Discrimination

Now, we apply the results to the problem of discrimination described in Section 3 of Stute (1994b); see also Stute (1994a). We use a similar notation and setting. Let φ(⋅) be any function taking at most finitely many values, say 1,…,M. The sets

$$ A_{j}=\left\{(y_{1},\ldots,y_{m}): \varphi(y_{1},\ldots,y_{m})=j\right\},~~1\leq j\leq M, $$

then yield a partition of the feature space. Predicting the value of φ(Y1,…, Ym) is tantamount to predicting the set in the partition to which (Y1,…,Ym) belongs. For any discrimination rule g, we have

$$ \mathbb{P}(g(\mathbf{X})=\varphi(\mathbf{Y}))=\sum\limits_{j=1}^{M}{\int}_{\{\mathbf{x}:g(\mathbf{x})=j\}} m^{j}(\mathbf{x})d\mathbb{P}(\mathbf{x})\leq {\int}\max_{1\leq j\leq M} m^{j}(\mathbf{x})d\mathbb{P}(\mathbf{x}), $$

where

$$ m^{j}(\mathbf{ x})=\mathbb{P}(\varphi(\mathbf{ Y})=j\mid \mathbf{ X}=\mathbf{ x}), ~~\mathbf{ x}\in\mathbb{R}^{d}. $$

The above inequality becomes an equality if

$$ g_{0}(\mathbf{ x})=\arg \max_{1\leq j\leq M}m^{j}(\mathbf{ x}). $$

g0(⋅) is called the Bayes rule, and the pertaining probability of error

$$ \mathbf{L}^{*}=1-\mathbb{P}(g_{0}(\mathbf{X})=\varphi(\mathbf{Y}))=1-\mathbb{E}\left\{\max_{1\leq j\leq M}m^{j}(\mathbf{X})\right\} $$

is called the Bayes risk. Each of the unknown functions mj can be consistently estimated by one of the methods discussed in the preceding sections. Let, for 1 ≤ j ≤ M,

$$ {m_{n}^{j}}(\mathbf{x})=\widehat{r}_{n}^{(m)}\left(\mathbf{1}\{\varphi=j\},\mathbf{x};h\right). $$
(4.4)

Set

$$ g_{0,n}(\mathbf{ x})=\arg \max_{1\leq j\leq M}{m^{j}_{n}}(\mathbf{ x}). $$

Let us introduce

$$ \mathbf{ L}^{*}_{n}=\mathbb{P}(g_{0,n}(\mathbf{ X})\neq \varphi(\mathbf{ Y})). $$

Then, one can show that the discrimination rule g0,n(⋅) is asymptotically Bayes risk consistent:

$$ \mathbf{ L}^{*}_{n} \rightarrow \mathbf{ L}^{*}. $$
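A plug-in version of g0,n for m = 1 can be sketched as follows; the Gaussian kernel and the two-class simulated data are assumptions made for the illustration, each mj being estimated by a kernel regression of the indicator of class j.

```python
import numpy as np

def bayes_rule_plugin(x, X, Y, h, labels,
                      kernel=lambda u: np.exp(-0.5 * u**2)):
    """Plug-in rule g_{0,n}(x) = argmax_j m_n^j(x) for m = 1."""
    W = np.prod(kernel((x[None, :] - X) / h), axis=1)
    den = max(W.sum(), 1e-300)
    scores = [np.sum(W * (Y == j)) / den for j in labels]   # m_n^j(x)
    return labels[int(np.argmax(scores))]

# Two-class illustration: class 2 becomes more likely as x grows.
rng = np.random.default_rng(7)
X = rng.uniform(0, 1, size=(500, 1))
Y = 1 + (rng.uniform(size=500) < X[:, 0]).astype(int)       # labels in {1, 2}
print(bayes_rule_plugin(np.array([0.9]), X, Y, h=0.1, labels=[1, 2]))  # -> 2
```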

5 Concluding Remarks and Future Works

In the present work, we have used general methods based upon empirical process techniques to prove uniform in bandwidth consistency for kernel-type estimators of conditional U-statistics. We have considered an extended setting in which the intrinsic dimension can be lower than the ambient dimension. In addition, our work complements the paper of Kim et al. (2018) by considering other examples of kernel estimates. Our proofs rely on the work of Dony and Mason (2008); our results extend and complement that reference by establishing convergence rates adaptive to the volume dimension. Our results are especially useful for establishing uniform consistency of data-driven bandwidth kernel-type function estimators. An interesting direction would be to extend our work to k-nearest neighbour estimators; presently, it is beyond reasonable hope to achieve this program without new technical arguments. We will not treat the uniform consistency of such estimators in the present paper, and leave this for future investigation.

6 Mathematical Developments

This section is devoted to the proof of our results. The previously defined notation continues to be used in what follows.

Our main tool to analyze \(u_{n}^{(m)}(g,h,\mathbf { t})\) will be the Hoeffding decomposition, which we recall here for the reader’s convenience.

6.1 Hoeffding Decomposition

The Hoeffding decomposition (Hoeffding, 1948) states the following identity, which is easy to check:

$$ u_{n}^{(m)}(g,h,\mathbf{ t})=\sqrt{n}\sum\limits_{k=1}^{m} \binom{m}{k} U_{n}^{(k)}(\pi_{k}\overline{G}_{g,h, \mathbf{ t}}(\cdot,\cdot)), $$
(6.1)

where the k th Hoeffding projection for a (symmetric) function \(L:S^{m}\times S^{m} \rightarrow \mathbb {R}\) with respect to \(\mathbb {P}\) is defined for xk = (x1,…,xk) ∈ Sk and yk = (y1,…,yk) ∈ Sk as

$$\pi_{k}L(\mathbf{x}_{k}, \mathbf{y}_{k}):= (\delta_{(x_{1},y_{1})}-\mathbb{P})\times {\ldots} \times (\delta_{(x_{k},y_{k})}-\mathbb{P})\times \mathbb{P}^{m-k}(L),$$

where \(\mathbb {P}\) is any probability measure on \((S, \mathcal {S})\) and for measures \(\mathbb {Q}_{i}\) on S we have

$$ \mathbb{Q}_{1}{\cdots} \mathbb{Q}_{m}L=\int{\ldots} \int L(x_{1},\ldots,x_{m})d\mathbb{Q}_{1}(x_{1}){\cdots} d\mathbb{Q}_{m}(x_{m}). $$

Considering (Xi,Yi),i ≥ 1, i.i.d.-\(\mathbb {P}\) and assuming \(L \in L_{2}(\mathbb {P}^{m})\), this is an orthogonal decomposition and

$$\mathbb{E}[\pi_{k}L(\mathbf{X}_{k},\mathbf{Y}_{k})\mid(X_{2},Y_{2}),\ldots, (X_{k}, Y_{k})]=0, ~~k\geq 1,$$

where we write Xk and Yk for (X1,…,Xk) and (Y1,…,Yk), respectively. Thus the kernels πkL are canonical for \(\mathbb {P}\). Also, the πk, k ≥ 1, are nested projections, that is, πkπl = πk whenever k ≤ l, and

$$ \mathbb{E}[(\pi_{k}L)^{2}(\mathbf{X}_{k},\mathbf{Y}_{k})] \leq \mathbb{E} [(L-\mathbb{E}L)^{2}(\mathbf{X},\mathbf{Y})] \leq \mathbb{E}L^{2}(\mathbf{X},\mathbf{Y}). $$
(6.2)

For more details, consult de la Peña and Giné (1999). The proofs of our results are largely inspired from Dony and Mason (2008), Kim et al. (2018) and Bouzebda and El-hadjali (2020).
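To fix ideas, for m = 2 and the symmetric kernel \(\overline{G}=\overline{G}_{g,h,\mathbf{t}}\), the decomposition (6.1) reads

$$ u_{n}^{(2)}(g,h,\mathbf{ t})=\sqrt{n}\left\{2U_{n}^{(1)}(\pi_{1}\overline{G})+U_{n}^{(2)}(\pi_{2}\overline{G})\right\}, $$

with

$$ \pi_{1}\overline{G}(x,y)=\mathbb{E}\left(\overline{G}((x,X_{2}),(y,Y_{2}))\right)-\mathbb{E}\left(\overline{G}(\mathbf{X},\mathbf{Y})\right) $$

and

$$ \begin{array}{@{}rcl@{}} \pi_{2}\overline{G}((x_{1},y_{1}),(x_{2},y_{2}))&=&\overline{G}((x_{1},x_{2}),(y_{1},y_{2}))-\mathbb{E}\left(\overline{G}((x_{1},X_{2}),(y_{1},Y_{2}))\right)\\ &&-\mathbb{E}\left(\overline{G}((X_{1},x_{2}),(Y_{1},y_{2}))\right)+\mathbb{E}\left(\overline{G}(\mathbf{X},\mathbf{Y})\right), \end{array} $$

so that the first term is a centred i.i.d. average, while the second is \(\mathbb{P}\)-canonical.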

6.2 Proof of Theorem 2.5: the Bounded Case

6.3 Linear Term

To establish the relation (2.9), we need to study the linear term (the first term) of Eq. 6.1, given by

$$ m\sqrt{n}U_{n}^{(1)}(\pi_{1}\bar{G}_{g,h,\mathbf{t}}(\cdot,\cdot))=\frac{m}{\sqrt{n}}\sum\limits_{i=1}^{n}\pi_{1}\bar{G}_{g,h,\mathbf{t}}(X_{i},Y_{i}). $$

Keep in mind that the class \({\mathscr{F}}_{q}\) is a VC-type class of functions with envelope function F and that the class \({\mathscr{K}}\) is of VC-type with envelope κ. Via Lemma A.1 in Einmahl and Mason (2000), this implies that the class of functions on \(\mathbb {R}^{qm}\times \mathbb {R}^{dm}\) given by \(\{h^{dm}G_{g,h,\mathbf { t}}(\cdot ,\cdot ):g \in {\mathscr{F}}_{q},\mathbf {t} \in \mathbb {R}^{dm}\}\) is of VC-type, as is the class

$$ \mathcal{G}=\{h^{dm}\overline{G}_{g,h,\mathbf{ t}}(\cdot,\cdot):g \in \mathscr{F}_{q},h\geq l_{n},\mathbf{t} \in \mathbb{R}^{dm}\}, $$
(6.3)

for which we denote the VC-type characteristics by A and v, and the envelope function by

$$ \widetilde{F}(\mathbf{ y})\equiv\widetilde{F}(\mathbf{ x},\mathbf{ y}) =\kappa^{m}\sum\limits_{\sigma\in {I_{m}^{m}}}F(\mathbf{ y}_{\sigma}),~~~ \mathbf{ y}\in \mathbb{R}^{qm}. $$
(6.4)

By considering the following class of functions on \(\mathbb {R}^{dk}\times \mathbb {R}^{qk}\), for k = 1,…,m,

$$ \mathcal{G}^{(k)}=\{h^{dm}\pi_{k}\overline{G}_{g,h,\mathbf{ t}}(\cdot,\cdot):g \in \mathscr{F}_{q}, h\geq{l_{n}},\mathbf{t} \in \mathbb{R}^{dm}\}, $$
(6.5)

and following Giné and Mason (2007a) one can show that each class \(\mathcal {G}^{(k)}\) is of VC-type with characteristics A and v and envelope function

$$ \mathbf{ F}_{k}\leq 2^{k}\|\widetilde{\mathbf{ F}}\|_{\infty}. $$
(6.6)

Recall that the sample $(X_{i},Y_{i})$, $1\leq i\leq n$, is i.i.d., and from the definition of the Hoeffding projections, for all \((x,y) \in \mathbb {R}^{d} \times \mathbb {R}^{q}\), we get

$$ \begin{array}{@{}rcl@{}} \pi_{1}\overline{G}_{g,h, \mathbf{ t}}(x,y)\!\!\!&=&\!\!\! \mathbb{E}\left( \overline{G}_{g,h, \mathbf{ t}}((x,X_{2},\ldots,X_{m}), (y,Y_{2},\ldots,Y_{m}))\right)\\&&\!\!\!-\mathbb{E}\left( \overline{G}_{g,h, \mathbf{ t}} (\mathbf{X},\mathbf{Y})\right)\\ \!\!\!&=&\!\!\!\mathbb{E}\left( \overline{G}_{g,h, \mathbf{ t}} (\mathbf{ X},\mathbf{ Y})\mid (X_{1},Y_{1})=(x,y)\right)-\mathbb{E}\left( \overline{G}_{g,h, \mathbf{t}} (\mathbf{X},\mathbf{Y})\right). \end{array} $$

Introduce the following function on \(\mathbb {R}^{d} \times \mathbb {R}^{q}\):

$$ \begin{array}{@{}rcl@{}} S_{g,h,\mathbf{ t}}:\mathbb{R}^{d}\times \mathbb{R}^{q}&\rightarrow& \mathbb{R}\\ (x,y)&\mapsto& mh^{dm}\mathbb{E}\left( \overline{G}_{g,h, \mathbf{ t}} (\mathbf{ X},\mathbf{ Y})\mid (X_{1},Y_{1})=(x,y)\right). \end{array} $$

Making use of this notation, we can write

$$ mh^{dm}\pi_{1}\overline{G}_{g,h, \mathbf{ t}}(x,y)= S_{g,h,\mathbf{ t}}(x,y)-\mathbb{E}(S_{g,h,\mathbf{ t}} (X_{1},Y_{1})). $$

For all \(g \in {\mathscr{F}}_{q}\), $h\geq l_{n}$ and \(\mathbf {t} \in \mathbb {R}^{dm}\), the linear term of the decomposition in Eq. 6.1, multiplied by $h^{dm}$, is given by

$$ \begin{array}{@{}rcl@{}} m\sqrt{n}h^{dm} U^{(1)}_{n}(\pi_{1}\overline{G}_{g,h,\mathbf{ t}}) &=&\frac{1}{\sqrt{n}}\sum\limits_{i=1}^{n}\{S_{g,h,\mathbf{ t}}(X_{i},Y_{i})-\mathbb{E}(S_{g,h,\mathbf{ t}} (X_{i},Y_{i}))\}\\ &=:&\alpha_{n}(S_{g,h,\mathbf{ t}}), \end{array} $$

where we recall that the last expression is the empirical process αn(⋅) based on the sample (X1,Y1),…,(Xn,Yn), and we set, for \(\mathbf {t}\in \mathbf {I}\subset \mathbb {X}\), \(g \in {\mathscr{F}}_{q}\) and $h\geq l_{n}$, the class of normalised functions on \(\mathbb {R}^{d} \times \mathbb {R}^{q}\),

$$ \begin{array}{@{}rcl@{}} \mathcal{S}_{n}=\left\{S_{g,h,\mathbf{ t}}(\cdot, \cdot): g \in \mathscr{F}_{q}, h\geq l_{n}, \mathbf{ t}\in \mathbf{I} \subset \mathbb{X}\right\}. \end{array} $$
(6.7)

Now, we have to bound Sg,h,t. From Eq. 2.8, we get

$$ \begin{array}{@{}rcl@{}} \left|S_{g,h,\mathbf{ t}}(x, y)\right| \leq m h^{dm} M\left\|K\right\|_{\infty}^{m}. \end{array} $$

In order to bound the covering numbers of \( \mathcal {S}_{n}\), we remark that \( \mathcal {S}_{n}=m\mathcal {G}^{(1)}\) is of VC-type with characteristics A and v, as defined in Eq. 6.5 for k = 1. For the reader's convenience, we derive the covering-number bound for \(\mathcal {S}_{n}\) in detail. Fix \(\eta <l_{n}^{-dm}\left \| S_{g,h,\mathbf {t}}\right \|_{\infty }\) and a probability measure \(\mathbb {Q}\) on \(\mathbb {R}^{d}\). Suppose

$$\left[l_{n} ,\left( \frac{\sqrt{n}\eta}{2\left\|S_{g,h,\mathbf{t}}\right\|_{\infty}}\right)^{-1/dm}\right]$$

is covered by balls of the form

$$\left\lbrace\left( h_{i}-\frac{\sqrt{n}\eta l_{n}^{dm+1}}{3dm\left\|S_{g,h,\mathbf{t}}\right\|_{\infty}},h_{i}+\frac{\sqrt{n}\eta l_{n}^{dm+1}}{3dm\left\|S_{g,h,\mathbf{t}}\right\|_{\infty}}\right), 1 \leq i \leq N_{1}\right\rbrace,$$

and \(\left (\mathcal {S}_{n}, \mathbb {L}_{2}(\mathbb {Q})\right )\) is covered by

$$\left\lbrace\mathbb{B}_{\mathbb{L}_{2}(\mathbb{Q})}\left( K_{j},\frac{l_{n}^{dm}\eta}{3mM}\right)\cup \mathbb{B}_{\mathbb{L}_{2}(\mathbb{Q})}\left( g_{k},\widetilde{\varepsilon}\right),1 \leq j \leq N_{2}, 1 \leq k \leq N_{3} \right\rbrace, $$

where

$$\widetilde{\varepsilon}\leq \frac{l_{n}^{dm}\eta}{3mM\left\|K\right\|^{m}_{\infty}}.$$

For 1 ≤ iN1,1 ≤ jN2 and 1 ≤ kN3, we let

$$S_{i,j,k}=\frac{1}{\sqrt{n}h_{i}^{dm}}S_{j,k}=\frac{m}{h_{i}^{dm}}\mathbb{E}\left( g_{k}(\mathbf{Y})\widetilde{K}_{j}\left( \frac{\mathbf{t}-\mathbf{X}}{h}\right)\mid (X_{1},Y_{1})=(x,y)\right) .$$

Also, choose \(b_{0}>\left (\frac {\sqrt {n}\eta }{2\left \|S_{g,h,\mathbf {t}}\right \|_{\infty }}\right )^{-1/dm},\) \( \mathbf {t}_{0}\in \mathbf {I}, g_{0}\in {\mathscr{F}}_{q}\) and let

$$S_{0}=\frac{1}{\sqrt{n}b_{0}^{dm}}S_{g_{0}, b_{0},\mathbf{t}_{0}}.$$

We will show that

$$ \begin{array}{@{}rcl@{}} &&\left\lbrace\mathbb{B}_{\mathbb{L}_{2}(\mathbb{Q})}\left( S_{i,j,k},\eta\right): 1 \leq i \leq N_{1},1 \leq j \leq N_{2} ~\text{and}\right.\\&&\left. 1 \leq k \leq N_{3}\right\rbrace\cup \left\lbrace\mathbb{B}_{\mathbb{L}_{2}(\mathbb{Q})}\left( S_{0},\eta\right)\right\rbrace ~~\text{covers}~~ \mathcal{S}_{n}. \end{array} $$
(6.8)

For the first case when \(h\leq \left (\frac {\sqrt {n}\eta }{2\left \|S_{g,h,\mathbf {t}}\right \|_{\infty }}\right )^{-1/dm}\), find hi, Kj and gk with

$$ \begin{array}{@{}rcl@{}}h&\in&\left( h_{i}-\frac{\sqrt{n}\eta l_{n}^{dm+1}}{3dm\left\|S_{g,h,\mathbf{t}}\right\|_{\infty}},h_{i}+\frac{\sqrt{n}\eta l_{n}^{dm+1}}{3dm\left\|S_{g,h,\mathbf{t}}\right\|_{\infty}}\right) ,\\ g & \in& \mathbb{B}_{\mathbb{L}_{2}(\mathbb{Q})}\left( g_{k},\widetilde{\varepsilon}\right),\\ K&\in&\mathbb{B}_{\mathbb{L}_{2}(\mathbb{Q})}\left( K_{j},\frac{l_{n}^{dm}\eta}{3mM}\right). \end{array} $$

Then the distance between \(\frac {1}{\sqrt {n}h^{dm}}S_{g,h,\mathbf {t}}\) and \(\frac {1}{\sqrt {n}h_{i}^{dm}}S_{j,k}\) is upper bounded as follows

$$ \begin{array}{@{}rcl@{}} \lefteqn{ \left\|\frac{1}{\sqrt{n}h^{dm}}S_{g,h,\mathbf{ t}}-\frac{1}{\sqrt{n}h_{i}^{dm}}S_{j,k}\right\|_{\mathbb{L}_{2}(\mathbb{Q})}}\\ &\leq&\!\!\! \left\| \frac{1}{\sqrt{n}h^{dm}}S_{g,h,\mathbf{ t}}-\frac{1}{\sqrt{n}h_{i}^{dm}}S_{g,h,\mathbf{ t}}\right\|_{\mathbb{L}_{2}(\mathbb{Q})}\\ &&\!\!\!+\left\| \frac{1}{\sqrt{n}h_{i}^{dm}}S_{g,h,\mathbf{ t}}-\frac{m}{h_{i}^{dm}}\mathbb{E}\left( g(\mathbf{Y})\widetilde{K}_{j}\left( \frac{\mathbf{t} - \mathbf{X}}{h}\right)\mid (X_{1},Y_{1}) = (x,y)\right)\right\|_{\mathbb{L}_{2}(\mathbb{Q})}\\ &&\!\!\!+\left\|\frac{m}{h_{i}^{dm}}\mathbb{E}\left( g(\mathbf{Y})\widetilde{K}_{j}\left( \frac{\mathbf{t}-\mathbf{X}}{h}\right)\mid (X_{1},Y_{1})=(x,y)\right)\right.\\&&\!\!\!\left.-\frac{m}{h_{i}^{dm}}\mathbb{E}\left( g_{k}(\mathbf{Y})\widetilde{K}_{j}\left( \frac{\mathbf{t}-\mathbf{X}}{h}\right)\mid (X_{1},Y_{1})=(x,y)\right)\right\|_{\mathbb{L}_{2}(\mathbb{Q})}. \end{array} $$
(6.9)

Now the first term of Eq. 6.9 is upper bounded as

$$ \begin{array}{@{}rcl@{}} &&\left\| \frac{1}{\sqrt{n}h^{dm}}S_{g,h,\mathbf{t}}-\frac{1}{\sqrt{n}h_{i}^{dm}}S_{g,h,\mathbf{t}}\right\|_{\mathbb{L}_{2}(\mathbb{Q})}\\&=& \left|\frac{1}{h^{dm}}-\frac{1}{h_{i}^{dm}}\right|\frac{ \left\|S_{g,h,\mathbf{t}}\right\|_{\mathbb{L}_{2}(\mathbb{Q})}}{\sqrt{n}}\\ &=&|h_{i}-h| \sum\limits_{k=0}^{dm-1} h_{i}^{k-dm}h^{-1-k}\frac{\left\|S_{g,h,\mathbf{t}}\right\|_{\mathbb{L}_{2}(\mathbb{Q})}}{\sqrt{n}} \\ &\leq&|h_{i}-h| dml_{n}^{-dm-1}\frac{1}{\sqrt{n}}\left\|S_{g,h,\mathbf{t}}\right\|_{\infty}< \frac{\eta}{3}. \end{array} $$
(6.10)

Also, the second term of Eq. 6.9 is upper bounded as

$$ \begin{array}{@{}rcl@{}} &&\left\|\frac{1}{\sqrt{n}h_{i}^{dm}}S_{g,h,\mathbf{ t}}-\frac{m}{h_{i}^{dm}}\mathbb{E}\left( g(\mathbf{Y})\widetilde{K}_{j}\left( \frac{\mathbf{t}-\mathbf{X}}{h}\right)\mid (X_{1},Y_{1})=(x,y)\right)\right\|_{\mathbb{L}_{2}(\mathbb{Q})}\\ &&=\frac{m}{h_{i}^{dm}}\left\|\mathbb{E}\left( g(\mathbf{Y})\widetilde{K}\left( \frac{\mathbf{t}-\mathbf{X}}{h}\right)\mid (X_{1},Y_{1})=(x,y)\right)\right.\\ &&\qquad\left.-\mathbb{E}\left( g(\mathbf{Y})\widetilde{K}_{j}\left( \frac{\mathbf{t}-\mathbf{X}}{h}\right)\mid (X_{1},Y_{1})=(x,y)\right)\right\|_{\mathbb{L}_{2}(\mathbb{Q})}\\ &&\leq \frac{m M}{h_{i}^{dm}}\left\|\mathbb{E}\left( \widetilde{K}\left( \frac{\mathbf{t}-\mathbf{X}}{h}\right)\mid (X_{1},Y_{1})=(x,y)\right)\right.\\&&\left.-\mathbb{E}\left( \widetilde{K}_{j}\left( \frac{\mathbf{t}-\mathbf{X}}{h}\right)\mid (X_{1},Y_{1})=(x,y)\right)\right\|_{\mathbb{L}_{2}(\mathbb{Q})}\\ &&\leq \frac{\eta}{3}. \end{array} $$
(6.11)

The last term of Eq. 6.9 is upper bounded as

$$ \begin{array}{@{}rcl@{}} \lefteqn{\left\|\frac{m}{h_{i}^{dm}}\mathbb{E}\left( g(\mathbf{Y})\widetilde{K}_{j}\left( \frac{\mathbf{t}-\mathbf{X}}{h}\right)\mid (X_{1},Y_{1})=(x,y)\right) \right.}\\&&\left.- \frac{m}{h_{i}^{dm}}\mathbb{E}\left( g_{k}(\mathbf{Y})\widetilde{K}_{j}\left( \frac{\mathbf{t}-\mathbf{X}}{h}\right)\mid (X_{1},Y_{1})=(x,y)\right)\right\|_{\mathbb{L}_{2}(\mathbb{Q})} \\&\leq& ml_{n}^{-dm}\|K\|_{\infty}^{m}\left\|g-g_{k}\right\|_{\mathbb{L}_{2}(\mathbb{Q})}\\ &<& ml_{n}^{-dm}\|K\|_{\infty}^{m}\widetilde{\varepsilon}\leq\frac{\eta}{3}. \end{array} $$
(6.12)

Combining Eqs. 6.10, 6.11 and 6.12 with Eq. 6.9, we readily obtain the bound

$$ \left\|\frac{1}{\sqrt{n}h^{dm}}S_{g,h,\mathbf{ t}}-\frac{1}{\sqrt{n}h_{i}^{dm}}S_{j,k}\right\|_{\mathbb{L}_{2}(\mathbb{Q})}< \eta.$$

For the second case when \(h>\left (\frac {\sqrt {n}\eta }{2\left \|S_{g,h,\mathbf {t}}\right \|_{\infty }}\right )^{-1/dm}\), we have

$$\left\| \frac{1}{\sqrt{n}h^{dm}}S_{g,h,\mathbf{t}}\right\|_{\mathbb{L}_{2}(\mathbb{Q})}\leq \left\| \frac{1}{\sqrt{n}h^{dm}}S_{g,h,\mathbf{t}}\right\|_{\infty}<\frac{\eta}{2},$$

holds, and hence

$$\left\| \frac{1}{\sqrt{n}h^{dm}}S_{g,h,\mathbf{t}}-S_{0}\right\|_{\mathbb{L}_{2}(\mathbb{Q})}\leq \left\| \frac{1}{\sqrt{n}h^{dm}}S_{g,h,\mathbf{t}}\right\|_{\mathbb{L}_{2}(\mathbb{Q})}+\left\| S_{0}\right\|_{\mathbb{L}_{2}(\mathbb{Q})}<\eta.$$

Therefore Eq. 6.8 is shown. Hence, combining Eqs. F.ii–F.iii and 2.8 with Lemma 9.9, p. 160, of Kosorok (2008) gives that, for every probability measure \(\mathbb {Q}\) on \(\mathbb {R}^{d}\) and every \(\eta \in \left (0, \sqrt {n}h^{-dm}\left \|S_{g,h,\mathbf {t}}\right \|_{\infty }\right )\), the covering number \(\mathcal {N}(\mathcal {S}_{n},\eta )\) is upper bounded as

$$ \begin{array}{@{}rcl@{}} \lefteqn{\sup_{\mathbb{Q}}\mathcal{N}(\mathcal{S}_{n},L_{2}(\mathbb{Q}),\eta)}\\ &\leq &\mathcal{N}\left( \left[l_{n} ,\left( \frac{\sqrt{n}\eta}{2\left\|S_{g,h,\mathbf{t}}\right\|_{\infty}}\right)^{-1/dm}\right],|\cdot|,\frac{\sqrt{n}\eta l_{n}^{dm+1}}{3dm\left\|S_{g,h,\mathbf{t}}\right\|_{\infty}}\right)\\&&\sup_{\mathbb{Q}}\mathcal{N}\left( m\mathscr{F}_{q} ,L_{2}(\mathbb{Q}), \widetilde{\varepsilon}\right)\\ &&\sup_{\mathbb{Q}}\mathcal{N}\left( \mathscr{K}^{m} ,L_{2}(\mathbb{Q}),\frac{l_{n}^{dm}\eta}{3mM}\right)+1\\ &\leq& \frac{{3dm\left\|S_{g,h,\mathbf{t}}\right\|_{\infty}}}{\eta l_{n}^{dm+1}}\left( \frac{2\left\|S_{g,h,\mathbf{t}}\right\|_{\infty}}{\eta}\right)^{1/dm}\left( \frac{3A_{1}mM\|K\|^{m}_{\infty}}{\eta l_{n}^{dm}}\right)^{2\nu_{1} -1}\\&&\left( \frac{3A_{2}m M\left\| K\right\|_{\infty}}{ \eta l_{n}^{dm} }\right)^{m\nu_{2}}+1\\ &\leq &\left( \frac{3Adm\left\|S_{g,h,\mathbf{t}}\right\|_{\infty}}{\eta l_{n}^{dm}}\right)^{2\nu_{1}+m\nu_{2}-1}\left[\left( \frac{3d^{2-2\nu_{1}-m\nu_{2}}\left\|S_{g,h,\mathbf{t}}\right\|_{\infty}}{\eta l_{n}^{dm+1}}\right)\right. \\ &&\left( \frac{2\left\|S_{g,h,\mathbf{t}}\right\|_{\infty}}{\eta}\right)^{1/dm}+\left.\left( \frac{3Adm\left\|S_{g,h,\mathbf{t}}\right\|_{\infty}}{\eta l_{n}^{dm}}\right)^{-2\nu_{1}-m\nu_{2}+1}\right]\\ &\leq&\left( \frac{3Adm\left\|S_{g,h,\mathbf{t}}\right\|_{\infty}}{\eta l_{n}^{dm}}\right)^{2\nu_{1}+m\nu_{2}-1}, \end{array} $$
(6.13)

for some finite constant \(0<A<\infty \). For p > 2, note that assumption (2.8) implies that

$$ \sup\limits_{\mathbf{x}\in\mathbf{I}}\mathbb{E}(F^{p}(\mathbf{Y})\mid \mathbf{X}=\mathbf{x})< M^{p}<\infty. $$

From Lemma 2.3 and using Jensen’s inequality, we observe that, for $p\geq k$,

$$ \begin{array}{@{}rcl@{}} \mathbb{E}\left[S_{g,h,\mathbf{t}}(X,Y)\right]^{k}&\leq & m^{k}n^{\frac{k}{2}}h^{kdm}\mathbb{E}\left( \overline{G}^{k}_{g,h, \mathbf{ t}} (\mathbf{ X},\mathbf{ Y})\right)\\ &\leq& m^{k}n^{\frac{k}{2}}\mathbb{E}\left( \widetilde{K}^{k}\left( \frac{\mathbf{t}-\mathbf{X}}{h}\right)\mathbb{E}\left( g^{k}(\mathbf{Y})\mid \mathbf{X}=\mathbf{x}\right)\right), \end{array} $$

where

$$\mathbb{E}(g^{k}(\mathbf{Y})\mid \mathbf{X}=\mathbf{x})\leq \sup\limits_{\mathbf{x}\in\mathbf{J}}\mathbb{E}(F^{k}(\mathbf{Y})\mid \mathbf{X}=\mathbf{x}).$$

Then, we have

$$ \mathbb{E}\left[S_{g,h,\mathbf{t}}(X,Y)\right]^{2}\leq M^{p}m^{2} \mathbb{E}\left[\prod\limits_{j=1}^{m} K^{2}\left( \frac{\mathbf{t}_{j}-\mathbf{X}}{h}\right)\right]. $$

By Hölder’s inequality and using once more Lemma 2.3, we obtain

$$ \mathbb{E}\left[\prod\limits_{j=1}^{m} K^{2}\left( \frac{\mathbf{t}_{j}-\mathbf{X}}{h}\right)\right] \leq\prod\limits_{j=1}^{m} \left( \mathbb{E}\left[K^{2p_{j}}\left( \frac{\mathbf{t}_{j}-\mathbf{X}}{h}\right)\right] \right)^{1/p_{j}}, $$

where

$$\sum\limits_{j=1}^{m}\frac{1}{p_{j}}=1, ~~0<p_{j}\leq \infty, j=1,\ldots,m.$$
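For instance, the simplest admissible choice is \(p_{j}=m\) for all j, which satisfies \(\sum_{j=1}^{m}1/p_{j}=1\) and reduces the bound to the geometric mean

$$ \mathbb{E}\left[\prod\limits_{j=1}^{m} K^{2}\left( \frac{\mathbf{t}_{j}-\mathbf{X}}{h}\right)\right] \leq \prod\limits_{j=1}^{m}\left( \mathbb{E}\left[K^{2m}\left( \frac{\mathbf{t}_{j}-\mathbf{X}}{h}\right)\right]\right)^{1/m}. $$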

Hence, from Lemma 2.3, we have

$$ \mathbb{E}\left[\prod\limits_{j=1}^{m} K^{2}\left( \frac{\mathbf{t}_{j}-\mathbf{X}}{h}\right)\right] \leq \widetilde{ C}_{\mathbb{P},K,\varepsilon}^{m} h^{m(d_{\text{vol}}-\varepsilon)}, $$
(6.14)

where

$$ \widetilde{ C}_{\mathbb{P},K,\varepsilon}=\max_{1\leq j\leq m} C_{2p_{j},\mathbb{P},K,\varepsilon}. $$

Then, we readily obtain

$$ \begin{array}{@{}rcl@{}} \mathbb{E}\left[S_{g,h,\mathbf{t}}(X,Y)\right]^{2}&\leq & \widetilde{ C}_{\mathbb{P},K,\varepsilon}^{m} M^{p}m^{2} h^{m(d_{\text{vol}}-\varepsilon)}. \end{array} $$
(6.15)

Now from Eq. 6.15, applying Theorem 7.1 to the class \(\mathcal {S}_{n}\) defined in Eq. 6.7 gives that

$$\sup\limits_{S_{g,h,\mathbf{t}}\in \mathcal{S}_{n}}\left|\frac{1}{n}\sum\limits_{i=1}^{n}S_{g,h,\mathbf{ t}}(X_{i},Y_{i})-\mathbb{E}(S_{g,h,\mathbf{ t}} (X_{1},Y_{1}))\right|$$

is upper bounded with probability at least 1 − δ as

$$ \begin{array}{@{}rcl@{}} &&{}\sup\limits_{S_{g,h,\mathbf{t}}\in \mathcal{S}_{n}}\left|\frac{1}{n}\sum\limits_{i=1}^{n}S_{g,h,\mathbf{ t}}(X_{i},Y_{i})-\mathbb{E}(S_{g,h,\mathbf{ t}} (X_{1},Y_{1}))\right| \leq \displaystyle C_{A,\|\widetilde{\mathbf{ F}}\|_{\infty},\nu,d_{\text{vol}},\widetilde{ C}_{\mathbb{P},K,\varepsilon},m,\varepsilon}\\&&{}\times\left( \frac{\displaystyle\left( \log\left( \frac{1}{h}\right)\right)_{+}}{\displaystyle n} +\sqrt{\frac{\displaystyle h^{m(d_{\text{vol}}-\varepsilon)} \left( \log\left( \frac{1}{h}\right)\right)_{+}}{\displaystyle n }}\right.\\&&\left.{}+\sqrt{\frac{\displaystyle h^{m(d_{\text{vol}}-\varepsilon)} \log\left( \frac{2}{\delta}\right)}{ \displaystyle n }}+\frac{\log\left( \frac{\displaystyle 2}{\displaystyle \delta}\right)}{\displaystyle n}\right). \end{array} $$

Using the condition

$$\limsup_{n}\frac{\displaystyle\left( \log\left( \frac{1}{l_{n}}\right)\right)_{+}+\log\left( \frac{2}{\delta}\right)}{ \displaystyle nl_{n}^{d_{\text{vol}}-\varepsilon}}< \infty,$$

we conclude that

$$ \sup_{h\geq l_{n}}\sup_{g\in \mathscr{F}_{q}}\sup_{\mathbf{ t}\in\mathbf{I}}\frac{m\sqrt{nl_{n}^{m(2d-d_{\text{vol}}+\varepsilon)}}|U^{(1)}_{n}(\pi_{1}\overline{G}_{g,h,\mathbf{ t}}))|}{\sqrt{|\log l_{n}|\vee \log (2/\delta)}}\leq C_{1}. $$
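Before turning to the nonlinear terms, the following toy Monte Carlo sketch (not part of the proof) illustrates the boundedness expressed by the last display in the simplest configuration d = m = 1, g ≡ 1, with a uniform kernel and a standard normal design; all numerical choices are assumptions made for the illustration only.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(2)
Phi = lambda z: 0.5 * (1.0 + erf(z / sqrt(2.0)))   # standard normal cdf

def sup_normalized_linear_term(n, c=2.0):
    # Toy linear term for d = m = 1, g = 1: S_{h,t}(x) = 1{|x - t| <= h/2},
    # alpha_n(S_{h,t}) = sqrt(n) (mean - expectation), normalized by
    # sqrt(h |log h|) and maximized over a grid of t and of h >= l_n.
    X = rng.standard_normal(n)
    l_n = c * np.log(n) / n                         # n l_n = c log n (d = 1)
    best = 0.0
    for h in np.geomspace(l_n, 1.0, 12):
        for t in np.linspace(-2.0, 2.0, 41):
            S = (np.abs(X - t) <= h / 2).astype(float)
            ES = Phi(t + h / 2) - Phi(t - h / 2)
            alpha = sqrt(n) * (S.mean() - ES)
            best = max(best, abs(alpha) / sqrt(h * max(-np.log(h), 1.0)))
    return best

for n in (500, 2000, 8000, 32000):
    # The normalized suprema should remain of the same order as n grows.
    print(n, round(sup_normalized_linear_term(n), 3))
```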

6.4 The Other Terms of Eq. 6.1

We follow the steps of the proof of Dony and Mason (2008). We now consider the other terms of the Hoeffding decomposition (6.1) and show that each of them is almost surely upper bounded, that is, for each k = 2,…,m,

$$ \sup_{h\geq l_{n}}\sup_{g\in \mathscr{F}_{q}}\sup_{\mathbf{ t}\in \mathbb{R}^{dm}}\frac{\binom{m}{k}\sqrt{nl_{n}^{m(2d-d_{\text{vol}}+\varepsilon)}}|U^{(k)}_{n}(\pi_{k}\overline{G}_{g,h,\mathbf{ t}}))|}{\sqrt{|\log l_{n}|\vee \log (2/\delta)}}\leq C_{2}. $$
(6.16)

By the fact that \(nl_{n}^{dm}=c^{dm}\log n\), this will be established if we can obtain that for each k = 2,…,m,

$$ \sup_{h\geq l_{n}}\sup_{g\in \mathscr{F}_{q}}\sup_{\mathbf{ t}\in\mathbb{R}^{dm}}\frac{\sqrt{nl_{n}^{m(2d-d_{\text{vol}}+\varepsilon)}}|U^{(k)}_{n}(\pi_{k}\overline{G}_{g,h,\mathbf{ t}}))|}{(\sqrt{|\log l_{n}|\vee \log (2/\delta)})^{k}}=O\left( \frac{1}{\sqrt{{l_{n}^{dm}n^{k-1}}}}\right). $$
(6.17)

To establish the uniform-in-bandwidth convergence rates, we have to use a blocking argument and a decomposition of the interval [ln,b0], for b0 large enough, into smaller intervals. For this, set $n_{\ell}=2^{\ell}$, $\ell\geq 0$, and consider the intervals \({\mathscr{H}}_{\ell , j}:=[h_{\ell ,j-1},h_{\ell ,j}]\), where the boundaries are given by \(h^{m}_{\ell , j}:=2^{j}l_{n_{\ell }}^{m}\). By setting

$$L(\ell)=\max\{j:h_{\ell,j}\leq 2b_{0}\},$$

remark that

$$ [l_{n_{\ell}}, b_{0}]\subseteq \bigcup_{j=1}^{L(\ell)}\mathcal{H}_{\ell,j} ~~\text{and}~~ {L(\ell)\sim\log\left( \frac{n_{\ell}b_{0}}{c\log{n_{\ell}}}\right)/ \log2}, $$
(6.18)

implying, in particular, that \(L(\ell )\leq 2\log n_{\ell }\). This fact will be used tacitly to conclude some crucial steps of the proofs. Next, for $1\leq j\leq L(\ell)$, consider the class of functions on \(\mathbb {R}^{dm}\times \mathbb {R}^{qm}\),

$$\mathcal{G}_{\ell,j}:=\{h^{dm}\bar{G}_{g,h,\mathbf{t}}(\cdot, \cdot):g\in\mathcal{F}_{q},h\in\mathcal{H}_{\ell,j}, \mathbf{t}\in \mathbb{R}^{dm}\},$$

as well the class on \(\mathbb {R}^{dm}\times \mathbb {R}^{qm}\),

$$\mathcal{G}_{\ell,j}^{(k)}:=\left\{\frac{h^{dm}\pi_{k}\bar{G}_{g,h,\mathbf{t}}(\cdot, \cdot)}{M_{k}}:g\in\mathcal{F}_{q},h\in\mathcal{H}_{\ell,j}, \mathbf{t}\in \mathbb{R}^{dm}\right\},$$

where Mk = 2kκmM. Clearly, each class \(\mathcal {G}_{\ell ,j}^{(k)}\) is of VC-type with the same characteristics as \(\mathcal {G}^{(k)}\) (and thus as \(\mathcal {G}\)), with envelope function \(M_{k}^{-1}\mathbf{F}_{k}\), where $\mathbf{F}_{k}$ is the envelope function of \(\mathcal {G}^{(k)}\). Notice that from Eqs. 2.8 and 6.6,

$$M_{k}\geq \sup_{(\mathbf{x},\mathbf{y})\in\mathbb{R}^{dk}\times\mathbb{R}^{qk}}\{|\pi_{k}\bar{G}_{g,h,\mathbf{t}}(\mathbf{x},\mathbf{y})|:g\in\mathcal{F}_{q},{0<{h<1}}, \mathbf{t}\in \mathbb{R}^{dm}\}$$

and hence each function in \(\mathcal {G}_{\ell ,j}^{(k)}\) is bounded by 1. Define now, for $n_{\ell-1}<n\leq n_{\ell}$, $\ell=1,2,\ldots$,

$$ \mathcal{U}_{n}(j,k,\ell)=n_{\ell}^{-k/2}\sup_{H\in\mathcal{G}_{\ell,j}^{(k)}}\left|\sum\limits_{\mathbf{i}\in {I_{n}^{k}}}H(\mathbf{X}_{\mathbf{i}},\mathbf{Y}_{\mathbf{i}})\right| . $$
(6.19)
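As a numerical illustration of the blocking scheme, the following minimal sketch (with the assumed values c = 1, b0 = 1 and d = m = 1) computes the dyadic grid $n_{\ell}=2^{\ell}$, $h_{\ell,j}^{m}=2^{j}l_{n_{\ell}}^{m}$ and checks the bound $L(\ell)\leq 2\log n_{\ell}$ used tacitly above.

```python
import numpy as np

def blocking_grid(ell, c=1.0, b0=1.0, d=1, m=1):
    # n_ell = 2**ell, l_n solves n l_n^{dm} = c^{dm} log n, boundaries
    # h_{ell,j}^m = 2**j * l_n^m, and L(ell) = max{j : h_{ell,j} <= 2 b0}.
    n_ell = 2 ** ell
    l_n = (c ** (d * m) * np.log(n_ell) / n_ell) ** (1.0 / (d * m))
    j = 0
    while (2 ** (j + 1) * l_n ** m) ** (1.0 / m) <= 2.0 * b0:
        j += 1
    return n_ell, l_n, j          # j plays the role of L(ell)

for ell in (6, 10, 14, 18):
    n_ell, l_n, L = blocking_grid(ell)
    print(f"ell={ell}: n={n_ell}, L(ell)={L}, 2 log n={2*np.log(n_ell):.1f}")
```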

From Theorem 4 of Giné and Mason (2007b), as in Dony and Mason (2008), we get, for c = 1/2, r = 2 and all x > 0, that for any $\ell\geq 1$,

$$ \mathbb{P}\left\{\max_{n_{\ell-1}<n\leq n_{\ell}}\mathcal{U}_{n}(j,k,\ell)>x\right\}\leq\frac{2}{x} \mathbb{P}\left\{\mathcal{U}_{n_{\ell}}(j,k,\ell)>x/2\right\}^{1/2}\mathbb{E}[\mathcal{U}_{n_{\ell}}^{2}(j,k,\ell)]^{1/2}. $$
(6.20)

We shall apply an exponential inequality and a moment bound for U-statistics, due respectively to de la Peña and Giné (1999) and Giné and Mason (2007b), to the class \(\mathcal {G}_{\ell ,j}^{(k)}\) in order to bound (6.20). To use these results, we must first derive some bounds. First, it is readily checked that

$$ \begin{array}{@{}rcl@{}} \mathcal{U}_{n}(j,k,\ell)&=& n_{\ell}^{-k/2}\sup_{H\in\mathcal{G}_{\ell,j}^{(k)}}\left|\sum\limits_{\mathbf{i}\in {I_{n}^{k}}}H(\mathbf{X}_{\mathbf{i}},\mathbf{Y}_{\mathbf{i}})\right|\\ &=& n_{\ell}^{-k/2}{n\choose k}{n\choose k}^{-1}\sup_{H\in\mathcal{G}_{\ell,j}^{(k)}}\left|\sum\limits_{\mathbf{i}\in {I_{n}^{k}}}H(\mathbf{X}_{\mathbf{i}},\mathbf{Y}_{\mathbf{i}})\right|\\ &=& n_{\ell}^{-k/2}{n\choose k}\|U_{n}^{(k)}(\pi_{k}G)\|_{\mathcal{G}_{\ell,j}^{(k)}}\\ &\leq& n_{\ell}^{k/2}\|U_{n}^{(k)}(\pi_{k}G)\|_{\mathcal{G}_{\ell,j}^{(k)}}, \end{array} $$
(6.21)

for all n− 1 < nn. Second, notice that in Assumption 2, the kernel K(⋅) is assumed to be bounded by κ and, for notational convenience in the proofs, to have support in [− 1/2,1/2], so that by assumption (2.8) and Mk = 2kκmM, for \(H\in \mathcal {G}_{\ell ,j}^{(k)}\), we have by Eqs. 6.2 and 6.14,

$$ \begin{array}{@{}rcl@{}} \mathbb{E}H^{2}(\mathbf{X},\mathbf{Y})&\leq& M_{k}^{-2}h^{2dm}\mathbb{E}\bar{G}^{2}_{g,h,\mathbf{t}}(\mathbf{X},\mathbf{Y})\\ &=& M_{k}^{-2}\mathbb{E}\left[g^{2}(\mathbf{Y})\Tilde{K}^{2}\left( \frac{\mathbf{t}-\mathbf{X}}{h}\right)\right]\\ &\leq& \widetilde{ C}_{k,\mathbb{P},K,\varepsilon}^{m}4^{-k}\kappa^{-2m}h^{m(d_{\text{vol}}-\varepsilon)}. \end{array} $$

For \(D_{m,k}= \widetilde { C}_{k,\mathbb {P},K,\varepsilon }^{m}4^{-k}\kappa ^{-2m}\), this gives us that

$$ \sup_{H\in\mathcal{G}_{\ell,j}^{(k)}}\mathbb{E}H^{2}(\mathbf{X},\mathbf{Y})\leq D_{m,k}h_{\ell,j}^{m(d_{\text{vol}}-\varepsilon)} =:\sigma^{2}_{\ell,j}. $$
(6.22)

Since πkπkL = πkL for all k ≥ 1, we can now apply Corollary 1 of Giné and Mason (2007b) to the class \(\mathcal {G}_{\ell ,j}^{(k)}\) with σ2 as in Eq. 6.22 and easily obtain that, for some constant Ak,

$$ \begin{array}{@{}rcl@{}} \mathbb{E}\mathcal{U}^{2}_{n_{\ell}}(j,k,\ell)&\leq& n_{\ell}^{k}\|U_{n_{\ell}}^{(k)}(\pi_{k}G)\|_{\mathcal{G}_{\ell,j}^{(k)}}^{2} \leq {2^{k}A_{k}h_{\ell,j}^{m(d_{\text{vol}}-\varepsilon)}|\log(h_{\ell,j})|^{k}} \\&\leq& {2^{k}A_{k}h_{\ell,j}^{m(d_{\text{vol}}-\varepsilon)}|\log(l_{n})|^{k}}. \end{array} $$
(6.23)

To control the probability term in Eq. 6.20, we shall apply an exponential inequality to the same class \(\mathcal {G}_{\ell ,j}^{(k)}\); recall that each \(H \in \mathcal {G}_{\ell ,j}^{(k)}\) is bounded by 1. Setting

$$ y^{*}=C_{1,k}(|\log l_{n}|\vee \log\log (2/\delta))^{k/2}=:C_{1,k}\lambda_{n,k}, $$
(6.24)

where \(C_{1,k}< \infty \), Theorem 5.3.14 of de la Peña and Giné (1999) gives us constants C2,k, C3,k and C4,k such that, for j = 1,…,L(ℓ) and any ρ > 1,

$$ \begin{array}{@{}rcl@{}} \mathbb{P}\left\{\mathcal{U}_{n_{\ell}}(j,k,\ell)>\rho^{k/2}y^{*}\right\}&\leq& C_{2,k}\exp\{-C_{3,k}\rho y^{*2/k}\}\\ &\leq& \exp\{-C_{4,k}\rho \log\log(2/\delta)\}. \end{array} $$
(6.25)

Plugging the bounds in Eqs. 6.23 and 6.25 into Eq. 6.20, we then get, for some C5,k > 0, ρ ≥ 2 and $\ell$ large enough,

$$ \begin{array}{@{}rcl@{}} &&\mathbb{P}\left\{\max_{n_{\ell-1}<n \leq n_{\ell}}\mathcal{U}_{n}(j,k,\ell)>2\rho^{k/2}y^{*}\right\}\\&\leq& \frac{(\log 2/\delta)^{-\rho C_{4,k}/2}\sqrt{2^{k}A_{k}h_{\ell,j}^{m(d_{\text{vol}}-\varepsilon)}|\log(l_{n})|^{k}}}{C_{1,k}\sqrt{\rho^{k}(|\log l_{n}|\vee \log (2/\delta))^{k}}} \\ &\leq& \sqrt{h_{\ell,j}^{m(d_{\text{vol}}-\varepsilon)}}\log(2/\delta)^{-\rho C_{5,k}}. \end{array} $$
(6.26)

Finally, note also that

$$ \begin{array}{@{}rcl@{}} n_{\ell}^{k/2}\|U_{n}^{(k)}(\pi_{k}G)\|_{\mathcal{G}_{\ell,j}(k)} &=&n_{\ell}^{k} {n\choose k}^{-1}n_{\ell}^{-k/2}\sup_{H\in\mathcal{G}_{\ell,j}^{(k)}}\left|\sum\limits_{\mathbf{i}\in {I_{n}^{k}}}H(\mathbf{X}_{\mathbf{i}},\mathbf{Y}_{\mathbf{i}})\right|\\&=&n_{\ell}^{k} {n\choose k}^{-1} \mathcal{U}_{n}(j,k,\ell) \\&\leq& k! \prod\limits_{j=0}^{k-1}\left( \frac{n_{\ell}}{n-j}\right) \mathcal{U}_{n}(j,k,\ell) \\&\leq& C_{k}M_{k} \mathcal{U}_{n}(j,k,\ell), \end{array} $$
(6.27)

for some Ck > 0. Therefore, by Eq. 6.18, for each k = 2,…,m and $\ell$ large enough,

$$ \begin{array}{@{}rcl@{}} &&\max_{n_{\ell-1}<n\leq n_{\ell}} A_{n,k}\\&:=&\max_{n_{\ell-1}<n\leq n_{\ell}}\sup_{{ l_{n}\leq h\leq b_{0}}}\sup_{g\in \mathscr{F}_{q}}\sup_{\mathbf{ t}\in\mathbf{I}}\frac{\sqrt{nl_{n}^{m(2d-d_{\text{vol}}+\varepsilon)}}|U^{(k)}_{n}(\pi_{k}\overline{G}_{g,h,\mathbf{ t}}))|}{\sqrt{(|\log l_{n}|\vee \log (2/\delta))^{k}}}\\ &\leq&\max_{n_{\ell-1}<n\leq n_{\ell}}\max_{1\leq j\leq L(\ell)}\sup_{h\in \mathcal{H}_{\ell,j}}\sup_{g\in \mathscr{F}_{q}}\sup_{\mathbf{ t}\in\mathbf{I}}\frac{\sqrt{n_{\ell}l_{n}^{m(2d-d_{\text{vol}}+\varepsilon)}}|U^{(k)}_{n}(\pi_{k}\overline{G}_{g,h,\mathbf{ t}}))|}{\sqrt{(|\log l_{n}|\vee \log (2/\delta))^{k}}} \\ &\leq& \frac{C_{k}M_{k}}{\sqrt{n_{\ell}^{k-1}}}\max_{n_{\ell-1}<n\leq n_{\ell}}\max_{1\leq j\leq L(\ell)}\frac{\mathcal{U}_{n_{\ell}}(j,k,\ell)}{\lambda_{n,k}}\\ &\leq& \frac{C_{k}M_{k}}{\sqrt{{l_{n_{\ell}}^{dm}n_{\ell}^{k-1}}}}\max_{n_{\ell-1}<n\leq n_{\ell}}\max_{1\leq j\leq L(\ell)}\frac{\mathcal{U}_{n_{\ell}}(j,k,\ell)}{\lambda_{n,k}}, \end{array} $$

where λn,k was defined as in Eq. 6.24. Now, recall that \(L(\ell )\leq 2\log (n_{\ell })\). Then Eq. 6.26 applied with ρ ≥ (2 + γ)/C5,k, γ > 0 and in combination with the above inequality and the obvious bound

$$\sqrt{{l_{n}^{dm}n^{k-1}}} A_{n,k }\leq \sqrt{{l_{n_{\ell}}^{dm}n_{\ell}^{k-1}}} A_{n,k},$$

valid for all $n_{\ell-1}<n\leq n_{\ell}$, implies, for $C_{6,k}\geq 2\rho^{k/2}C_{k}M_{k}C_{1,k}$ and for the choice $\delta=2^{-\ell+1}$, that for k = 2,…,m,

$$ \begin{array}{@{}rcl@{}} \mathbb{P}\left\{\max_{n_{\ell-1}<n \leq n_{\ell}}\sqrt{{l_{n}^{dm}n^{k-1}}} A_{n,k }>C_{6,k}\right\}&\leq& {\sum\limits_{j=1}^{L(\ell)}\sqrt{h_{\ell,j}^{m(d_{\text{vol}}-\varepsilon)}}\log(2/\delta)^{-\rho C_{5,k}}}\\ &\leq&{ L(\ell)\log(2/\delta)^{-\rho C_{5,k}}}\\ &\leq& { 2 (\ell \log2)^{-(1+\gamma)}}. \end{array} $$
(6.28)

This proves, via Borel–Cantelli, that Eq. 6.17 holds, which obviously implies Eq. 6.16 and hence completes the proof of Theorem 2.5.

6.5 Proof of Theorem 2.6: the Unbounded Case

To prove this theorem, we will need to truncate the conditional U-statistic un(g,h,t). If condition (2.8) is not satisfied, we consider bandwidths lying in the smaller interval \({\mathscr{H}}_{n_{\ell }}^{\prime }=[a_{n_{\ell }}^{\prime }, b_{0}]\), which may be divided into subintervals as follows:

$$ \mathcal{H}^{\prime}_{\ell, j}:=[h^{\prime}_{\ell,j-1},h^{\prime}_{\ell,j}] $$
(6.29)

where the boundaries are given by \(h^{'dm}_{\ell , j}:=2^{j}a_{n_{\ell }}^{'dm}\). It is straightforward to show that Eq. 6.18 remains valid if we replace $h_{\ell,j}$ by \(h^{\prime }_{\ell ,j}\). In particular, we still have \(L(\ell )\leq 2\log {n_{\ell }}\), where L(ℓ) is now defined as

$$L(\ell):=\max\{j:h^{\prime}_{\ell,j}\leq 2b_{0}\}.$$

Recall that $n_{\ell}=2^{\ell}$, $\ell\geq 0$, and set, for $\ell\geq 1$,

$$ \gamma_{\ell}=n_{\ell}/\log n_{\ell}. $$
(6.30)

For an arbitrary 𝜖 > 0, we truncate each function in \(\mathcal {G}\) via its envelope as follows:

$$ \overline{G}^{(\ell)}_{g,h,\mathbf{t}}(\mathbf{x},\mathbf{y}):=\overline{G}_{g,h,\mathbf{t}}(\mathbf{x},\mathbf{y})\mathbf{1}\left\{\widetilde{F}(\mathbf{y})\leq \epsilon\gamma_{\ell}^{1/p}\right\},~~~ \widetilde{G}^{(\ell)}_{g,h,\mathbf{t}}:=\overline{G}_{g,h,\mathbf{t}}-\overline{G}^{(\ell)}_{g,h,\mathbf{t}}, $$

where \(\tilde {F}\) is the symmetric envelope function of the class \(\mathcal {G}\) as defined in Eq. 6.4. The statistic un(g,h,t) can then be decomposed, for any $n_{\ell-1}<n\leq n_{\ell}$, since, from Eq. 2.11,

$$ \begin{array}{@{}rcl@{}} u_{n}(g,h,\mathbf{t})&=& \sqrt{n}\{U_{n}^{(m)}(\bar{G}^{(\ell)}_{g,h,\mathbf{t}})-\mathbb{E}U_{n}^{(m)}(\bar{G}^{(\ell)}_{g,h,\mathbf{t}} )\}+ \sqrt{n}\{U_{n}^{(m)}(\tilde{G}^{(\ell)}_{g,h,\mathbf{t}})\\&&-\mathbb{E}U_{n}^{(m)}(\tilde{G}^{(\ell)}_{g,h,\mathbf{t}} )\}\\ &=:& u_{n}^{(\ell)}(g,h,\mathbf{t})+\tilde{u}_{n}^{(\ell)}(g,h,\mathbf{t}). \end{array} $$

The term \(u_{n}^{(\ell )}(g,h,\mathbf {t})\) will be called the truncated part and \(\tilde {u}_{n}^{(\ell )}(g,h,\mathbf {t})\) the remainder part. To prove Theorem 2.6, we shall apply the Hoeffding decomposition to the truncated part and analyze each of the terms separately, while the remainder part can be treated directly using simple arguments based on standard inequalities. Note, for further use, that

$$ a_{n_{\ell}}^{'dm}=c^{dm}\gamma_{\ell}^{2/p-1}, ~~~~~~~\ell\geq 1. $$
(6.31)
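A minimal sketch of this truncation step, with the illustrative choices ε = 1 and p = 4, is given below; it merely demonstrates that the truncated part and the remainder part recompose the original kernel values exactly.

```python
import numpy as np

def split_by_envelope(G_vals, F_vals, ell, eps=1.0, p=4.0):
    # Truncation at the envelope level eps * gamma_ell^{1/p}, where
    # gamma_ell = n_ell / log n_ell and n_ell = 2**ell (Eq. 6.30): the
    # truncated part keeps F_tilde <= threshold, the remainder the rest.
    n_ell = 2.0 ** ell
    thr = eps * (n_ell / np.log(n_ell)) ** (1.0 / p)
    keep = F_vals <= thr
    return np.where(keep, G_vals, 0.0), np.where(keep, 0.0, G_vals)

G = np.array([0.5, 3.0, 12.0, 0.1])      # kernel values of \bar G
F = np.array([1.0, 6.0, 30.0, 0.2])      # envelope values F_tilde(Y_i)
trunc, rem = split_by_envelope(G, F, ell=10)
print(np.allclose(trunc + rem, G))       # True: the split is exact
```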

6.5.1 Truncated Part

First, note that by Hoeffding decomposition (6.1), we need to consider the terms of

$$\sum\limits_{k=1}^{m} \binom{m}{k} U_{n}^{(k)}(\pi_{k}\overline{G}^{(\ell)}_{g,h, \mathbf{ t}}).$$

We shall start with the linear term in this decomposition. Following the same reasoning as in the previous section, we can show that \(\pi _{1}\overline {G}^{(\ell )}_{g,h, \mathbf {t}}\) is a centered conditional expectation and that the first term of Eq. 6.1 can be written as an empirical process based on the sample (X1,Y1),…,(Xn,Yn) and indexed by the class of functions

$$ \mathcal{S}_{\ell}^{\prime}:=\left\{S^{(\ell)}_{g,h,\mathbf{ t}}(\cdot, \cdot): g \in \mathscr{F}_{q}, h\in \mathcal{H}_{n_{\ell}}^{\prime}, \mathbf{ t}\in \mathbf{I} \subset \mathbb{X}\right\},$$

where \(h\in {\mathscr{H}}_{n_{\ell }}^{\prime }\) was defined at the beginning of this section and where

$$S^{(\ell)}_{g,h,\mathbf{ t}}(x,y)=mh^{dm}\mathbb{E}\left( \overline{G}^{(\ell)}_{g,h, \mathbf{ t}} (\mathbf{ X},\mathbf{ Y})\mid (X_{1},Y_{1})=(x,y)\right).$$

To show that \( \mathcal {S}^{\prime }_{\ell }\) is a VC-class, introduce the class of functions of \((\mathbf {x}, \mathbf {y})\in \mathbb {R}^{dm}\times \mathbb {R}^{qm}\),

$$ \mathcal{C}:=\left\{h^{dm}\overline{G}^{(\ell)}_{g,h,\mathbf{t}}(\cdot,\cdot):g\in\mathscr{F}_{q},~0<h<1,~\mathbf{t}\in\mathbb{R}^{dm},~\ell\geq 1\right\}. $$

Since both \(\mathcal {G}^{\prime }\) as defined below

$$ \mathcal{G}^{\prime}=\{h^{dm}\overline{G}_{g,h,\mathbf{ t}}(\cdot,\cdot):g \in \mathscr{F}_{q},{ 0<h<1},\mathbf{t} \in \mathbb{R}^{dm}\}, $$
(6.32)

and the class of functions of \(\mathbf {y} \in \mathbb {R}^{qm}\) given by $\mathcal{I}:=\left\{\mathbf{1}\left\{\widetilde{F}(\cdot)\leq \epsilon\gamma_{\ell}^{1/p}\right\}:\ell\geq 1\right\}$ are of VC-type (and note that \(\mathcal {I}\) has a bounded envelope function), we can apply Lemma A.1 in Einmahl and Mason (2000) to conclude that \(\mathcal {C}\) is also of VC-type. Therefore, so is the class of functions \(m \mathcal {C}^{(1)}\) on \(\mathbb {R}^{d+q}\), where \(\mathcal {C}^{(1)}\) consists of the π1-projections of the functions in the class \(\mathcal {C}\). Thus, we see that \( \mathcal {S}^{\prime }_{\ell } \subset m \mathcal {C}^{(1)}\), and hence \( \mathcal {S}_{\ell }^{\prime }\) is of VC-type with the same characteristics as \(m \mathcal {C}^{(1)}\). Now, to find an envelope function for \( \mathcal {S}_{\ell }^{\prime }\), set \(\mathbf {t}_{j}:=\left (t_{1}, \ldots , t_{j-1}, t_{j+1}, \ldots , t_{m}\right ) \in \mathbb {R}^{d(m-1)}\) and Zj(u) := (Z1,…,Zj− 1,u,Zj+ 1,…, \(Z_{m}) \in \mathbb {R}^{qm}\) for \(u \in \mathbb {R}^{q}\) and \(\mathbf {Z} \in \mathbb {R}^{qm}\). We can then rewrite the function \(S_{g, h, \mathbf {t}}^{(\ell )}(x, y) \in \mathcal {S}_{\ell }^{\prime }\) as

$$ S_{g,h,\mathbf{t}}^{(\ell)}(x,y)=\sum\limits_{j=1}^{m} K\left(\frac{t_{j}-x}{h}\right)\mathbb{E}\left[g\left(\mathbf{Y}_{j}^{*}(y)\right)\mathbf{1}\left\{\widetilde{F}\left(\mathbf{Y}_{j}^{*}(y)\right)\leq\epsilon\gamma_{\ell}^{1/p}\right\}\widetilde{K}\left(\frac{\mathbf{t}_{j}-\mathbf{X}^{*}}{h}\right)\right], $$

with $\mathbf{Y}_{j}^{*}(y):=(Y_{2},\ldots,Y_{j},y,Y_{j+1},\ldots,Y_{m})$, where \(\mathbf {X}^{*}=\left (X_{2}, \ldots , X_{m}\right ) \in \mathbb {R}^{d(m-1)}\) and where (with a slight abuse of notation) the product kernel in (K.iii) is now defined for d(m − 1)-dimensional vectors, that is, \(\widetilde {K}(\mathbf {u})={\prod }_{i=1}^{m-1} K\left (u_{i}\right )\), \(\mathbf {u} \in \mathbb {R}^{d(m-1)} \). Hence, we can bound \(S_{g, h, \mathbf {t}}^{(\ell )}(x, y) \in \mathcal {S}_{\ell }^{\prime }\) simply as

$$ \begin{array}{@{}rcl@{}} \left|S_{g, h, \mathbf{t}}^{(\ell)}(x, y)\right|&\leq& K\left( \frac{t_{1}-x}{h}\right) \mathbb{E}\left[F\left( y, Y_{2}, \ldots, Y_{m}\right) \tilde{K}\left( \frac{\mathbf{t}_{1}-\mathbf{X}^{*}}{h}\right)\right] \\ &&+K\left( \frac{t_{2}-x}{h}\right) \mathbb{E}\left[F\left( Y_{2}, y, Y_{3}, \ldots, Y_{m}\right) \widetilde{K}\left( \frac{\mathbf{t}_{2}-\mathbf{X}^{*}}{h}\right) \right] \\ &&+\cdots+K\left( \frac{t_{m}-x}{h}\right) \mathbb{E}\left[F\left( Y_{2}, \ldots, Y_{m}, y\right) \widetilde{K}\left( \frac{\mathbf{t}_{m}-\mathbf{X}^{*}}{h}\right) \right]\\ &=:& G_{m}(x, y). \end{array} $$

We shall now apply the moment bound in Theorem 7.3 to the subclasses

$$ \mathcal{S}_{\ell, j}^{\prime}:=\left\{S_{g, h, \mathbf{t}}^{(\ell)}(\cdot, \cdot): g \in \mathcal{F}_{q}, h \in \mathcal{H}_{\ell, j}^{\prime}, \mathbf{t} \in \mathbb{R}^{dm}\right\}, \quad 1 \leq j \leq L(\ell), $$

where \({\mathscr{H}}_{\ell , j}^{\prime }\) was defined in Eq. 6.29. Since \( \mathcal {S}_{\ell , j}^{\prime } \subset \mathcal {S}_{\ell }^{\prime }\) for j = 1,…,L(ℓ), all of these subclasses are of VC-type, with the same envelope function and characteristics as the class \(m \mathcal {C}^{(1)}\) (which is independent of $\ell$), verifying (ii) in Theorem 7.3. For (i), recall that although all of the terms of the envelope function Gm(x,y) are different, their expectations are the same. Therefore, writing \(\mathbf{Y}^{*}\) for \(\left (Y_{2}, \ldots , Y_{m}\right )\) and applying Minkowski’s inequality followed by Jensen’s inequality, we obtain from assumption (2.10) the following upper bound for the second moment of the envelope function:

$$ \begin{array}{@{}rcl@{}} \mathbb{E} {G_{m}^{2}}(X, Y)\!\!\!\!&=&\!\!\!\! \kappa^{2 m} \mathbb{E}_{Y}\left\{\mathbb{E}_{\mathbf{Y}^{*}}\left[F\left( Y, Y_{2}, \ldots, Y_{m}\right)\right]\right.\\ &&\left.\!\!\!\!\!+\mathbb{E}_{\mathbf{Y}^{*}}\left[F\left( Y_{2}, Y, Y_{3}, \ldots, Y_{m}\right)\right]+\cdots+\mathbb{E}_{\mathbf{Y}^{*}}\left[F\left( Y_{2}, \ldots, Y_{m}, Y\right)\right]\right\}^{2} \\ &\leq &\!\!\!\! m^{2} \kappa^{2 m} \mathbb{E} F^{2}\left( Y_{1}, \ldots, Y_{m}\right) \\ &\leq &\!\!\!\! m^{2} \kappa^{2 m} \mu_{p}^{2 / p}. \end{array} $$

Note, further, that by the symmetry of \(\widetilde {F}\),

so that Jensen’s inequality, the change of variable u = (tx)/h and the assumption in Eq. 2.10 give the following upper bound for the second moment of any function in \( \mathcal {S}_{\ell }^{\prime }\):

(6.33)

Therefore, with \(\beta \equiv m \mu _{p}^{1 / p}\left (\kappa ^{m} \vee C_{2,\mathbb {P},K,\varepsilon }^{1/2}\right )\), our previous calculations give us that

$$ \mathbb{E} {G_{m}^{2}}(X, Y) \leq \beta^{2} \quad \text { and } \quad \sup_{S \in \mathcal{S}_{\ell, j}^{\prime}} \mathbb{E} S^{2}(X, Y) \leq \beta^{2} h_{\ell, j}^{\prime m(d_{\text{vol}}-\varepsilon) }=: \sigma_{\ell, j}^{2}, $$

verifying condition (iii) as well. Finally, recall from Eq. 6.4 that since \(\mathcal {G}\) has envelope function \(\widetilde {F}(\mathbf {y})\), it holds for all \(x, y \in \mathbb {R}^{d+q}\) that

so that by taking ε > 0 small enough, Theorem 7.3 is now applicable. Thus, for an absolute constant \(A_{1}<\infty \), we have

$$ \begin{array}{@{}rcl@{}} \mathbb{E}\left\|\sum\limits_{i=1}^{n_{\ell}} \epsilon_{i} S\left( X_{i}, Y_{i}\right)\right\|_{\mathcal{S}_{\ell, j}^{\prime}} & \leq &A_{1} \sqrt{n_{\ell} h_{\ell, j}^{\prime {m}(d_{\text{vol}}-\varepsilon)}\left|\log h_{\ell, j}^{\prime}\right|} \\ & \leq& A_{1} \sqrt{n_{\ell} h_{\ell, j}^{\prime {m}(d_{\text{vol}}-\varepsilon)}\left( \left|\log h_{\ell, j}^{\prime}\right| \vee \log \log n_{\ell}\right)} \\ &=:& A_{1} \lambda_{j}^{\prime}(\ell), \end{array} $$
(6.34)

where \(\epsilon _{1}, \ldots , \epsilon _{n_{\ell }}\) are independent Rademacher variables, independent of \(\left (X_{i}, Y_{i}\right ), 1 \leq i \leq n_{\ell }\). Consequently, applying the exponential inequality of Talagrand (1994) to the class \( \mathcal {S}_{\ell , j}^{\prime }\) (see Theorem 7.5 in the Appendix) with \(M=m \varepsilon \gamma _{\ell }^{1 / p}\), \(\sigma _{\mathcal {S}_{\ell, j}^{\prime }}^{2}=\beta ^{2} h_{\ell , j}^{\prime m(d_{\text {vol}}-\varepsilon )}\) and the moment bound in Eq. 6.34, we get, for an absolute constant \(A_{2}<\infty \) and all t > 0, that

$$ \begin{array}{@{}rcl@{}} \lefteqn{\mathbb{P}\left\{\max_{n_{\ell-1}<n \leq n_{\ell}}\left\|\sqrt{n} \alpha_{n}\right\|_{\mathcal{S}_{\ell, j}^{\prime}} \geq C_{1}\left( A_{1} \lambda_{j}^{\prime}(\ell)+t\right)\right\}} \\ &\leq& 2\left[\exp \left( -\frac{A_{2} t^{2}}{n_{\ell} \beta^{2} h_{\ell, j}^{\prime m(d_{\text{vol}}-\varepsilon)}}\right)+\exp \left( -\frac{A_{2} t}{m \varepsilon \gamma_{\ell}^{1 / p}}\right)\right]. \end{array} $$
(6.35)

Regarding the application of this inequality with \(t=\rho \lambda _{j}^{\prime }(\ell )\), ρ > 1, note that it clearly follows from Eq. 6.31 and the definitions of \(h_{\ell , j}^{\prime }\) in Eq. 6.29, $\gamma_{\ell}$ in Eq. 6.30 and \(\lambda _{j}^{\prime }(\ell )\) in Eq. 6.34 that for all j ≥ 0,

$$ \begin{array}{@{}rcl@{}} \frac{\lambda_{j}^{\prime 2}(\ell)}{n_{\ell} h_{\ell, j}^{\prime m(d_{\text{vol}}-\varepsilon)}}=\left|\log h_{\ell, j}^{\prime}\right| \vee \log \log n_{\ell} \geq \log \log n_{\ell}, \\ \frac{\lambda_{j}^{\prime 2}(\ell)}{\gamma_{\ell}^{2 / p}}=2^{j} c^{dm} h_{\ell, j}^{\prime m(d_{\text{vol}}-d-\varepsilon)} \log n_{\ell}\left( \left|\log h_{\ell, j}^{\prime}\right| \vee \log \log n_{\ell}\right) \geq c^{dm}\left( \log \log n_{\ell}\right)^{2}. \end{array} $$

Consequently, Eq. 6.35, when applied with \(t=\rho \lambda _{j}^{\prime }(\ell )\) and any ρ > 1, with $\ell$ large enough, yields, for suitable constants \(A_{2}^{\prime }, A_{2}^{\prime \prime }\) and A3, the inequality

$$ \begin{array}{@{}rcl@{}} \lefteqn{\mathbb{P}\left\{\max_{n_{\ell-1}<n \leq n_{\ell}}\left\|\sqrt{n} \alpha_{n}\right\|_{\mathcal{S}_{\ell, j}^{\prime}} \geq C_{1}\left( A_{1}+\rho\right) \lambda_{j}^{\prime}(\ell)\right\}} \\ &\leq& 2\left[\exp \left( -A_{2}^{\prime} \rho^{2} \log \log n_{\ell}\right)+\exp \left( -A_{2}^{\prime \prime} \rho \log \log n_{\ell}\right)\right] \\ &\leq& 4\left( \log n_{\ell}\right)^{-A_{3} \rho}. \end{array} $$
(6.36)

Keeping in mind that \(m h^{dm} \sqrt {n} U_{n}^{(1)}\left (\pi _{1} \bar {G}_{g, h, \mathbf{t}}^{(\ell )}\right )\) is the empirical process \(\alpha_{n}\left (S_{g, h, \mathbf{t}}^{(\ell )}\right )\) indexed by the class \( \mathcal {S}_{\ell }^{\prime }\), and recalling Eq. 6.18, since dvol − ε < d, we obtain, for $\ell\geq 1$, that

$$ \begin{array}{@{}rcl@{}} \max_{n_{\ell-1}<n \leq n_{\ell}} A_{n, \ell}^{\prime} &:=&\max_{n_{\ell-1}<n \leq n_{\ell}} \sup_{{a_{n}^{\prime} \leq h \leq b_{0}}} \sup_{g \in \mathcal{F}} \sup_{\mathbf{t} \in \mathbb{R}^{dm}}\\&& \frac{m \sqrt{n h^{m(2d-d_{\text{vol}}+\varepsilon)}}\left|U_{n}^{(1)}\left( \pi_{1} \bar{G}_{g, h, \mathbf{t}}^{(\ell)}\right)\right|}{\sqrt{|\log h| \vee \log \log n}} \\ & \leq& \max_{n_{\ell-1}<n \leq n_{\ell}} \max_{1\leq j \leq L(\ell)} \sup_{h \in \mathcal{H}_{\ell, j}^{\prime}} \sup_{g \in \mathcal{F}} \sup_{\mathbf{t} \in \mathbb{R}^{dm}} \\&&\frac{2 \sqrt{2^{\frac{d_{\text{vol}}-\varepsilon}{d}}}\left|\sqrt{n} \alpha_{n}\left( S_{g, h, \mathbf{t}}^{(\ell)}\right)\right|}{\sqrt{n_{\ell} h_{\ell, j}^{\prime m(d_{\text{vol}}-\varepsilon)}\left( \left|\log h_{\ell, j}^{\prime}\right| \vee \log \log n_{\ell}\right)}} \\ & \leq& \max_{n_{\ell-1}<n \leq n_{\ell}} \max_{1\leq j \leq L(\ell)} \sup_{H \in \mathcal{S}_{\ell, j}^{\prime}} \frac{3\left|\sqrt{n} \alpha_{n}(H)\right|}{\lambda_{j}^{\prime}(\ell)}. \end{array} $$

Consequently, recalling once again that \(L(\ell ) \leq 2 \log n_{\ell }\), we can infer from Eq. 6.36 that for some constant \(C_{5}(\rho ) \geq 3 C_{1}\left (A_{1}+\rho \right )\),

$$ \begin{array}{@{}rcl@{}} &&\mathbb{P}\left\{\max_{n_{\ell-1}<n \leq n_{\ell}} A_{n, \ell}^{\prime}>C_{5}(\rho)\right\} \\& \leq & \sum\limits_{j=1}^{L(\ell)} \mathbb{P}\left\{\max_{n_{\ell-1}<n \leq n_{\ell}}\left\|\sqrt{n} \alpha_{n}\right\|_{\mathcal{S}_{\ell, j}^{\prime}}>C_{1}\left( A_{1}+\rho\right) \lambda_{j}^{\prime}(\ell)\right\} \\ & \leq& 8\left( \log n_{\ell}\right)^{1-A_{3} \rho}. \end{array} $$

The Borel–Cantelli lemma, when combined with this inequality for ρ ≥ (2 + δ)/A3, δ > 0, and with the choice $n_{\ell}=2^{\ell}$, establishes, for some \(C^{\prime }<\infty \) and with probability 1, that

$$ \underset{\ell \rightarrow \infty}{\limsup} \max_{n_{\ell-1}<n \leq n_{\ell}} \sup_{{a_{n}^{\prime} \leq h \leq b_{0}}} \sup_{g \in \mathcal{F}} \sup_{\mathbf{t} \in \mathbb{R}^{dm}} \frac{m \sqrt{n h^{m(2d-d_{\text{vol}}+\varepsilon)}}\left|U_{n}^{(1)}\!\left( \pi_{1} \bar{G}_{g, h, \mathbf{t}}^{(\ell)}\right)\right|}{\sqrt{|\log h| \vee \log \log n}} \!\leq\! C^{\prime}, $$
(6.37)

this achieves the control of the first term in Eq. 6.1. We now treat the nonlinear terms. The purpose is to prove that, for k = 2,…,m and with probability 1, all of the other terms of Eq. 6.1 are asymptotically bounded or go to zero at the proper rate, that is,

$$ \max_{n_{\ell-1}<n \leq n_{\ell}} \sup_{{a_{n}^{\prime} \leq h \leq b_{0}}} \sup_{g \in \mathcal{F}} \sup_{\mathbf{t} \in \mathbb{R}^{dm}} \frac{\sqrt{n h^{m(2d-d_{\text{vol}}+\varepsilon)}}\left|U_{n}^{(k)}\!\left( \pi_{k} \bar{G}_{g, h, \mathbf{t}}^{(\ell)}\right)\right|}{\sqrt{|\log h| \vee \log \log n}} = O\left( \gamma_{\ell}^{1-k / 2}\right). $$
(6.38)

By following the same reasoning as in the bounded case, we define some classes of functions on \(\mathbb {R}^{dm} \times \mathbb {R}^{qm}\) and \(\mathbb {R}^{dk} \times \mathbb {R}^{qk}\),

$$ \begin{array}{@{}rcl@{}} \mathcal{G}_{\ell, j}^{\prime}:=\left\{h^{dm} \bar{G}_{g, h, \mathbf{t}}^{(\ell)}(\cdot, \cdot): g \in \mathcal{F}, h \in \mathcal{H}_{\ell, j}^{\prime}, {\mathbf{t} \in \mathbb{R}^{dm}}\right\} ,\\ \mathcal{G}_{\ell, j}^{\prime(k)}:=\left\{h^{dm}\left( \pi_{k} \bar{G}_{g, h, \mathbf{t}}^{(\ell)}\right)(\cdot, \cdot) /\left( 2^{k} \varepsilon \gamma_{\ell}^{1 / p}\right): g \in \mathcal{F}, h \in \mathcal{H}_{\ell, j}^{\prime},{ \mathbf{t} \in \mathbb{R}^{dm}}\right\} . \end{array} $$

It is then easily verified that these classes are of VC-type with characteristics that are independent of $\ell$ and with envelope functions \(\widetilde {F}\) and \(\left (2^{k} \varepsilon \gamma _{\ell }^{1 / p}\right )^{-1} \mathbf{F}_{k}\), respectively. The function \(\widetilde {F}\) is defined as in Eq. 6.4 and \(\mathbf{F}_{k}\) is determined just as in the proof of Theorem 1 of Giné and Mason (2007a). Note that, just as in Eqs. 6.19 and 6.21, by setting

$$ \mathcal{U}_{n}^{\prime}(j, k, \ell):=\sup_{H \in \mathcal{G}_{\ell, j}^{\prime(k)}}\left|\frac{1}{n_{\ell}^{k / 2}} \sum\limits_{\mathbf{i} \in {I_{n}^{k}}} H\left( \mathbf{X}_{\mathbf{i}}, \mathbf{Y}_{\mathbf{i}}\right)\right|, \quad n_{\ell-1}<n \leq n_{\ell}, $$

we see that for all k = 2,…,m and n− 1 < nn,

$$ \mathcal{U}_{n}^{\prime}(j, k, \ell) \leq n_{\ell}^{k / 2}\left\|U_{n}^{(k)}\left( \pi_{k} G\right)\right\|_{\mathcal{G}_{\ell, j}^{\prime(k)}}. $$

Consequently, applying Theorem 7.2 with c = 1/2 and r = 2 gives us precisely (6.20) with \(\mathcal {U}_{n}(j, k, \ell )\) and \(\mathcal {U}_{n_{\ell }}(j, k, \ell )\) replaced by \(\mathcal {U}_{n}^{\prime }(j, k, \ell )\) and \(\mathcal {U}_{n_{\ell }}^{\prime }(j, k, \ell )\), respectively. Therefore, the same methodology as in the bounded case can be applied. Note also that, as was the case for all the functions in \(\mathcal {G}_{\ell , j}^{(k)}\), the functions in \(\mathcal {G}_{\ell , j}^{\prime (k)}\) are bounded by 1 and have second moments that can be bounded by \(h^{m(d_{\text {vol}}-\varepsilon )} D_{m, k}\) for a suitable Dm,k (by arguing as in Eqs. 6.33 and 6.22). Hence, the bound in Eq. 6.22 is also satisfied for functions in \(\mathcal {G}_{\ell , j}^{\prime (k)}\), that is,

$$ \sup_{H \in \mathcal{G}_{\ell, j}^{\prime(k)}} \mathbb{E} H^{2}(\mathbf{X}, \mathbf{Y}) \leq D_{m, k} h_{\ell, j}^{\prime m(d_{\text{vol}}-\varepsilon)}=: \sigma_{\ell, j}^{\prime 2}. $$

Thus, all the conditions of Theorems 7.4 and 7.6 are satisfied so that, after some obvious identifications and modifications, the second part of the proof of Theorem 2.5 (and Eq. 6.26 in particular) gives us, for some C7,k > 0, all j = 1,…,L(ℓ) and any ρ > 2,

$$ \mathbb{P}\left\{\max_{n_{\ell-1}<n \leq n_{\ell}} \mathcal{U}_{n}^{\prime}(j, k, \ell)>2 \rho^{k / 2} y^{\prime *}\right\} \leq \sqrt{h_{\ell, j}^{\prime m(d_{\text{vol}}-\varepsilon)}}\left( \log n_{\ell}\right)^{-\rho C_{7, k}}, $$
(6.39)

with \(y^{\prime *}=C_{1, k}^{\prime } \lambda _{j, k}^{\prime }(\ell )\) for some \(C_{1, k}^{\prime }>0\) and where \(\lambda _{j, k}^{\prime }(\ell )\) is defined as in Eq. 6.24 with h,j replaced by \(h_{\ell , j}^{\prime }\), that is,

$$ \lambda_{j, k}^{\prime}(\ell)=\left( \left|\log h_{\ell, j}^{\prime}\right| \vee \log \log n_{\ell}\right)^{k / 2}. $$

Now, to finish the proof of Eq. 6.38, note that, similarly to Eq. 6.27, for some Ck > 0 and for $n_{\ell-1}<n\leq n_{\ell}$,

$$ n_{\ell}^{k / 2}\left\|U_{n}^{(k)}\left( \pi_{k} G\right)\right\|_{\mathcal{G}_{\ell, j}^{\prime}} \leq 2^{k} C_{k} \varepsilon \gamma_{\ell}^{1 / p} \mathcal{U}_{n}^{\prime}(j, k, \ell) . $$

This gives that for some ck > 0,

$$ \begin{array}{@{}rcl@{}} &&\max_{n_{\ell-1}<n \leq n_{\ell}} A_{n, \ell, k}^{\prime} \\&:=&\max_{n_{\ell-1}<n \leq n_{\ell}} \sup_{{a_{n}^{\prime} \leq h \leq b_{0}}} \sup_{g \in \mathcal{F}} \sup_{\mathbf{t} \in \mathbb{R}^{dm}} \frac{\sqrt{n h^{m(2d-d_{\text{vol}}+\varepsilon)}}\left|U_{n}^{(k)}\left( \pi_{k} \bar{G}_{g, h, \mathbf{t}}^{(\ell)}\right)\right|}{\sqrt{(|\log h| \vee \log \log n)^{k}}} \\ & \leq& \frac{2^{k} c_{k} \varepsilon \gamma_{\ell}^{1 / p}}{\sqrt{n_{\ell}^{k-1}}} \max_{n_{\ell-1}<n \leq n_{\ell}} \max_{1\leq j \leq L(\ell)} \frac{\mathcal{U}_{n}^{\prime}(j, k, \ell)}{\lambda_{j, k}^{\prime}(\ell)} \\ &\leq& \frac{2^{k} c_{k} \varepsilon \gamma_{\ell}^{1 / p}}{\sqrt{a_{n_{\ell}}^{\prime md} n_{\ell}^{k-1}}} \max_{n_{\ell-1}<n \leq n_{\ell}} \max_{1\leq j \leq L(\ell)} \frac{\mathcal{U}_{n}^{\prime}(j, k, \ell)}{\lambda_{j, k}^{\prime}(\ell)}. \end{array} $$

From Eq. 6.31, we now see that

$$\gamma_{\ell}^{2 / p} /\left( a_{n_{\ell}}^{\prime md} n_{\ell}^{k-1}\right)=c^{-md} n_{\ell}^{2-k} / \log n_{\ell} .$$

Since \(\log n / n^{2-k}\) is monotone increasing in n ≥ 2 whenever k ≥ 2, we infer, for some constant C8,k > 0, that

$$ \begin{array}{@{}rcl@{}} \lefteqn{\mathbb{P}\left\{\max\limits_{n_{\ell-1}<n \leq n_{\ell}} \sqrt{\frac{\log n}{n^{2-k}}} A_{n, \ell, k}^{\prime}>C_{8, k}\right\}} \\ &\leq& \mathbb{P}\left\{\max\limits_{n_{\ell-1}<n \leq n_{\ell}} \max\limits_{1 \leq j \leq L(\ell)}\frac{\mathcal{U}_{n}^{\prime}(j, k, \ell)}{\lambda_{j, k}^{\prime}(\ell)}>\frac{C_{8, k}}{2^{k} c_{k} \varepsilon \gamma_{\ell}^{1 / p}} \sqrt{\frac{n_{\ell}^{2-k} a_{n_{\ell}}^{\prime md} n_{\ell}^{k-1}}{\log n_{\ell}}}\right\}\\ &\leq& \sum\limits_{j=1}^{L(\ell)} \mathbb{P}\left\{\max_{n_{\ell-1}<n \leq n_{\ell}} \mathcal{U}_{n}^{\prime}(j, k, \ell)>\frac{C_{8, k} c^{md / 2}}{2^{k} c_{k} \varepsilon} \lambda_{j, k}^{\prime}(\ell)\right\}. \end{array} $$

Therefore, by choosing \(C_{8, k}>2^{k+1} c^{-md / 2} \varepsilon c_{k} C_{1, k}^{\prime }\left ((2+\delta ) / C_{7, k}\right )^{k / 2}\) and noting that by definition \(L(\ell ) \leq 2 \log n_{\ell }\) and \(h_{\ell , j}^{\prime }<2\) for all j = 1,…,L(ℓ), we can infer from Eq. 6.39, with

$$ \rho=(2+\delta) / C_{7, k}, $$

that

$$ \begin{array}{@{}rcl@{}} &\mathbb{P} &\left\{\max_{n_{\ell-1}<n \leq n_{\ell}} \sqrt{\frac{\log n}{n^{2-k}}} A_{n, \ell, k}^{\prime}>C_{8, k}\right\} \\ & \leq& \sum\limits_{j=1}^{L(\ell)} \mathbb{P}\left\{\max_{n_{\ell-1}<n \leq n_{\ell}} \mathcal{U}_{n}^{\prime}(j, k, \ell)>2\left( \frac{2+\delta}{C_{7, k}}\right)^{k / 2} C_{1, k}^{\prime} \lambda_{j, k}^{\prime}(\ell)\right\} \\ &=&\sum\limits_{j=1}^{L(\ell)} \mathbb{P}\left\{\max_{n_{\ell-1}<n \leq n_{\ell}} \mathcal{U}_{n}^{\prime}(j, k, \ell)>2\left( \frac{2+\delta}{C_{7, k}}\right)^{k / 2} y^{\prime *}\right\} \\ & \leq& L(\ell) \sqrt{h_{\ell, j}^{\prime m}}\left( \log n_{\ell}\right)^{-\rho C_{7, k}} \\ & \leq& 2 \sqrt{2^{m}}\left( \log n_{\ell}\right)^{-(1+\delta)}. \end{array} $$

This immediately implies, via Borel–Cantelli, that for all k = 2,…,m and $\ell\geq 1$,

$$ \begin{array}{@{}rcl@{}} &&\max_{n_{\ell-1}<n \leq n_{\ell}} \sup_{a_{n}^{\prime} \leq h \leq b_{0}} \sup_{g \in \mathcal{F}} \sup_{\mathbf{t} \in \mathbb{R}^{dm}} \frac{\sqrt{n h^{m(2d-d_{\text{vol}}+\varepsilon)}}\left|U_{n}^{(k)}\left( \pi_{k} \bar{G}_{g, h, \mathbf{t}}^{(\ell)}\right)\right|}{\sqrt{(|\log h| \vee \log \log n)^{k}}}\\&&=O\left( \sqrt{\frac{n_{\ell}^{2-k}}{\log n_{\ell}}}\right) \end{array} $$

a.s., which obviously implies Eq. 6.38. Finally, recalling the Hoeffding decomposition (6.1), this implies, together with Eq. 6.37, that for some \(C^{\prime \prime }>0\) with probability 1,

$$ \begin{array}{@{}rcl@{}} &&\limsup_{\ell \rightarrow \infty} \max_{n_{\ell-1}<n \leq n_{\ell}} \sup_{a_{n}^{\prime} \leq h \leq b_{0}} \sup_{g \in \mathcal{F}} \sup_{\mathbf{t} \in \mathbb{R}^{dm}} \\&&\frac{\sqrt{n h^{m(2d-d_{\text{vol}}+\varepsilon)}}\left|U_{n}^{(m)}\left( \bar{G}_{g, h, \mathbf{t}}^{(\ell)}\right)-\mathbb{E} U_{n}^{(m)}\left( \bar{G}_{g, h, \mathbf{t}}^{(\ell)}\right)\right|}{\sqrt{|\log h| \vee \log \log n}} \leq C^{\prime \prime} .\end{array} $$
(6.40)

6.6 Remainder Part

Consider now the remainder process \(\widetilde {u}_{n}^{(\ell )}(g, h, \mathbf {t})\) based on the unbounded (symmetric) U-kernel given by

$$ \widetilde{G}_{g, h, \mathbf{t}}^{(\ell)}(\mathbf{x}, \mathbf{y})=\bar{G}_{g, h, \mathbf{t}}(\mathbf{x}, \mathbf{y}) \mathbf{1}\left\{\widetilde{F}(\mathbf{y})>\epsilon \gamma_{\ell}^{1 / p}\right\}, $$

where $\gamma_{\ell}$ was defined in Eq. 6.30. We shall show that this U-process is asymptotically negligible at the rate given in Theorem 2.6. More precisely, we shall prove that, as \(\ell \rightarrow \infty \),

$$ \begin{array}{@{}rcl@{}} &&\max_{n_{\ell-1}<n\leq n_{\ell}} \sup_{a_{n}^{\prime}\leq h \leq b_{0}} \sup_{g \in \mathcal{F}} \sup_{\mathbf{t} \in \mathbb{R}^{dm}} \frac{\sqrt{n h^{m(2d-d_{\text{vol}}+\varepsilon)}}\left|U_{n}^{(m)}\!\left( \widetilde{G}_{g, h, \mathbf{t}}^{(\ell)}\right) - \mathbb{E} U_{n}^{(m)}\!\left( \widetilde{G}_{g, h, \mathbf{t}}^{(\ell)}\right)\right|}{\sqrt{|\log h| \vee \log \log n}}\\&&=o(1)~~a.s. \end{array} $$
(6.41)

Recall that, for all \(g \in \mathcal {F}\), \(h \in \left [a_{n}^{\prime }, b_{0}\right ]\) and \(\mathbf {t}, \mathbf {x} \in \mathbb {R}^{dm}\), we have \(\widetilde {F}(\mathbf {y}) \geq h^{dm}\left |\bar {G}_{g, h, \mathbf {t}}(\mathbf {x}, \mathbf {y})\right |\), so from the symmetry of \(\widetilde {F}\), it holds that

$$ \left|U_{n}^{(m)}\left(\widetilde{G}_{g, h, \mathbf{t}}^{(\ell)}\right)\right| \leq h^{-dm}\, U_{n}^{(m)}\left(\widetilde{F}\, \mathbf{1}\left\{\widetilde{F}>\epsilon \gamma_{\ell}^{1 / p}\right\}\right), $$

where the right-hand side involves a U-statistic based on the positive and symmetric kernel \(\widetilde{F}(\mathbf{y})\mathbf{1}\{\widetilde{F}(\mathbf{y})>\epsilon \gamma_{\ell}^{1/p}\}\). Recalling that \(a_{n}^{\prime md}=c^{md}\) \((\log n / n)^{1-2 / p}\), we obtain easily that, for all \(g \in \mathcal {F}\), \(h \in \left [a_{n}^{\prime }, b_{0}\right ]\), \(\mathbf {t} \in \mathbb {R}^{dm}\) and some C > 0,

Arguing in the same way, since a U-statistic is an unbiased estimator of its kernel, we get that, uniformly in \(g \in \mathcal {F}\), \(h \in \left [a_{n}^{\prime }, b_{0}\right ]\) and \(\mathbf {t} \in \mathbb {R}^{dm}\),

(6.42)

From Eq. 6.42, we see that as \(\ell \rightarrow \infty \),

$$ \max_{n_{\ell-1}<n \leq n_{\ell}} \sup_{a_{n}^{\prime} \leq h \leq b_{0}} \sup_{g \in \mathcal{F}} \sup_{\mathbf{t} \in \mathbb{R}^{dm}} \frac{\sqrt{n h^{m(2d-d_{\text{vol}}+\varepsilon)}}\left|\mathbb{E} U_{n}^{(m)}\left( \widetilde{G}_{g, h, \mathbf{t}}^{(\ell)}\right)\right|}{\sqrt{|\log h| \vee \log \log n}}=o(1). $$
(6.43)

Thus, to finish the proof of Eq. 6.41, it suffices to show that

(6.44)

First, note that from Chebyshev’s inequality and a well-known inequality for the variance of a U-statistic (see Theorem 5.2 of Hoeffding (1948)), we get, for any δ > 0,

(6.45)

Next, in order to establish the convergence of the series of the above probabilities, we split the indicator function into two distinct parts determined by whether \(\widetilde {F}(\mathbf {Y})>\) \(n_{\ell }^{1 / p}\) or \(\varepsilon \gamma _{\ell }^{1 / p}<\widetilde {F}(\mathbf {Y}) \leq n_{\ell }^{1 / p}\), and consider the corresponding second moments in Eq. 6.45 separately. In the first case, note that, from Eqs. 2.10 and 6.4, \(\mathbb {E} \widetilde {F}^{p}(\mathbf {Y}) \leq \mu _{p} \kappa ^{p m}(m !)^{p}\), and observe that since p > 2 and \(n_{\ell}=2^{\ell}\),

To handle the second case, we shall need the following fact from Einmahl and Mason (2000).

Fact 6.1.

Let \(\left (c_{n}\right )_{n \geq 1}\) be a sequence of positive constants such that cn/n1/s \(\nearrow \infty \) for some s > 0 and let Z be a random variable satisfying

$$\sum\limits_{n=1}^{\infty} \mathbb{P}\left\{|Z|>c_{n}\right\}<\infty .$$

We then have, for any q > s,

Notice that for any p < r ≤ 2p,

Now, set \(Z=\widetilde {F}(\mathbf {Y}), c_{n}=n^{1 / p}\) and q = r in Fact 6.1 and note that \(c_{n} / n^{1 / s} \nearrow \infty \) for any s such that q = r > s > p. Since q = r > s, we can conclude from Fact 6.1 that this last bound is finite. Finally, note that the bound leading to Eq. 6.45 implies that

Consequently, the above results, together with Eq. 6.45, imply via Borel-Cantelli and the arbitrary choice of δ > 0 that Eq. 6.44 holds, which, when combined with Eqs. 6.43 and 6.45, completes the proof of Eq. 6.41. This also completes the proof of Theorem 2.6 since we have already established the result in Eq. 6.40.

6.7 Proof of Theorem 2.7

Theorem 2.7 is essentially a consequence of Theorem 7.7; the details are similar to the proof in Dony and Mason (2008) and are therefore omitted.

6.8 Proof of Corollary 2.8

We now turn to the proof of Corollary 2.8. We observe the following standard inequalities:

$$ \begin{array}{@{}rcl@{}} | \widehat{r}_{n}^{(m)}(\varphi,\mathbf{t};h)-\widehat{\mathbb{E}} \widehat{r}_{n}^{(m)}(\varphi,\mathbf{t};h)| \!\!\!\!&=&\!\!\!\!\left|\frac{U_{n}(\varphi, h, \mathbf{t})}{U_{n}(1, h, \mathbf{t})}-\frac{\mathbb{E} U_{n}(\varphi, h, \mathbf{t})}{\mathbb{E} U_{n}(1, h, \mathbf{t})}\right| \\ &\leq &\!\!\!\! \frac{\left|U_{n}(\varphi, h, \mathbf{t})-\mathbb{E} U_{n}(\varphi, h, \mathbf{t})\right|}{\left|U_{n}(1, h, \mathbf{t})\right|} \\ &&\!\!\!\!+\frac{\left|\mathbb{E} U_{n}(\varphi, h, \mathbf{t})\right| \cdot\left|U_{n}(1, h, \mathbf{t})-\mathbb{E} U_{n}(1, h, \mathbf{t})\right|}{\left|U_{n}(1, h, \mathbf{t})\right| \cdot\left|\mathbb{E} U_{n}(1, h, \mathbf{t})\right|} \\ &=: &\!\!\!\!(\mathrm{I})+(\mathrm{II}) . \end{array} $$

We can infer from Theorem 7.7 that

$$ \sup_{a_{n} \leq h<b_{n}} \sup_{\mathbf{t} \in \mathbf{I}^{m}}\left|\mathbb{E} U_{n}(1, h, \mathbf{t})-\widetilde{f}(\mathbf{t})\right|\rightarrow 0. $$
(6.46)

Then, from Theorem 2.5, Eq. 6.46 and the fact that fX(⋅) is bounded away from zero on J, we get, for some $\xi_{1},\xi_{2}>0$ and c large enough in \(a_{n}=c(\log n / n)^{1 /d m}\),

$$ \liminf_{n \rightarrow \infty} \sup_{a_{n} \leq h<b_{n}} \sup_{\mathbf{t} \in \mathbf{I}^{m}}\left|U_{n}(1, h, \mathbf{t})\right|=\xi_{1}>0 \quad \text { a.s., } $$

and, for n large enough,

$$ \sup_{a_{n} \leq h<b_{n}} \sup_{\mathbf{t} \in \mathbf{I}^{m}}\left|\mathbb{E} U_{n}(1, h, \mathbf{t})\right|=\xi_{2}>0. $$

Further, for \(a_{n}^{\prime }\) equal to either $l_{n}$ or $a_{n}$, we readily obtain from the assumptions (2.8) or (2.10) on the envelope function that

$$ \sup_{a_{n}^{\prime} \leq h<b_{n}}\sup_{\varphi\in \mathscr{F}_{q}}\sup_{\mathbf{t} \in \mathbf{I}^{m}}\left| \mathbb{E} U_{n}(\varphi, h, \mathbf{t})\right| =O(1) \text {. } $$

Hence, we can now use Theorem 2.5 to handle (II), while for (I), depending on whether the class \({\mathscr{F}}_{q}\) satisfies Eq. 2.8 or 2.10, we apply Theorem 2.5 or Theorem 2.6, respectively. Combining these bounds, we conclude that, for c large enough and some \(\mathfrak {C}_{3}>0\), with probability 1,

$$ \begin{array}{@{}rcl@{}} &&\!\!\!\!{\limsup_{n \rightarrow \infty} \sup_{a_{n}^{\prime} \leq h<b_{n}} \sup_{\varphi \in \mathscr{F}_{q}} \sup_{\mathbf{t} \in \mathbf{I}^{m}} \frac{\sqrt{n l_{n}^{m(2d-d_{\text{vol}}+\varepsilon)}}\left| \widehat{r}_{n}^{(m)}(\varphi,\mathbf{t};h)-\widehat{\mathbb{E}} \widehat{r}_{n}^{(m)}(\varphi,\mathbf{t};h)\right|}{\sqrt{{|\log h|\vee (\log (2/\delta)\vee \log \log n)}}}}\\ &\leq&\!\!\!\!\limsup_{n \rightarrow \infty} \sup_{a_{n}^{\prime} \leq h<b_{n}} \sup_{\varphi \in \mathscr{F}_{q}} \sup_{\mathbf{t} \in \mathbf{I}^{m}} \frac{\sqrt{nh^{{m(2d-d_{\text{vol}}+\varepsilon)}}}(\mathrm{I})}{\sqrt{|\log h |\vee \log\log n}}\\ &&+\limsup_{n \rightarrow \infty} \sup_{a_{n}^{\prime} \leq h<b_{n}} \sup_{\varphi \in \mathscr{F}_{q}} \sup_{\mathbf{t} \in \mathbf{I}^{m}}\frac{\sqrt{nl_{n}^{m(2d-d_{\text{vol}}+\varepsilon)}}(\mathrm{II})}{\sqrt{|\log l_{n}|\vee \log (2/\delta)}} \\ &\leq&\!\!\!\! \mathfrak{C}_{3}. \end{array} $$
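Here, the lower bounds obtained above enter schematically through

$$ (\mathrm{I}) \leq \frac{\left|U_{n}(\varphi, h, \mathbf{t})-\mathbb{E} U_{n}(\varphi, h, \mathbf{t})\right|}{\xi_{1}}, \qquad (\mathrm{II}) \leq \frac{O(1)\cdot\left|U_{n}(1, h, \mathbf{t})-\mathbb{E} U_{n}(1, h, \mathbf{t})\right|}{\xi_{1} \xi_{2}}, $$

so that each term reduces to the centered U-process bounds of Theorems 2.5 and 2.6 (a schematic reduction; the precise constants absorb \(\xi_{1}\), \(\xi_{2}\) and the envelope bound into \(\mathfrak{C}_{3}\)).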

We readily obtain the assertion of the theorem by choosing δ appropriately. \(\Box \)

6.9 Proof of Proposition 3.2

Recalling Definition 3.1 of \(\mathbf{\Phi}_{\psi}\), it is clear that \(\mathbf{\Phi}_{\psi}\) is uniformly bounded in \((y_{1},\ldots ,y_{m}, c_{1},\ldots ,c_{m})\in \mathbb {R}^{2m}\) and in \(\psi \in {\mathscr{F}}\), since \({\mathscr{F}}\) is uniformly bounded, ψ(t) = 0 for all t > τ, and G(τ) < 1. This property, combined with the VC property of \({\mathscr{F}}\), ensures that the class of functions

$$\mathscr{F}_{\Phi} : =\{\mathbf{ {\Phi}}_{\psi} : \psi \in \mathscr{F}\}$$

satisfies conditions (F.ii) and (F.iii). Similarly, it can be shown that \({\mathscr{F}}_{\Phi }\) is a pointwise measurable class of functions, i.e., fulfills (F.i). Moreover, by (A.3) and Eq. 3.2, the class

$$\mathcal{M}_{\mathbf{\Phi}}:=\{m_{\mathbf{\Phi}_{\psi}} f_{\mathbf{X}} :\psi \in \mathscr{F}\}$$

is almost surely relatively compact with respect to the sup-norm topology on \(\mathcal I_{\alpha }\). Hence we can apply Corollary 2.8 with Y = (Y,C) and Ψ = Φψ, and the result of Proposition 3.2 follows. \(\Box \)

Lemma 6.2.

Under the assumptions of Theorem 3.1, we have, with probability one,

$$ \sup\limits_{h \geq l_{n}}\sup\limits_{\mathbf{ t}\in\mathcal{I}^{m}}\sup\limits_{ \psi\in\mathscr{F}}\left|\breve{r}_{n}^{(m)*}(\psi,\mathbf{t};h) - \breve{r}_{n}^{(m)}(\psi,\mathbf{t};h)\right| = o\left( \sqrt{\frac{\log(1/h)}{nh^{d}}}\right) \quad \text{as } n\rightarrow \infty. $$
(6.47)

6.10 Proof of Lemma 6.2

Recall the following useful lemma.

Lemma 6.3.

Let \(a_{i}\), \(b_{i}\), \(i = 1,\ldots,k\), be real numbers. Then

$$ \prod\limits_{i=1}^{k}a_{i}-\prod\limits_{i=1}^{k}b_{i}=\sum\limits_{i=1}^{k}(a_{i}-b_{i})\prod\limits_{j=1}^{i-1}b_{j}\prod\limits_{h=1+i}^{k}a_{h}. $$
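The identity can be checked by induction on k, the sum on the right-hand side telescoping. As a quick numerical sanity check, here is a minimal Python sketch (ours, purely illustrative):

```python
import math
import random

def telescoping_identity_holds(k: int, trials: int = 1_000, tol: float = 1e-9) -> bool:
    """Check prod(a) - prod(b) == sum_i (a_i - b_i) * prod_{j<i} b_j * prod_{h>i} a_h
    on random inputs."""
    for _ in range(trials):
        a = [random.uniform(-2.0, 2.0) for _ in range(k)]
        b = [random.uniform(-2.0, 2.0) for _ in range(k)]
        lhs = math.prod(a) - math.prod(b)
        rhs = sum(
            (a[i] - b[i]) * math.prod(b[:i]) * math.prod(a[i + 1:])
            for i in range(k)
        )
        if abs(lhs - rhs) > tol:
            return False
    return True

print(telescoping_identity_holds(5))  # expected output: True
```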

An application of the preceding lemma gives

$$ \begin{array}{@{}rcl@{}} \lefteqn{\sup\limits_{h \geq l_{n}}\sup\limits_{\mathbf{ t}\in\mathcal{I}^{m}}\sup\limits_{\psi\in\mathscr{F}}\left|\breve{r}_{n}^{(m)*}(\psi,\mathbf{t};h)-\breve{r}_{n}^{(m)}(\psi,\mathbf{t};h)\right|}\\ &=&\sup\limits_{h \geq l_{n}}\sup\limits_{\mathbf{ t}\in\mathcal{I}^{m}}\sup\limits_{\psi\in\mathscr{F}}\left| \sum\limits_{(i_{1},\ldots,i_{m})\in I(m,n)} \delta_{i_{1}}\cdots\delta_{i_{m}}\psi({ Z}_{i_{1}},\ldots,{ Z}_{i_{m}})\, \overline{\omega}_{n,K,h,\mathbf{ i}}^{(m)}(\mathbf{t})\right. \\ &&\left.\times\left( \frac{1}{(1-G_{n}^{*}(Z_{i_{1}}))\cdots(1-G_{n}^{*}(Z_{i_{m}}))}-\frac{1}{(1-G(Z_{i_{1}}))\cdots(1-G(Z_{i_{m}}))}\right)\right|\\ &\leq&\sup\limits_{h \geq l_{n}}\sup\limits_{\mathbf{ t}\in\mathcal{I}^{m}}\sum\limits_{i=1}^{n}|\overline{\omega}_{n,K,h,\mathbf{ i}}^{(m)}(\mathbf{t})|\sup\limits_{\mathbf{t}\in [0,\tau)^{m}}\sup\limits_{ \psi\in\mathscr{F}}|\psi(\mathbf{t})|\\ &&\times \sum\limits_{\eta=1}^{m}\left|\frac{1}{1-G_{n}^{*}(Z_{i_{\eta}})}-\frac{1}{1-G(Z_{i_{\eta}})}\right|\prod\limits_{j=1}^{\eta-1}\frac{1}{1-G_{n}^{*}(Z_{i_{j}})}\prod\limits_{k=\eta+1}^{m}\frac{1}{1-G(Z_{i_{k}})}\\ &\leq& \sup\limits_{h \geq l_{n}}\sup\limits_{\mathbf{ t}\in\mathcal{I}^{m}}\sum\limits_{i=1}^{n}|\overline{\omega}_{n,K,h,\mathbf{ i}}^{(m)}(\mathbf{t})|\sup\limits_{\mathbf{t}\in [0,\tau)^{m}}\sup\limits_{ \psi\in\mathscr{F}}|\psi(\mathbf{t})|\\ && \times \sum\limits_{\eta=1}^{m} \frac{\sup\limits_{x\leq\tau}|G_{n}^{*} (x)-G (x)|}{[1-G_{n}^{*} (\tau)][1-G (\tau)]} \prod\limits_{j=1}^{\eta-1}\frac{1}{1-G_{n}^{*}(Z_{i_{j}})}\prod\limits_{k=\eta+1}^{m}\frac{1}{1-G(Z_{i_{k}})}\\ &\leq& C\sup\limits_{h \geq l_{n}}\sup\limits_{\mathbf{ t}\in\mathcal{I}^{m}}\sum\limits_{i=1}^{n}|\overline{\omega}_{n,K,h,\mathbf{ i}}^{(m)}(\mathbf{t})|\sup\limits_{\mathbf{t}\in [0,\tau)^{m}}\sup\limits_{ \psi\in\mathscr{F}}|\psi(\mathbf{t})|\frac{\sup\limits_{x\leq\tau}|G_{n}^{*} (x)-G (x)|}{[1-G_{n}^{*} (\tau)][1-G (\tau)]}.\\ \end{array} $$
(6.48)
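The factors \(\delta_{i} /(1-G_{n}^{*}(Z_{i}))\) appearing above are inverse-probability-of-censoring weights, with \(G_{n}^{*}\) an estimator of the censoring distribution G. For readers who wish to experiment numerically, the following is a minimal Python sketch of such weights based on the textbook Kaplan-Meier construction; the function name km_censoring_survival and the simulation set-up are ours and are not taken from the paper:

```python
import numpy as np

def km_censoring_survival(z: np.ndarray, delta: np.ndarray):
    """Kaplan-Meier estimate of t -> 1 - G(t), the survival function of the
    censoring distribution, from observed times z and event indicators delta
    (delta == 0 marks a censored observation, i.e. a censoring 'event')."""
    order = np.argsort(z, kind="stable")
    z_sorted, delta_sorted = z[order], delta[order]
    n = len(z_sorted)
    at_risk = n - np.arange(n)  # number of subjects at risk just before z_(i)
    # A Kaplan-Meier factor is contributed whenever a censoring event occurs.
    factors = np.where(delta_sorted == 0, 1.0 - 1.0 / at_risk, 1.0)
    surv = np.cumprod(factors)

    def one_minus_G(t: float) -> float:
        idx = np.searchsorted(z_sorted, t, side="right") - 1
        return 1.0 if idx < 0 else float(surv[idx])

    return one_minus_G

# Toy data: latent survival times censored by independent censoring times.
rng = np.random.default_rng(0)
t_lat = rng.exponential(1.0, size=200)   # latent survival times
c_lat = rng.exponential(2.0, size=200)   # latent censoring times
z = np.minimum(t_lat, c_lat)
delta = (t_lat <= c_lat).astype(int)     # 1 = uncensored, 0 = censored

one_minus_G = km_censoring_survival(z, delta)
# Inverse-probability-of-censoring weights delta_i / (1 - G_n*(Z_i)):
weights = delta / np.maximum(np.array([one_minus_G(v) for v in z]), 1e-12)
```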

Since

$$\sup\limits_{\mathbf{t}}\sup\limits_{\psi\in\mathscr{F}}|\psi(\mathbf{t})|<\infty,$$

the kernel K(⋅) is uniformly bounded and

$$\tau<T_{H}=T_{F}\leq T_{G},$$

the law of iterated logarithm for \(G_{n}^{*}(\cdot )\) established in Földes and Rejtő (1981) ensures that

$$\sup\limits_{t\leq\tau}|G_{n}^{*}(t)-G(t)|=O\left( \sqrt{\frac{\log\log n}{n}}\right) \quad \text{almost surely, as } n\rightarrow \infty.$$

By combining the results of Proposition 3.2 and Lemma 6.2, the conclusion of Theorem 3.1 is immediate: the ratio of the two rates is \(\sqrt{h^{d}\log\log n/\log(1/h)}\), which tends to 0 under the bandwidth conditions (H.1-3), so that, for n sufficiently large,

$$\sup\limits_{t\leq\tau}|G_{n}^{*}(t)-G(t)|=o\left( \sqrt{\frac{\log(1/h) }{nh^{d}}}\right) \quad \text{almost surely, as } n\rightarrow \infty.$$

Hence the proof of Theorem 3.1 is complete. \(\Box \)