Abstract
U-statistics represent a fundamental class of statistics for modelling quantities of interest defined by multi-subject responses. U-statistics generalize the empirical mean of a random variable X to sums over every m-tuple of distinct observations. Stute (Ann. Probab. 19, 812–825, 1991) introduced a class of so-called conditional U-statistics, which may be viewed as a generalization of the Nadaraya-Watson estimates of a regression function, and proved their strong pointwise consistency to r(t).
We apply the methods developed in Dony and Mason (Bernoulli 14(4), 1108–1133 2008) to establish uniform in t and in bandwidth consistency (i.e., h ∈ [an,bn], where \(0<a_{n}<b_{n}\rightarrow 0\) at some specific rate) to r(t) of the estimator proposed by Stute, under weaker conditions on the kernel than previously used in the literature. We extend existing uniform bounds on the kernel conditional U-statistic estimator and make it adaptive to the intrinsic dimension of the underlying distribution of X, which is characterized by the so-called volume dimension. In addition, uniform consistency is also established over \(\varphi \in {\mathscr{F}}\) for a suitably restricted class \({\mathscr{F}}\), in both the bounded and unbounded cases, satisfying some moment conditions. Our theorems allow data-driven local bandwidths for these statistics. Moreover, in the same context, we show the uniform in bandwidth consistency of the nonparametric inverse probability of censoring weighted (I.P.C.W.) estimators of the regression function under random censorship, which is of independent interest. The theoretical uniform consistency results established in this paper are (or will be) key tools for many further developments in regression analysis.
1 Introduction
Motivated by numerous applications, the theory of U-statistics (introduced in the seminal work by Hoeffding (1948)) and U-processes have received considerable attention in the past decades. U-processes are useful for solving complex statistical problems. Examples are density estimation, nonparametric regression tests and goodness-of-fit tests. More precisely, U-processes appear in statistics in many instances, e.g., as the components of higher order terms in von Mises expansions. In particular, U-statistics play a role in the analysis of estimators (including function estimators) with varying degrees of smoothness. For example, Stute (1993) applies a.s. uniform bounds for \(\mathbb {P}\)-canonical U-processes to the analysis of the product limit estimator for truncated data. Arcones and Wang (2006) present two new tests for normality based on U-processes. Making use of the results of Giné and Mason (2007a), Giné and Mason (2007b), Schick et al. (2011) introduced new tests for normality which used as test statistics weighted L1-distances between the standard normal density and local U-statistics based on standardized observations. Joly and Lugosi (2016) discussed the estimation of the mean of multivariate functions in case of possibly heavy-tailed distributions and introduced the median-of-means, which is based on U-statistics. U-processes are important tools for a broad range of statistical applications such as testing for qualitative features of functions in nonparametric statistics (Lee et al. 2009; Ghosal et al. 2000; Abrevaya and Jiang, 2005) and establishing limiting distributions of M-estimators (see, e.g., Arcones and Giné 1993; Sherman 1993; Sherman 1994; de la Peña and Giné 1999). The theory goes back to Halmos (1946), von Mises (1947) and Hoeffding (1948), who provided (amongst others) the first asymptotic results for the case where the underlying random variables are independent and identically distributed.
Under weak dependency assumptions, asymptotic results are for instance shown in Borovkova et al. (2001) and Denker and Keller (1983), more recently in Leucht (2012), and in a more general setting in Leucht and Neumann (2013), Bouzebda and Nemouchi (2019) and Bouzebda and Nemouchi (2022). For an excellent collection of references on U-statistics and U-processes, the interested reader may refer to Borovskikh (1996), Koroljuk and Borovskich (1994), Lee (1990), Arcones and Giné (1995), Arcones et al. (1994) and Arcones and Giné (1993). A profound insight into the theory of U-processes is given by de la Peña and Giné (1999). In this paper, we consider the so-called conditional U-statistics introduced by Stute (1991). These statistics may be viewed as generalizations of the Nadaraya-Watson (Nadaraja, 1964; Watson, 1964) estimates of a regression function.
To be more precise, let us consider a sequence of independent and identically distributed random vectors \(\{(\mathbf { X}_{i},\mathbf { Y}_{i}), i\in \mathbb {N}^{*}\}\) with \(\mathbf { X}_{i} \in \mathbb {R}^{d}\) and \(\mathbf { Y}_{i} \in \mathbb {R}^{q}\), d,q ≥ 1. Let \(\mathbb {P}_{\mathbf {X}}=\mathbb {P}\) be an unknown marginal Borel probability distribution in \(\mathbb {R}^{d}\). Let \( \varphi : \mathbb {R}^{qm}\rightarrow \mathbb {R}\) be a measurable function. In this paper, we are primarily concerned with the estimation of the conditional expectation, or regression function of φ(Y1,…,Ym) evaluated at (X1,…,Xm) = t, given by

$$ r^{(m)}(\varphi,\mathbf{t}) := \mathbb{E}\left(\varphi(\mathbf{Y}_{1},\ldots,\mathbf{Y}_{m}) \mid (\mathbf{X}_{1},\ldots,\mathbf{X}_{m})=\mathbf{t}\right), $$
whenever it exists, i.e., \(\mathbb {E}\left (\left |\varphi (\mathbf { Y}_{1},\ldots ,\mathbf { Y}_{m})\right |\right )<\infty \). We now introduce a kernel function \(K:\mathbb {R}^{dm}\rightarrow \mathbb {R}\). Stute (1991) presented a class of estimators for r(m)(φ,t), called the conditional U-statistics, which is defined for each \(\mathbf { t}\in \mathbb {R}^{dm}\) to be:

$$ \widehat{r}_{n}^{(m)}(\varphi,\mathbf{t};h) := \frac{{\sum}_{(i_{1},\ldots,i_{m})\in I(m,n)}\varphi(\mathbf{Y}_{i_{1}},\ldots,\mathbf{Y}_{i_{m}})\,K\left(\frac{\mathbf{t}-(\mathbf{X}_{i_{1}},\ldots,\mathbf{X}_{i_{m}})}{h}\right)}{{\sum}_{(i_{1},\ldots,i_{m})\in I(m,n)}K\left(\frac{\mathbf{t}-(\mathbf{X}_{i_{1}},\ldots,\mathbf{X}_{i_{m}})}{h}\right)}, $$
where

$$ I(m,n) := \left\{(i_{1},\ldots,i_{m}) : 1\leq i_{j}\leq n \text{ and } i_{j}\neq i_{k} \text{ if } j\neq k\right\} $$
is the set of all m-tuples of different integers between 1 and n and {hn}n≥ 1 denotes a sequence of positive constants converging to zero and \(n{h^{m}_{n}} \rightarrow \infty \). For notational simplicity, we let h = hn. In the particular case m = 1, r(m)(φ,t) reduces to \(r^{(1)}(\varphi ,\mathbf { t})=\mathbb {E}(\varphi (\mathbf { Y})|\mathbf { X}=\mathbf { t})\) and Stute’s estimator becomes the Nadaraya-Watson estimator of r(1)(φ,t) given by:

$$ \widehat{r}_{n}^{(1)}(\varphi,\mathbf{t};h) = \frac{{\sum}_{i=1}^{n}\varphi(\mathbf{Y}_{i})K\left(\frac{\mathbf{t}-\mathbf{X}_{i}}{h}\right)}{{\sum}_{i=1}^{n}K\left(\frac{\mathbf{t}-\mathbf{X}_{i}}{h}\right)}. $$
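As an illustration of how such estimators are computed in practice, the following is a minimal numerical sketch. It is not from the paper: the Gaussian kernel, the product form over the m components, the toy function φ(y1, y2) = y1y2 and the simulated data are all our own assumptions for illustration.

```python
import numpy as np
from itertools import permutations

def conditional_u_stat(X, Y, t, h, phi, kernel, m=2):
    """Stute-type conditional U-statistic: kernel-weighted average of
    phi(Y_{i1}, ..., Y_{im}) over all m-tuples of distinct indices I(m, n)."""
    n = X.shape[0]
    num = den = 0.0
    for idx in permutations(range(n), m):  # I(m, n)
        w = np.prod([kernel((t[j] - X[i]) / h) for j, i in enumerate(idx)])
        num += phi(*(Y[i] for i in idx)) * w
        den += w
    return num / den

gauss = lambda u: float(np.exp(-0.5 * np.dot(u, u)))  # Gaussian kernel on R^d

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
Y = X[:, 0] + 0.05 * rng.normal(size=200)  # so E[Y1*Y2 | X1=a, X2=b] ~ a*b
est = conditional_u_stat(X, Y, t=np.array([[0.5], [-0.5]]), h=0.3,
                         phi=lambda y1, y2: y1 * y2, kernel=gauss)
# population target here: 0.5 * (-0.5) = -0.25 (up to smoothing bias)
```

The double sum over I(m, n) makes the cost O(n^m); this is purely illustrative and not how one would implement the estimator at scale.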
The work of Sen (1994) was devoted to estimating the rate of the uniform convergence in t of \(\widehat {r}_{n}^{(m)}(\varphi ,\mathbf {t};h)\) to r(m)(φ,t). In the paper of Prakasa Rao and Sen (1995), the limit distributions of \(\widehat {r}_{n}^{(m)}(\varphi ,\mathbf { t};h)\) are discussed and compared with those obtained by Stute. Harel and Puri (1996) extended the results of Stute (1991), under appropriate mixing conditions, to weakly dependent data and applied their findings to verify the Bayes risk consistency of the corresponding discrimination rules. Stute (1996) proposed symmetrized nearest neighbour conditional U-statistics as alternatives to the usual kernel-type estimators. An important contribution is given in the paper Dony and Mason (2008), where a much stronger form of consistency holds, namely, uniform in t and in bandwidth consistency (i.e., h ∈ [an,bn] where \(a_{n}<b_{n}\rightarrow 0\) at some specific rate) of \(\widehat {r}_{n}^{(m)}(\varphi ,\mathbf { t};h)\). In addition, uniform consistency is also established over \(\varphi \in {\mathscr{F}}\) for a suitably restricted class \({\mathscr{F}}\). The main tool in their result is the use of the local conditional U-process investigated in Giné and Mason (2007a). In the last decades, empirical process theory has provided very useful and powerful tools to analyze the large sample properties of several nonparametric estimators of functionals of the distribution, such as the regression function and the density function; refer to van der Vaart and Wellner (1996) and Kosorok (2008). Nolan and Pollard (1987) were the first to introduce the notion of uniform in bandwidth consistency for kernel density estimators, and they applied empirical process methods in their study.
In a series of papers, Deheuvels (2000), Deheuvels and Mason (2004), Einmahl and Mason (2005), Dony and Mason (2008), Maillot and Viallon (2009), Mason and Swanepoel (2011), Bouzebda and Elhattab (2009, 2010), Bouzebda (2012), Bouzebda et al. (2018, 2021), Bouzebda and Nemouchi (2020), Bouzebda and El-hadjali (2020), Bouzebda and Nezzal (2022), the authors established uniform consistency results for such kernel estimators, where h varies within suitably chosen intervals indexed by n. More precisely, we will consider one of the most commonly used classes of estimators, formed by the so-called kernel-type estimators. There are basically no restrictions on the choice of the kernel function in our setup, apart from satisfying some mild conditions that we will give later. The selection of the bandwidth, however, is more problematic. It is worth noticing that the choice of the bandwidth is crucial to obtain a good rate of consistency; for example, it has a big influence on the size of the estimate’s bias. In general, we are interested in the selection of a bandwidth that produces an estimator with a good balance between the bias and the variance of the considered estimators. It is then more appropriate to let the bandwidth vary according to the criteria applied and the available data and location, which cannot be achieved by using classical methods. The interested reader may refer to Mason (2012) for more details and discussion on the subject. In the present paper, we develop methods that permit the study of kernel-type estimators under nonrestrictive conditions. It is worth noticing that high-dimensional data sets have several unfortunate properties that make them hard to analyze. The phenomenon that the computational and statistical efficiency of statistical techniques degrade rapidly with the dimension is often referred to as the “curse of dimensionality”.
Density and regression estimation on manifolds has received much less attention than the “full-dimensional” counterpart. However, understanding density estimation in situations where the intrinsic dimension can be much lower than the ambient dimension is becoming ever more important: modern systems can capture data at an increasing resolution while the number of degrees of freedom stays relatively constant. One of the limiting aspects of density (regression)-based approaches is their performance in high dimensions.
We know that the notion of the intrinsic dimension, say dM, has been studied in the statistical machine learning literature so as to establish fast estimation rates in high-dimensional kernel regression settings. There are numerous known techniques for doing so, e.g., Kégl (2002), Levina and Bickel (2004), Hein and Audibert (2005), Farahmand et al. (2007). We first introduce a concept proposed by Kim et al. (2018, 2019), the so-called volume dimension, to characterize the intrinsic dimension of the underlying distribution. More specifically, the volume dimension dvol is the decay rate of the probability of vanishing Euclidean balls. Let ∥⋅∥ be the Euclidean 2-norm. For \(\mathbf { x}\in \mathbb {R}^{d}\) and r > 0, we use the notation \(\mathbb {B}_{\mathbb {R}^{d}}(\mathbf { x},r)\) for the open Euclidean ball centered at x with radius r, i.e.,

$$ \mathbb{B}_{\mathbb{R}^{d}}(\mathbf{x},r) := \left\{\mathbf{y}\in\mathbb{R}^{d} : \left\Vert\mathbf{y}-\mathbf{x}\right\Vert < r\right\}. $$
When a probability distribution \(\mathbb {P}\) has a bounded density fX(⋅) supported on a well-behaved manifold M of dimension dM, it is known that, for any point x ∈ M, the measure on the ball \(\mathbb {B}_{\mathbb {R}^{d}}(\mathbf { x},r)\) centered at x and radius r decays as

$$ \mathbb{P}\left(\mathbb{B}_{\mathbb{R}^{d}}(\mathbf{x},r)\right) \asymp r^{d_{M}} $$
when r is small enough. From this, Kim et al. (2018) define the volume dimension of a probability distribution \(\mathbb {P}\) to be the maximum possible exponent rate that can dominate the probability volume decay on balls, i.e., fix a subset \(\mathbb {X}\subset \mathbb {R}^{d}\), then

$$ d_{vol} := \sup\left\{\nu\geq 0 : \limsup_{r\downarrow 0}\,\sup_{\mathbf{x}\in\mathbb{X}}\frac{\mathbb{P}\left(\mathbb{B}_{\mathbb{R}^{d}}(\mathbf{x},r)\right)}{r^{\nu}} < \infty\right\}. $$
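The decay-rate idea behind the volume dimension can be illustrated numerically. The following is a hypothetical toy example of ours (not from the paper): for data supported on a 1-dimensional segment embedded in \(\mathbb{R}^{2}\), the empirical mass of small balls decays like r to the power 1, so a log-log slope recovers the intrinsic rate rather than the ambient dimension d = 2.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000
t = rng.uniform(-1.0, 1.0, size=n)
pts = np.stack([t, np.zeros(n)], axis=1)   # sample on a 1-d segment in R^2

x0 = np.array([0.0, 0.0])
radii = np.array([0.05, 0.1, 0.2, 0.4])
# empirical P(B(x0, r)): fraction of sample points inside each ball
frac = np.array([(np.linalg.norm(pts - x0, axis=1) < r).mean() for r in radii])
# slope of log P_hat vs log r estimates the decay exponent
slope = np.polyfit(np.log(radii), np.log(frac), 1)[0]   # close to 1, not d = 2
```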
The primary purpose of the present work is to extend the work of Kim et al. (2018) to more general estimators, including the kernel density estimator studied in Kim et al. (2018) as a particular case; this generalization is far from trivial, since it requires controlling some complex classes of functions, which is a basically unsolved open problem in the literature. We aim to fill this gap in the literature by combining the results of Kim et al. (2018) with techniques developed in Einmahl and Mason (2005) and Dony and Mason (2008). The present paper extends our previous work in Bouzebda and El-hadjali (2020) in several directions; the main one is that the present paper considers conditional U-statistics, which include the regression treated in the last mentioned paper. More precisely, uniform in bandwidth consistency results for some general kernel-type estimators are established. However, as will be seen later, the problem requires much more than “simply” combining ideas from existing results. Delicate mathematical derivations will be required to cope with the empirical processes that we consider in this extended setting. In addition, we will consider the nonparametric Inverse Probability of Censoring Weighted (I.P.C.W.) estimators of the multivariate regression function under random censorship and obtain uniform in bandwidth consistency results that are of independent interest.
An outline of the remainder of our paper is as follows. In the forthcoming section, we introduce the mathematical framework and provide our main results concerning the uniform in bandwidth consistency of conditional U-statistics adaptive to the intrinsic dimension, extending the setting of the work of Dony and Mason (2008). In Section 3, we consider the conditional U-statistics in the right censored data framework. Examples of U-statistic kernels are provided in Section 3.1. In Section 4, we present how to select the bandwidth through cross-validation procedures. An application to the nonparametric discrimination problem is discussed in Section 4.1. Some concluding remarks are given in Section 5. To prevent interrupting the presentation flow, all proofs are gathered in Section 6. A few relevant technical results are given in the Appendix for easy reference.
2 Main Results
For a fixed integer m ≤ n, consider a class \({\mathscr{F}}_{q}\) of measurable functions \(g : \mathbb {R}^{qm} \rightarrow \mathbb {R}\) defined on \(\mathbb {R}^{qm}\), such that \(\mathbb {E}g^{2}(\mathbf {Y}_{1},\ldots , \mathbf {Y}_{m}) <\infty \), which satisfies conditions (F.i)–(F.iii) given below. First, to avoid measurability problems, we assume that
that is, there exists a countable subclass \({\mathscr{F}}_{0}\) of \({\mathscr{F}}_{q}\) such that we can find, for any function \(g \in {\mathscr{F}}_{q}\), a sequence of functions \(g_{m}\in {\mathscr{F}}_{0}\) for which
This condition is discussed in van der Vaart and Wellner (1996). We also assume that \({\mathscr{F}}_{q}\) has a measurable envelope function

$$ F(\mathbf{y}) \geq \sup_{g\in{\mathscr{F}}_{q}}|g(\mathbf{y})|, \quad \mathbf{y}\in\mathbb{R}^{qm}. $$
Notice that condition (F.i) implies that the supremum in Eq. F.ii is measurable. Finally, we assume that \({\mathscr{F}}_{q}\) is of VC-type, with characteristics A and v1 (“VC” for Vapnik and Červonenkis), meaning that for some A ≥ 3 and v1 ≥ 1,

$$ \mathcal{N}\left({\mathscr{F}}_{q},L_{2}(Q),\varepsilon\|F\|_{L_{2}(Q)}\right) \leq \left(\frac{A}{\varepsilon}\right)^{v_{1}}, \quad 0<\varepsilon<1, $$
where Q is any probability measure on \((\mathbb {R}^{qm},{\mathscr{B}})\), where \({\mathscr{B}}\) represents the σ-field of Borel sets of \(\mathbb {R}^{qm}\), such that \(\|F\|_{L_{2}(Q)}< \infty \), and where for ε > 0, \(\mathcal {N}({\mathscr{F}}_{q},L_{2}(Q),\varepsilon )\) is defined as the smallest number of L2(Q)-open balls of radius ε required to cover \({\mathscr{F}}_{q}\). (If Eq. F.iii holds for \({\mathscr{F}}_{q}\), then we say that the VC-type class \({\mathscr{F}}_{q}\) admits the characteristics A and v1.) In this section, we follow Kim et al. (2018) in weakening the conditions on the kernel and making it adaptive to the intrinsic dimension of the underlying distribution, without assumptions on the distribution. It is worth noticing that for general distributions, such as those whose support is a lower-dimensional manifold, the usual change of variables argument is no longer directly applicable. However, we can provide a bound based on the volume dimension under an integrability condition on the kernel, given below. Let K(⋅) be a kernel function defined on \(\mathbb {R}^{dm}\), that is, a measurable function satisfying

$$ \int_{\mathbb{R}^{dm}} K(\mathbf{x})\,d\mathbf{x} = 1. $$
Assumption 1.
(Integrability condition) Let \(\|K\|_{\infty }=\sup _{\mathbf {x}\in \mathbb {R}^{d}}|K(\mathbf {x})|=:\kappa <\infty ,\) and fix k > 0. We have: either dvol = 0 or

$$ \int_{0}^{\infty} t^{d_{vol}-1}\sup_{\left\Vert\mathbf{x}\right\Vert\geq t}\left|K(\mathbf{x})\right|^{k}\,dt < \infty. $$
Remark 2.1.
(Kim et al. 2018) It is important to emphasize that Assumption 1 is weak, as it is satisfied by commonly used kernels. For instance, if the kernel function K(x) decays at a polynomial rate strictly faster than dvol/k (which is at most d/k) as \(\mathbf { x}\to \infty \), that is, if

$$ \limsup_{\left\Vert\mathbf{x}\right\Vert\to\infty}\left\Vert\mathbf{x}\right\Vert^{d_{vol}/k+\epsilon}\left|K(\mathbf{x})\right| < \infty $$
for any 𝜖 > 0, the integrability condition Eq. K.ii is satisfied. Also, if the kernel function K(x) is spherically symmetric, that is, if there exists \(\widetilde {K}:[0,\infty )\to \mathbb {R}\) with \(K(\mathbf { x})=\widetilde {K}(\left \Vert \mathbf { x}\right \Vert _{2})\), then the integrability condition Eq. K.ii is satisfied provided \(\left \Vert K\right \Vert _{k} <\infty \). Kernels with bounded support also satisfy the condition Eq. K.ii. Thus, most of the commonly used kernels including Uniform, Epanechnikov, and Gaussian kernels satisfy the above integrability condition.
Now, we consider the class of functions

$$ {\mathscr{K}} := \left\{K\left(\frac{\mathbf{x}-\cdot}{h}\right) : h>0,\ \mathbf{x}\in\mathbb{R}^{d}\right\}, $$
with \(\|K\|_{2}<\infty \).
Assumption 2.
Assume that \({\mathscr{K}}\) is a bounded VC-class with envelope κ and dimension ν2, i.e., there exist positive numbers A2 ≥ 1 and ν2 ≥ 1 such that, for every probability measure \(\mathbb {Q}\) on \(\mathbb {R}^{d}\) and for every \(\varepsilon \in (0,\|K\|_{\infty })\), the covering number \(\mathcal {N}({\mathscr{K}},L_{2}(\mathbb {Q}),\varepsilon )\) satisfies

$$ \mathcal{N}\left({\mathscr{K}},L_{2}(\mathbb{Q}),\varepsilon\right) \leq \left(\frac{A_{2}\|K\|_{\infty}}{\varepsilon}\right)^{\nu_{2}}. $$
Furthermore, let
denote the product kernel. Next, if \((S, \mathcal {S})\) is a measurable space, define the general U-statistic with kernel \(H:S^{k}\rightarrow \mathbb {R}\) based on S-valued random variables Z1,⋯ ,Zn as

$$ U_{n}(H) := \frac{(n-k)!}{n!}\sum_{(i_{1},\ldots,i_{k})\in I(k,n)} H(Z_{i_{1}},\ldots,Z_{i_{k}}), $$
where I(k,n) is defined as in Eq. 1.3 with m = k. Note that we do not require H to be symmetric here. For a bandwidth 0 < h and \(g\in {\mathscr{F}}_{q}\), consider the U-kernel

$$ G_{g,h,\mathbf{t}}(\mathbf{x},\mathbf{y}) := g(\mathbf{y})\prod_{i=1}^{m}K_{h}(\mathbf{t}_{i}-\mathbf{x}_{i}), $$
where, as usual, Kh(z) = h−dK(z/h), \(z\in \mathbb {R}^{d}\), and for the sample (X1,Y1), …,(Xn,Yn), define
where, throughout this paper, we shall use the notation
Now, introduce the U-statistic process
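As a concrete reading aid, the general (possibly asymmetric) U-statistic \(U_{n}(H)\), an average of H over I(k,n), can be computed directly. The following sketch is our own illustration; the sanity check uses the classical identity that for H(z1, z2) = (z1 − z2)²/2 the U-statistic equals the unbiased sample variance.

```python
import numpy as np
from itertools import permutations
from math import perm

def u_statistic(H, Z, k):
    """Average of H over I(k, n), all k-tuples of distinct indices;
    H need not be symmetric."""
    n = len(Z)
    total = sum(H(*(Z[i] for i in idx)) for idx in permutations(range(n), k))
    return total / perm(n, k)  # perm(n, k) = n! / (n - k)! = |I(k, n)|

rng = np.random.default_rng(2)
z = rng.normal(size=50)
# H(a, b) = (a - b)^2 / 2  ->  U_n(H) is the unbiased sample variance
u = u_statistic(lambda a, b: (a - b) ** 2 / 2, z, 2)
```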
We denote by I and J two fixed subsets of \(\mathbb {R}^{d}\) such that
where
Introduce the class of functions defined on the compact subset Jm of \(\mathbb {R}^{dm}\),
where r(m)(φ,⋅) is defined in Eq. 1.1 and the function \(\widetilde {f}: \mathbb {R}^{dm} \rightarrow \mathbb {R}\) is defined as
where \(f\left (\cdot , \cdot \right )\) denotes the joint density of (X,Y). We fix a subset \(\mathbb {X}\subset \mathbb {R}^{d}\) on which we are considering the uniform convergence of the kernel regression estimator. We first characterize the intrinsic dimension of the distribution \(\mathbb {P}\), proposed by Kim et al. (2018), by its rate of the probability volume growth on balls. If a probability distribution has a positive measure on a manifold with a positive reach, then the volume dimension is always between 0 and the manifold’s dimension. In particular, the volume dimension of any probability distribution is between 0 and the ambient dimension d.
Lemma 2.2.
(Kim et al. 2018) Let \(\mathbb {P}\) be a probability distribution on \(\mathbb {R}^{d}\), and dvol be its volume dimension. Then for any ν ∈ [0,dvol), there exists a constant \(C_{\nu , \mathbb {P}}\) depending only on \(\mathbb {P}\) and ν such that for all \(\mathbf { x}\in \mathbb {X}\) and r > 0,

$$ \mathbb{P}\left(\mathbb{B}_{\mathbb{R}^{d}}(\mathbf{x},r)\right) \leq C_{\nu,\mathbb{P}}\, r^{\nu}. $$
For the exact optimal rate, we impose conditions on how the probability volume decays in Eq. 2.3.
Assumption 3.
Let \(\mathbb {P}\) be a probability distribution on \(\mathbb {R}^{d}\), and dvol be its volume dimension. For ν ∈ [0,dvol), we assume that
Assumption 4.
Let \(\mathbb {P}\) be a probability distribution on \(\mathbb {R}^{d}\), and dvol be its volume dimension. For ν ∈ [0,dvol), we assume that
These assumptions are in fact weak and hold for common probability distributions. In particular, Assumptions 3 and 4 hold when the probability distribution has a bounded density with respect to the d-dimensional Lebesgue measure. By combining Assumption 1 and Lemma (2.0.2) of Kim et al. (2018), we can bound \(\mathbb {E}_{\mathbb {P}}\left [K^{2}\right ]\) in terms of the volume dimension dvol.
Lemma 2.3.
Let \((\mathbb {R}^{d},\mathbb {P})\) be a probability space and let \(X \sim \mathbb {P}\). For any kernel K(⋅) satisfying Assumption 1 with k > 0, the expectation of the k-th moment of the kernel is upper bounded as

$$ \mathbb{E}_{\mathbb{P}}\left[\left|K\left(\frac{\mathbf{x}-X}{h}\right)\right|^{k}\right] \leq C_{k,\mathbb{P},K,\varepsilon}\, h^{d_{vol}-\varepsilon}, $$
for any ε ∈ (0,dvol), where \(C_{k,\mathbb {P},K,\varepsilon }\) is a constant depending only on \(k,\mathbb {P},K\) and ε. Further, if dvol = 0 or under Assumption 1 in Kim et al. (2018), ε can be 0 in Eq. 2.6.
We give an example from Kim et al. (2018) of an unbounded density. In this case, the volume dimension is strictly smaller than the dimension of the support, which illustrates why the dimension of the support is not enough to characterize the dimensionality of a distribution.
Example 2.4.
(Kim et al. 2018) Let \(\mathbb {P}\) be a distribution on \(\mathbb {R}^{d}\) having a density p with respect to the d-dimensional Lebesgue measure. Fix β < d, and suppose \(p:\mathbb {R}^{d}\to \mathbb {R}\) is defined as
Then, for each fixed r > 0,
Hence from definition in Eq. 1.4, the volume dimension is
In our setting, we will use
It is clear that \(\widehat {r}_{n}^{(m)}(\varphi ,\mathbf {t};h)\) can be rewritten, for all \(\varphi \in {\mathscr{F}}\), as

$$ \widehat{r}_{n}^{(m)}(\varphi,\mathbf{t};h) = \frac{U_{n}(\varphi,\mathbf{t};h)}{U_{n}(1,\mathbf{t};h)}, $$
where we denote by Un(1,t;h) the U-statistic Un(φ,t;h) with φ ≡ 1. To prove the uniform consistency of \(\widehat {r}_{n}^{(m)}(\varphi ,\mathbf {t};h)\) to r(m)(φ,t), we shall consider another, more appropriate, centering factor than the expectation \(\mathbb {E} \widehat {r}_{n}^{(m)}(\varphi ,\mathbf { t};h)\), which may not exist or may be difficult to compute. Define the centering

$$ \widehat{\mathbb{E}}\,\widehat{r}_{n}^{(m)}(\varphi,\mathbf{t};h) := \frac{\mathbb{E}U_{n}(\varphi,\mathbf{t};h)}{\mathbb{E}U_{n}(1,\mathbf{t};h)}. $$
This centering permits us to derive results on the convergence rates of the process
to zero and the consistency of \( \widehat {r}_{n}^{(m)}(\varphi ,\mathbf { t};h)\) uniform in t and in bandwidth.
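To see why this ratio centering is convenient, note the following routine algebraic identity (included here as a reading aid; writing \(\widehat{\mathbb{E}}\,\widehat{r}_{n}^{(m)}(\varphi,\mathbf{t};h):=\mathbb{E}U_{n}(\varphi,\mathbf{t};h)/\mathbb{E}U_{n}(1,\mathbf{t};h)\) for the centering factor):

$$ \widehat{r}_{n}^{(m)}(\varphi,\mathbf{t};h)-\widehat{\mathbb{E}}\,\widehat{r}_{n}^{(m)}(\varphi,\mathbf{t};h) = \frac{U_{n}(\varphi,\mathbf{t};h)-\mathbb{E}U_{n}(\varphi,\mathbf{t};h)}{U_{n}(1,\mathbf{t};h)} - \widehat{\mathbb{E}}\,\widehat{r}_{n}^{(m)}(\varphi,\mathbf{t};h)\,\frac{U_{n}(1,\mathbf{t};h)-\mathbb{E}U_{n}(1,\mathbf{t};h)}{U_{n}(1,\mathbf{t};h)}. $$

Hence uniform control of the two centered U-statistics in the numerators, together with a lower bound on the denominator Un(1,t;h), yields the rates stated in the theorems below.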
Theorem 2.5.
Let \(l_{n} = c(\log n/n)^{1/dm}\) for c > 0. If the class of functions \({\mathscr{F}}_{q}\) is bounded, in the sense that for some \(0 < M < \infty \),

$$ F(\mathbf{y}) \leq M, \quad \mathbf{y}\in\mathbb{R}^{qm}. $$
Fix ε ∈ (0,dvol). Further, if dvol = 0 or under Assumption 3, ε can be 0. Suppose
Then we can infer, under the above mentioned assumptions on \({\mathscr{F}}_{q}\) and Assumptions 1, 2 and 4, that for all δ > 0, there exists a constant \(0<\mathfrak {C}_{1}<\infty \) such that we have with probability at least 1 − δ
Theorem 2.6.
Let \(a_{n} = c((\log n/n)^{1-2/p})^{1/dm}\) for c > 0. If \({\mathscr{F}}_{q}\) is unbounded, but satisfies, for some p > 2,

$$ \mathbb{E}\left[F^{p}(\mathbf{Y}_{1},\ldots,\mathbf{Y}_{m})\right] < \infty, $$
then we can infer, under the above mentioned assumptions on \({\mathscr{F}}_{q}\) and Assumptions 1, 2 and 4, that for all c > 0 and 0 < b0 < 1, there exists a constant \(0<\mathfrak {C}_{2}<\infty \) such that
for any ε ∈ (0,dvol).
We mention that Kim et al. (2018) do not need continuity of the density for their results. (Of course, continuity of the density is crucial for controlling the bias.) Some related results on uniform convergence over compact subsets have been obtained by Bouzebda and El-hadjali (2020) for a much larger class of estimators including kernel estimators for regression functions among others. In this general setting, however, it is often not possible to obtain the convergence uniformly over \(\mathbb {R}^{d}\). Density estimators are in that sense somewhat exceptional.
Theorem 2.7.
Besides being bounded, suppose that the marginal density function fX of X is continuous and strictly positive on the interval I. Assume that the class of functions \({\mathscr{M}}\) is uniformly equicontinuous. It then follows that for all sequences 0 < bn < 1 with \(b_{n} \rightarrow 0\),
where Im = I ×… ×I.
Corollary 2.8.
Besides being bounded, suppose that the marginal density function fX of X is continuous and strictly positive on the interval I. It then follows, under the above mentioned assumptions on \({\mathscr{F}}_{q}\) and Assumptions 1, 2 and 4, that for all c > 0 and all sequences 0 < bn < 1 with \(a_{n}^{\prime }\leq b_{n} \rightarrow 0\), there exists a constant \(0<\mathfrak {C}_{3}<\infty \) such that
for any ε ∈ (0,dvol), where \(a_{n}^{\prime }\) is either ln or an, depending on whether the class \({\mathscr{F}}_{q}\) is bounded or unbounded.
We can now state the main result of this section which follows easily from Theorems 2.5 and 2.6.
Corollary 2.9.
Under the conditions of Theorems 2.5 and 2.6 on fX and the class of functions \({\mathscr{F}}_{q}\) and Assumptions 1, 2 and 4, it follows that for all sequences \(0<a_{n}^{\prime }\leq \widetilde a_{n}\leq b_{n}<1\) satisfying \(b_{n}\rightarrow 0\) and \(n\widetilde a_{n} /\log n\rightarrow \infty \),
Remark 2.10.
Under additional weak regularity conditions on \(\mathbb {P}\), the value of ε can be taken equal to 0 in Corollary 2.9. Under the assumption that the distribution has a bounded Lebesgue density, dvol = d, so our result recovers existing results in the literature in terms of rates of convergence, in particular the results presented in Dony and Mason (2008). Our results complement those of Dony and Mason (2008) by relaxing the condition on the kernel functions, as was done by Kim et al. (2018). At this point, we mention that our results are stated in the multivariate setting and, more importantly, are adaptive to the volume dimension. This alleviates the problem of the curse of dimensionality. To be more precise, it is well known that the estimation of a regression function is especially hard when the dimension of the explanatory variable X is large. One consequence of this is that the optimal minimax rate of convergence n−2k/(2k+d) for the estimation of a k times differentiable regression function converges to zero rather slowly if the dimension d of X is large compared to k. The only way to circumvent the so-called curse of dimensionality is to impose additional assumptions on the regression function. The simplest way is to consider linear models, but this rather restrictive parametric assumption can be extended in several ways. One idea is to consider additive models, which simplify the problem of regression estimation by fitting only functions to the data that have the same additive structure. In projection pursuit, one generalizes this further by assuming that the regression function is a sum of univariate functions applied to projections of x in various directions; we note that this includes single index models as particular cases. The interested reader may refer to Györfi et al. (2002, Chapter 22) for more rigorous developments of such techniques.
Other directions to be investigated are semi-parametric models, considered intermediary between linear and nonparametric ones, which aim to combine the flexibility of nonparametric approaches with the interpretability of parametric ones; for details on these methods for functional data, one can refer to Ling and Vieu (2018, Section 4.2) and the references therein.
Remark 2.11.
We note that the main problem in using an estimator such as in Eq. 1.2 is to choose properly the smoothing parameter h. The uniform in bandwidth consistency results given in Corollary 2.9 shows that any choice of h between an and bn ensures the consistency of \(\widehat {r}_{n}^{(m)}(\varphi ,\mathbf {t};h)\). Namely, the fluctuation of the bandwidth in a small interval does not affect the consistency of the nonparametric estimator \(\widehat {r}_{n}^{(m)}(\varphi ,\mathbf {t};h)\) of r(m)(φ,t).
Remark 2.12.
For notational convenience, we have chosen the same bandwidth sequence for each margin. This assumption can be dropped easily if one wants to make use of vector bandwidths (see, in particular, Chapter 12 of Devroye and Lugosi (2001)). With obvious changes of notation, our results and their proofs remain true when h is replaced by a vector bandwidth \(\mathbf { h}_{n} = (h^{(1)}_{n}, \ldots , h^{(d)}_{n})\), where \({\min \limits } h^{(i)}_{n} > 0\). In this situation we set \(h={\prod }_{i=1}^{d} h^{(i)}\), and for any vector v = (v1,…,vd) we replace v/h by (v1/h(1),…,vd/h(d)). For ease of presentation, we chose to use real-valued bandwidths throughout.
Remark 2.13.
In the sequel, we will need to symmetrize the functions Gg,h,t(x,y). To do this, we have

$$ \overline{G}_{g,h,\mathbf{t}}(\mathbf{x},\mathbf{y}) := \frac{1}{m!}\sum_{\sigma\in\mathfrak{S}_{m}} G_{g,h,\mathbf{t}}(\mathbf{x}_{\sigma},\mathbf{y}_{\sigma}), $$
where \(\mathbf {x}_{\sigma }:=(x_{\sigma _{1}}, \ldots , x_{\sigma _{m}})\) and \(\mathbf {y}_{\sigma }:=(y_{\sigma _{1}}, \ldots , y_{\sigma _{m}})\). Obviously, after symmetrization we have
So the U-statistic process in Eq. 2.2 may be redefined using the symmetrized kernels, hence we consider
For more details, consult for instance the book of de la Peña and Giné (1999).
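The symmetrization step of Remark 2.13 amounts to averaging a kernel over all permutations of its arguments. A small generic sketch of ours (the toy kernel G below is hypothetical):

```python
from itertools import permutations
from math import factorial

def symmetrize(G, m):
    """Return the symmetrized kernel: the average of G over all m! permutations
    of its m arguments (each argument playing the role of an (x_i, y_i) pair)."""
    def G_sym(*args):
        return sum(G(*(args[i] for i in p))
                   for p in permutations(range(m))) / factorial(m)
    return G_sym

G = lambda a, b: a * b ** 2          # asymmetric toy kernel
Gs = symmetrize(G, 2)                # Gs(a, b) = (a*b**2 + b*a**2) / 2
```

By construction the symmetrized kernel is invariant under permuting its arguments, which is what allows the U-statistic process to be rewritten with symmetric kernels.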
3 Extension to the Censored Case
Consider a triple (Y,C,X) of random variables defined in \(\mathbb {R} \times \mathbb {R} \times \mathbb {R}^{d}\). Here Y is the variable of interest, C is a censoring variable and X is a concomitant variable. Throughout, we will use the notation of Maillot and Viallon (2009), and we work with a sample {(Yi,Ci,Xi)1≤i≤n} of independent and identically distributed replications of (Y,C,X), n ≥ 1. Actually, in the right censorship model, the pairs (Yi,Ci), 1 ≤ i ≤ n, are not directly observed and the corresponding information is given by \(Z_{i} := \min \limits \{Y_{i},C_{i}\}\) and \(\delta _{i} := \mathbb {1}\{Y_{i}\leq C_{i}\}\), 1 ≤ i ≤ n. Accordingly, the observed sample is

$$ \{(Z_{i},\delta_{i},\mathbf{X}_{i}),\ 1\leq i\leq n\}. $$
Survival data in clinical trials or failure time data in reliability studies, for example, are often subject to such censoring. To be more specific, many statistical experiments result in incomplete samples, even under well-controlled conditions. For example, clinical data for surviving most types of disease are usually censored by other competing risks to life which result in death. In the sequel, we impose the following assumptions upon the distribution of (X,Y ). Denote by \(\mathcal { I}\) a given compact set in \(\mathbb {R}^{d}\) with nonempty interior and set, for any α > 0,
We will assume that, for a given α > 0, (X,Y ) [resp. X] has a density function fX,Y [resp. fX] with respect to the Lebesgue measure on \(\mathcal I_{\alpha } \times \mathbb {R}\) [resp. \(\mathcal I_{\alpha }\)]. For \(-\infty < t < \infty \), set

$$ F(t) = \mathbb{P}(Y\leq t), \quad G(t) = \mathbb{P}(C\leq t), \quad H(t) = \mathbb{P}(Z\leq t), $$
the right-continuous distribution functions of Y, C and Z respectively. For any right-continuous distribution function L defined on \(\mathbb {R}\), denote by

$$ T_{L} := \sup\{t\in\mathbb{R} : L(t) < 1\} $$
the upper point of the corresponding distribution. Now consider a pointwise measurable class \({\mathscr{F}}\) of real measurable functions defined on \(\mathbb {R}\), and assume that \({\mathscr{F}}\) is of VC-type. We recall the regression function of ψ(Y ) evaluated at X = x, for \(\psi \in {\mathscr{F}}\) and \(\mathbf { x} \in \mathcal I_{\alpha }\), given by

$$ r^{(1)}(\psi,\mathbf{x}) = \mathbb{E}\left(\psi(Y)\mid \mathbf{X}=\mathbf{x}\right), $$
when Y is right-censored. To estimate r(1)(ψ,⋅), we make use of the Inverse Probability of Censoring Weighted (I.P.C.W.) estimators, which have recently gained popularity in the censored data literature (see Kohler et al. (2002), Carbonez et al. (1995), Brunel and Comte (2006)). The key idea of I.P.C.W. estimators is as follows. Introduce the real-valued function Φψ(⋅,⋅) defined on \(\mathbb {R}^{2}\) by

$$ {\Phi}_{\psi}(y,c) := \frac{\mathbb{1}\{y\leq c\}\,\psi(y\wedge c)}{1-G(y\wedge c)}. $$
Assuming the function G(⋅) to be known, first note that Φψ(Yi,Ci) = δiψ (Zi)/(1 − G(Zi)) is observed for every 1 ≤ i ≤ n. Moreover, under the Assumption (\({\mathscr{I}}\)) below,
- (\({\mathscr{I}}\)):
-
C and (Y,X) are independent.
We have

$$ \mathbb{E}\left({\Phi}_{\psi}(Y,C)\mid \mathbf{X}=\mathbf{x}\right) = \mathbb{E}\left(\psi(Y)\mid \mathbf{X}=\mathbf{x}\right) = r^{(1)}(\psi,\mathbf{x}). $$
Therefore, any estimate of \(r^{(1)}({\Phi }_{\psi },\cdot )\), which can be built on fully observed data, turns out to be an estimate for r(1)(ψ,⋅) too. Thanks to this property, most statistical procedures known to provide estimates of the regression function in the uncensored case can be naturally extended to the censored case. For instance, kernel-type estimates are particularly easy to construct. Set, for \(\mathbf { x}\in \mathcal {I}\), h ≥ ln, 1 ≤ i ≤ n,
We assume that h satisfies (H.1). In view of Eqs. 3.1, 3.2, and 3.3, whenever G(⋅) is known, a kernel estimator of r(1)(ψ,⋅) is given by
The function G(⋅) is generally unknown and has to be estimated. We will denote by \(G^{*}_{n}(\cdot )\) the Kaplan-Meier estimator of the function G(⋅) (Kaplan and Meier, 1958). Namely, adopting the conventions
and \(0^{0} = 1\) and setting
we have
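A standard closed form for the Kaplan-Meier estimator of the censoring distribution, recorded here under the stated conventions (with \(Z_{(1)}\leq \cdots \leq Z_{(n)}\) the order statistics and \(\delta _{(i)}\) the concomitant indicators), is:

$$1-G^{*}_{n}(t)=\prod \limits _{i:\, Z_{(i)}\leq t}\left (\frac {n-i}{n-i+1}\right )^{1-\delta _{(i)}}.$$

The exponent 1 − δ(i) reflects that, when estimating G, the roles of censoring and failure are interchanged.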
Given this notation, we will investigate the following estimator of r(1)(ψ,⋅)
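A sketch of this estimator, obtained by substituting \(G^{*}_{n}\) for G in the kernel I.P.C.W. estimator above (stated in a standard form that we assume matches the intended notation), is:

$$\widetilde {m}^{*}_{\psi ,n,h}(\mathbf {x}) =\sum \limits _{i=1}^{n}\frac {\delta _{i}\psi (Z_{i})}{1-G^{*}_{n}(Z_{i})}\, \frac {K\left ((\mathbf {x}-X_{i})/h\right )}{\sum _{j=1}^{n}K\left ((\mathbf {x}-X_{j})/h\right )}.$$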
For further details, refer to Kohler et al. (2002) and Maillot and Viallon (2009). Adopting the convention 0/0 = 0, this quantity is well defined, since \(G_{n}^{*}(Z_{i})=1\) if and only if Zi = Z(n) and δ(n) = 0, where Z(k) is the k th order statistic associated with the sample (Z1,…,Zn) for k = 1,…,n and δ(k) is the δj corresponding to Z(k) = Zj. When the variable of interest is right-censored, functionals of the (conditional) law can generally not be estimated on the complete support (see Brunel and Comte 2006). To obtain our results, we will work under the following assumptions.
- (A.1):
-
, where τ < TH and \({\mathscr{F}}_{1}\) is a pointwise measurable class of real measurable functions defined on \(\mathbb {R}\) and of type VC.
- (A.2):
-
The class of functions \({\mathscr{F}}\) has a measurable and uniformly bounded envelope function Υ satisfying
$$ {\Upsilon}(y_{1},\ldots,y_{m})\geq \sup_{\psi \in \mathscr{F}}\left|\psi (y_{1},\ldots,y_{m})\right| , \quad y_{i}\leq T_{H}.$$
- (A.3):
-
The class of functions \({\mathscr{M}}\) is relatively compact with respect to the sup-norm topology on \(\mathcal I_{\alpha }^{m}\).
In what follows, we will study the uniform convergence of \(\widetilde {m}^{*}_{\psi ,n,h}(\mathbf {x})\) centred by the following centring factor
This choice is justified by the fact that, under hypothesis (\({\mathscr{I}}\)) we have
Let us assume the following conditions.
- (H.1):
-
h ↓ 0, 0 < h < 1, and \(n h^{d} \uparrow \infty \);
- (H.2):
-
\( n h^{d} / \log n \rightarrow \infty \) as \(n \rightarrow \infty ;\)
- (H.3):
-
\( \log \left (1 / h\right ) / \log \log n \rightarrow \infty \) as \(n \rightarrow \infty ;\)
We now have all the ingredients to state the result corresponding to the censored case. Let \(\left \{h_{n}^{\prime }\right \}_{n \geq 1}\) and \(\left \{h_{n}^{\prime \prime }\right \}_{n \geq 1}\) be two sequences of positive constants fulfilling Assumptions (H.1-H.3) with
Bouzebda and El-hadjali (2020) showed that, under Assumptions (A.1)–(A.3) with m = 1, Assumption (\({\mathscr{I}}\)) and Assumption 3, for any kernel K(⋅) satisfying Assumptions 1 and 2, with probability at least 1 − δ,
for some positive constant \(\mathfrak {C}_{4}\). A right-censored version of an unconditional U-statistic with a kernel of degree m ≥ 1 was introduced, via the principle of a mean-preserving reweighting scheme, in Datta et al. (2010). Stute and Wang (1993) proved almost sure convergence of multi-sample U-statistics under random censorship and provided an application by considering the consistency of a new class of tests designed for testing equality in distribution. To overcome potential biases arising from right-censoring of the outcomes and the presence of confounding covariates, Chen and Datta (2019) proposed adjustments to the classical U-statistics. Yuan et al. (2017) proposed a different estimation procedure for the U-statistic, using a substitution estimator of the conditional kernel given the observed data. To the best of our knowledge, the problem of estimating conditional U-statistics under censoring has remained open until now, and this provides the main motivation for this section. A natural extension of the function defined in Eq. 3.1 is given by
From this, we have an analogous relation to Eq. 3.2 given by
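Concretely, a natural m-variate extension and its unbiasedness relation can be sketched (with assumed notation) as:

$${\Phi }_{\psi }(y_{1},\ldots ,y_{m},c_{1},\ldots ,c_{m}) =\psi (y_{1},\ldots ,y_{m})\prod \limits _{i=1}^{m}\frac {\mathbb {1}_{\{y_{i}\leq c_{i}\}}}{1-G(y_{i})},$$

so that, under (\({\mathscr{I}}\)),

$$\mathbb {E}\left [{\Phi }_{\psi }(Y_{1},\ldots ,Y_{m},C_{1},\ldots ,C_{m})\mid (X_{1},\ldots ,X_{m})=\mathbf {t}\right ]=r^{(m)}(\psi ,\mathbf {t}).$$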
An estimator analogous to Eq. 1.2 in the censored case is given by
where, for i = (i1,…,im) ∈ I(m,n),
The estimator that we will investigate is given by
Theorem 3.1.
Let \(a_{n} = c((\log n/n)^{1-2/p})^{1/dm}\) for c > 0, and suppose that the class of functions \({\mathscr{F}}\) is bounded in the sense of (A.2). Fix ε ∈ (0,dvol); if dvol = 0 or under Assumption 3, ε may be taken equal to 0. Suppose further that
Then we can infer, under the above-mentioned assumptions on \({\mathscr{K}}\), (A.1)–(A.2) and (\({\mathscr{I}}\)), that for all c > 0 and 0 < b0 < 1, there exists a constant \(0<\mathfrak {C}_{5}<\infty \) such that
Proposition 3.2.
Assume that (A.1)–(A.3) and (\({\mathscr{I}}\)) hold, and let K(⋅) be any kernel satisfying Assumptions 1, 2 and 3. Let \(\left \{h_{n}^{\prime }\right \}_{n \geq 1}\) and \(\left \{h_{n}^{\prime \prime }\right \}_{n \geq 1}\) be two sequences of positive constants fulfilling Assumptions (H.1-H.3) with
With probability at least 1 − δ, there exists a constant \(0<\mathfrak {C}_{6}<\infty \) such that
3.1 Examples of U-statistics
Example 3.3.
Let \(\widehat {{Y}_{1}{Y}_{2}}\) denote the oriented angle between Y1,Y2 ∈ T, where T is the circle of radius 1 centred at 0 in \(\mathbb {R}^{2}\). Let:
Silverman (1978) has used this kernel to propose a U-process to test uniformity on the circle.
Example 3.4.
Hoeffding (1948) introduced the parameter
where \(D(y_{1}, y_{2})=F(y_{1}, y_{2})-F(y_{1}, \infty ) F(\infty , y_{2})\) and F(⋅,⋅) is the joint distribution function of (Y1, Y2). The parameter △ has the property that △ = 0 if and only if Y1 and Y2 are independent. Following Lee (1990), an alternative expression for △ can be developed by introducing the functions
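Explicitly, the parameter in question is, in Hoeffding's standard form,

$$\triangle =\int _{\mathbb {R}^{2}} D^{2}(y_{1},y_{2})\, \mathrm {d}F(y_{1},y_{2}).$$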
and
We have
We have
The corresponding U-statistics may be used to test conditional independence.
Example 3.5.
For m = 3, let the kernel be chosen so that the corresponding U-statistic is the Hollander-Proschan test statistic (Hollander and Proschan, 1972).
Example 3.6.
For
we obtain the variance of Y.
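A standard choice realizing this example is the symmetric kernel of degree m = 2 given by

$$\psi (y_{1},y_{2})=\frac {1}{2}(y_{1}-y_{2})^{2},\qquad \mathbb {E}\,\psi (Y_{1},Y_{2})=\operatorname {Var}(Y),$$

since expanding the square gives \(\mathbb {E}(Y_{1}-Y_{2})^{2}=2\operatorname {Var}(Y)\) for i.i.d. Y1, Y2.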
Example 3.7.
For
we obtain:
Example 3.8.
Let Y = (Y1,Y2) such that Y2 is a smooth curve, Y2 ∈ L2([0,1]) and \(Y_{1} \in \mathbb {R}\) has a continuous distribution. For
which can be used to treat the problem of testing for conditional association between a functional variable belonging to Hilbert space and a scalar variable. More precisely, this gives the conditional Kendall’s Tau type statistics.
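For orientation, the classical Kendall-type kernel of degree m = 2 for bivariate observations \(Y_{i}=(Y_{i1},Y_{i2})\) (stated here as an illustrative standard form, not necessarily the exact display intended) is

$$\psi \left (Y_{1},Y_{2}\right ) =\mathbb {1}_{\{(Y_{11}-Y_{21})(Y_{12}-Y_{22})>0\}} -\mathbb {1}_{\{(Y_{11}-Y_{21})(Y_{12}-Y_{22})<0\}},$$

whose conditional U-statistic estimates a conditional Kendall's τ.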
4 The Bandwidth Selection Criterion
Many methods have been established and developed to construct, in asymptotically optimal ways, bandwidth selection rules for nonparametric kernel estimators, especially for the Nadaraya-Watson regression estimator; among them, we cite Hall (1984) and Härdle and Marron (1985). This parameter has to be selected suitably, whether in the standard finite-dimensional case or in the infinite-dimensional framework, in order to ensure good practical performance. Following Dony and Mason (2008), the leave-one-out cross-validation procedure allows one to define, for any fixed i = (i1,…,im) ∈ I(m,n):
where
In order to minimize the quadratic loss function, we introduce the following criterion, defined for some (known) non-negative weight function \(\mathcal {W}(\cdot ):\)
where
A natural way to choose the bandwidth is to minimize the preceding criterion, so we choose \(\widehat {h}_{n} \in [a_{n},b_{n}]\) minimizing, among h ∈ [an,bn]:
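A cross-validation criterion of the type described above can be sketched (with assumed notation \(\widehat {r}^{(m)}_{n,h,-\mathbf {i}}\) for the leave-out estimator) as

$$\widehat {h}_{n}:=\arg \min _{h\in [a_{n},b_{n}]} \frac {1}{|I(m,n)|}\sum \limits _{\mathbf {i}\in I(m,n)} \left [\varphi \left (Y_{i_{1}},\ldots ,Y_{i_{m}}\right ) -\widehat {r}^{(m)}_{n,h,-\mathbf {i}}\left (\mathbf {X}_{\mathbf {i}}\right )\right ]^{2} \mathcal {W}\left (\mathbf {X}_{\mathbf {i}}\right ).$$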
we can conclude, by Corollary 2.9, that :
The main interest of our results is the possibility of deriving the asymptotic properties of our estimate even if the bandwidth parameter is a random variable, as in the last equation. One can replace (4.2) by
where
In practice, one takes for i ∈ I(m,n), the uniform global weights \(\widetilde {\mathcal {W}}\left (\mathbf {X}_{\mathbf {i}}\right )= 1\), and the local weights
For the sake of brevity, we have considered only the most popular method, namely the cross-validated bandwidth. This may be extended to any other bandwidth selector, such as the bandwidth based on Bayesian ideas (Shang, 2014).
4.1 Discrimination
Now, we apply the results to the problem of discrimination described in Section 3 of Stute (1994b); refer also to Stute (1994a). We will use a similar notation and setting. Let φ(⋅) be any function taking at most finitely many values, say 1,…,M. The sets
then yield a partition of the feature space. Predicting the value of φ(Y1,…, Ym) is tantamount to predicting the set in the partition to which (Y1,…,Ym) belongs. For any discrimination rule g, we have
where
The above inequality becomes equality if
g0(⋅) is called the Bayes rule, and the pertaining probability of error
is called the Bayes risk. Each of the above unknown functions mj can be consistently estimated by one of the methods discussed in the preceding sections. Let, for 1 ≤ j ≤ M,
Set
Let us introduce
Then, one can show that the discrimination rule g0,n(⋅) is asymptotically Bayes’ risk consistent
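In the standard formulation (our assumed notation), the Bayes rule, its empirical version and the consistency statement read:

$$g_{0}(\mathbf {t})=\arg \max _{1\leq j\leq M} m_{j}(\mathbf {t}),\qquad g_{0,n}(\mathbf {t})=\arg \max _{1\leq j\leq M} \widehat {m}_{j,n}(\mathbf {t}),$$

and

$$\mathbb {P}\left (g_{0,n}(\mathbf {X})\neq \varphi (\mathbf {Y})\right )\longrightarrow L^{*}:=\mathbb {P}\left (g_{0}(\mathbf {X})\neq \varphi (\mathbf {Y})\right )\quad \text {a.s.}$$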
5 Concluding Remarks and Future Works
In the present work, we have used general methods based upon empirical process techniques to prove uniform in bandwidth consistency for kernel-type estimators of conditional U-statistics. We have considered an extended setting in which the intrinsic dimension can be lower than the ambient dimension. In addition, our work complements the paper of Kim et al. (2018) by considering other examples of kernel estimates. Our proof relies on the work of Dony and Mason (2008). Our results extend and complement the last cited reference by establishing a convergence rate adaptive to the volume dimension. Our results are especially useful for establishing the uniform consistency of data-driven bandwidth kernel-type function estimators. A natural next step would be to extend our work to k-nearest-neighbour estimators; at present, it is beyond reasonable hope to achieve this program without new technical arguments. We will not treat the uniform consistency of such estimators in the present paper, and leave this for future investigation.
6 Mathematical Developments
This section is devoted to the proof of our results. The previously defined notation continues to be used in what follows.
Our main tool to analyze \(u_{n}^{(m)}(g,h,\mathbf { t})\) will be the Hoeffding decomposition, which we recall here for the reader’s convenience.
6.1 Hoeffding Decomposition
The Hoeffding decomposition, Hoeffding (1948), states the following, which is easy to check,
where the k th Hoeffding projection for a (symmetric) function \(L:S^{m}\times S^{m} \rightarrow \mathbb {R}\) with respect to \(\mathbb {P}\) is defined for xk = (x1,…,xk) ∈ Sk and yk = (y1,…,yk) ∈ Sk as
where \(\mathbb {P}\) is any probability measure on \((S, \mathcal {S})\) and for measures \(\mathbb {Q}_{i}\) on S we have
Considering (Xi,Yi),i ≥ 1, i.i.d.-\(\mathbb {P}\) and assuming \(L \in L_{2}(\mathbb {P}^{m})\), this is an orthogonal decomposition and
where we denote Xk and Yk for (X1,…,Xk) and (Y1,…,Yk), respectively. Thus the kernels πkL are canonical for \(\mathbb {P}\). Also, πk,k ≥ 1, are nested projections, that is, πk ∘ πl = πk if k ≤ l, and
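For concreteness, the decomposition we rely on can be written in its standard form as

$$U_{n}^{(m)}(L)=\mathbb {E}L+\sum \limits _{k=1}^{m}\binom {m}{k}U_{n}^{(k)}\left (\pi _{k}L\right ),\qquad \pi _{k}L(x_{1},\ldots ,x_{k}) =\left (\delta _{x_{1}}-\mathbb {P}\right )\times \cdots \times \left (\delta _{x_{k}}-\mathbb {P}\right )\times \mathbb {P}^{m-k}L,$$

where δx denotes the Dirac measure at x.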
For more details, consult de la Peña and Giné (1999). The proofs of our results are largely inspired from Dony and Mason (2008), Kim et al. (2018) and Bouzebda and El-hadjali (2020).
6.2 Proof of Theorem 2.5: the Bounded Case
6.3 Linear Term
To establish the relation (2.9), we need to study the linear term (the first term) of Eq. 6.1, given by
Keeping in mind the fact that the class \({\mathscr{F}}_{q}\) is a VC-type class of functions with an envelope function F and the class \({\mathscr{K}}\) is a VC-type with envelope κ, which implies that the class of functions on \(\mathbb {R}^{qm}\times \mathbb {R}^{dm}\) given by \(\{h^{dm}G_{g,h,\mathbf { t}}(\cdot ,\cdot ):g \in {\mathscr{F}}_{q},\mathbf {t} \in \mathbb {R}^{dm}\}\) is of VC-type (via Lemma A.1 in Einmahl and Mason (2000)), as well as the class
for which we denote the VC-type characteristics by A and v, and the envelope function by
By considering the following class of functions on \(\mathbb {R}^{dk}\times \mathbb {R}^{qk}\), for k = 1,…,m,
and following Giné and Mason (2007a) one can show that each class \(\mathcal {G}^{(k)}\) is of VC-type with characteristics A and v and envelope function
Recall that the sample (Xi,Yi),1 ≤ i ≤ n is i.i.d. and from the definition of the Hoeffding projections, for all \((x,y) \in \mathbb {R}^{d} \times \mathbb {R}^{q}\), we get
Introduce the following function on \(\mathbb {R}^{d} \times \mathbb {R}^{q}\):
Making use of this notation, we can write
For all \(g \in {\mathscr{F}}_{q}\), h ≥ ln and \(\mathbf {t} \in \mathbb {R}^{dm}\), the linear term of the decomposition in Eq. 6.1 times hdm is given by
where we recall that the last expression is the empirical process αn(⋅) based on the sample (X1,Y1),…,(Xn,Yn) and we set for \(\mathbf {t}\in \mathbf {I}\subset \mathbb {X}\), \(g \in {\mathscr{F}}_{q}\) and h ≥ ln the class of normalised functions on \(\mathbb {R}^{dm} \times \mathbb {R}^{qm}\),
Now, we have to bound Sg,h,t. From Eq. 2.8, we get
In order to bound the VC dimension of \( \mathcal {S}_{n}\), we remark that \( \mathcal {S}_{n}=m\mathcal {G}^{(1)}\) is of VC-type with characteristics A and v as defined in Eq. 6.5 for k = 1. For the reader's convenience, we give more details. Let us give the bound for the VC dimension of \(\mathcal {S}_{n}\). Fix \(\eta <l_{n}^{-dm}\left \| S_{g,h,\mathbf {t}}\right \|_{\infty }\) and a probability measure \(\mathbb {Q}\) on \(\mathbb {R}^{d}\). Suppose
is covered by balls of the form
and \(\left (\mathcal {S}_{n}, \mathbb {L}_{2}(\mathbb {Q})\right )\) is covered by
where
For 1 ≤ i ≤ N1,1 ≤ j ≤ N2 and 1 ≤ k ≤ N3, we let
Also, choose \(b_{0}>\left (\frac {\sqrt {n}\eta }{2\left \|S_{g,h,\mathbf {t}}\right \|_{\infty }}\right )^{-1/dm},\) \( \mathbf {t}_{0}\in \mathbf {I}, g_{0}\in {\mathscr{F}}_{q}\) and let
We will show that
For the first case when \(h\leq \left (\frac {\sqrt {n}\eta }{2\left \|S_{g,h,\mathbf {t}}\right \|_{\infty }}\right )^{-1/dm}\), find hi, Kj and gk with
Then the distance between \(\frac {1}{\sqrt {n}h^{dm}}S_{g,h,\mathbf {t}}\) and \(\frac {1}{\sqrt {n}h_{i}^{dm}}S_{j,k}\) is upper bounded as follows
Now the first term of Eq. 6.9 is upper bounded as
Also, the second term of Eq. 6.9 is upper bounded as
The last term of Eq. 6.9 is upper bounded as
Combining Eqs. 6.10, 6.11 and 6.12 with Eq. 6.9, we readily obtain the following bound
For the second case when \(h>\left (\frac {\sqrt {n}\eta }{2\left \|S_{g,h,\mathbf {t}}\right \|_{\infty }}\right )^{-1/dm}\), we have
holds, and hence
Therefore Eq. 6.8 is shown. Hence, combining Eqs. F.ii, F.iii and 2.8 with Lemma 9.9, p. 160, of Kosorok (2008) gives that, for every probability measure \(\mathbb {Q}\) on \(\mathbb {R}^{d}\) and for every \(\eta \in \left (0, \sqrt {n}h^{-dm}\left \|S_{g,h,\mathbf {t}}\right \|_{\infty }\right )\), the covering number \(\mathcal {N}(\mathcal {S}_{n},\eta )\) is upper bounded as
for some finite constant \(0<A<\infty \). For p > 2, note that assumption (2.8) implies that
From Lemma 2.3 and using Jensen’s inequality, we observe that, for p ≥ k
where
Then, we have
By Hölder inequality and using once more Lemma 2.3 we obtain
where
Hence, from Lemma 2.3, we have
where
Then, we readily obtain
Now from Eq. 6.15, applying Theorem 7.1 to the class \(\mathcal {S}_{n}\) defined in Eq. 6.7 gives that
is upper bounded with probability at least 1 − δ as
Using the condition
we conclude that
6.4 The Other Terms of Eq. 6.1
We will follow the steps of the proof of Dony and Mason (2008). Now we consider the other terms of the Hoeffding decomposition (6.1) and show that each of them is almost surely upper bounded, that is, for each k = 2,…,m,
Since \(nl_{n}^{dm}=c^{dm}\log n\), this will be established if we can show that, for each k = 2,…,m,
To establish the uniform in bandwidth convergence rates, we have to use a blocking argument and a decomposition of the interval [ln,b0], for b0 large enough, into smaller intervals. For this, set \(n_{\ell } = 2^{\ell }\), ℓ ≥ 0, and consider the intervals \({\mathscr{H}}_{\ell , j}:=[h_{\ell ,j-1},h_{\ell ,j}]\) where the boundaries are given by \(h^{m}_{\ell , j}:=2^{j}l_{n_{\ell }}^{m}\). By setting
remark that
implying, in particular, that \(L(\ell )\leq 2\log n_{\ell }\). This fact will be used tacitly to conclude some crucial steps of the proofs. Next, for 1 ≤ j ≤ L(ℓ), consider the class of functions on \(\mathbb {R}^{dm}\times \mathbb {R}^{qm}\),
as well the class on \(\mathbb {R}^{dm}\times \mathbb {R}^{qm}\),
where Mk = 2kκmM. Clearly, each class \(\mathcal {G}_{\ell ,j}\) is of VC-type with the same characteristics as \(\mathcal {G}^{k}\) (and thus as \(\mathcal {G}\)) with envelope function \(M_{k}^{-1}F_{k}\), where Fk is the envelope function of \(\mathcal {G}^{k}\). Notice that from Eqs. 2.8 and 6.6,
and hence each function in \(\mathcal {G}_{\ell ,j}^{(k)}\) is bounded by 1. Define now for nℓ− 1 ≤ n ≤ nℓ,ℓ = 1,2,…,
From Theorem 4 of Giné and Mason (2007b) as in Dony and Mason (2008), we get for c = 1/2,r = 2 and all x > 0 that for any ℓ ≥ 1,
We shall apply an exponential inequality and a moment bound for U-statistics, due to, respectively, de la Peña and Giné (1999) and Giné and Mason (2007b), to the class \(\mathcal {G}_{\ell ,j}^{(k)}\) to bound (6.20). To use these results, we must first derive some bounds. First, it is readily checked that
for all nℓ− 1 < n ≤ nℓ. Second, notice that in Assumption 2, the kernel K(⋅) is assumed to be bounded by κ and, for notational convenience in the proofs, to have support in [− 1/2,1/2], so that by assumption (2.8) and Mk = 2kκmM, for \(H\in \mathcal {G}_{\ell ,j}^{(k)}\), we have by Eqs. 6.2 and 6.14,
For \(D_{m,k}= \widetilde { C}_{k,\mathbb {P},K,\varepsilon }^{m}4^{-k}\kappa ^{-2m}\), this gives us that
Since πkπkL = πkL for all k ≥ 1, we can now apply Corollary 1 of Giné and Mason (2007b) to the class \(\mathcal {G}_{\ell ,j}^{(k)}\) with σ2 as in Eq. 6.22 and easily obtain that for some constant Ak,
To control the probability term in Eq. 6.20, we shall apply an exponential inequality to the same class \(\mathcal {G}_{\ell ,j}^{(k)}\); recall that each \(H \in \mathcal {G}_{\ell ,j}^{(k)}\) is bounded by 1. Setting
where \(C_{1,k}< \infty \), Theorem 5.3.14 of de la Peña and Giné (1999) gives us constants C2,k,C3,k and C4,k such that for j = 1,…,L(ℓ) and for any ρ > 1,
Plugging the bounds of Eqs. 6.23 and 6.25 into Eq. 6.20, we then get, for some C5,k > 0, ρ ≥ 2 and ℓ large enough,
Finally, note also that
for some Ck > 0. Therefore, by Eq. 6.18, for each k = 2,…,m and ℓ large enough,
where λn,k was defined as in Eq. 6.24. Now, recall that \(L(\ell )\leq 2\log (n_{\ell })\). Then Eq. 6.26 applied with ρ ≥ (2 + γ)/C5,k, γ > 0 and in combination with the above inequality and the obvious bound
valid for all nℓ− 1 < n ≤ nℓ, implies for C6,k ≥ 2ρk/2CkMkC1,k and for the choice \(\delta = 2^{-\ell +1}\) that for k = 2,…,m,
This proves, via Borel-Cantelli, that Eq. 6.17 holds, which obviously implies Eq. 6.16 and hence completes the proof of Theorem 2.5.
6.5 Proof of Theorem 2.6: the Unbounded Case
To prove this theorem, we will need to truncate the conditional U-statistic un(g,h,t). If condition (2.8) is not satisfied, we consider bandwidths lying in the smaller interval \({\mathscr{H}}_{n_{\ell }}^{\prime }=[a_{n_{\ell }}^{\prime }, b_{0}]\), that may be divided into subintervals as follows
where the boundaries are given by \(h^{'dm}_{\ell , j}:=2^{j}a_{n_{\ell }}^{'dm}\). Note that it is straightforward to show that Eq. 6.18 remains valid if we replace hℓ,j by \(h^{\prime }_{\ell ,j}\). In particular, we still have \(L(\ell )\leq 2\log {n_{\ell }}\), where L(ℓ) is now defined as
Recall that \(n_{\ell } = 2^{\ell }\), ℓ ≥ 0, and set, for ℓ ≥ 1,
For an arbitrary 𝜖 > 0, we truncate each function in \(\mathcal {G}\), as well as the envelope function, as follows
where \(\tilde {F}\) is the symmetric envelope function of the class \(\mathcal {G}\) as defined in Eq. 6.4. The process un(g,h,t) can then be decomposed for any nℓ− 1 < n ≤ nℓ since, from Eq. 2.11,
The term \(u_{n}^{(\ell )}(g,h,\mathbf {t})\) will be called the truncated part and \(\tilde {u}_{n}^{(\ell )}(g,h,\mathbf {t})\) the remainder part. To prove Theorem 2.6, we shall apply the Hoeffding decomposition to the truncated part and analyze each of the terms separately, while the remainder part can be treated directly using simple arguments based on standard inequalities. Note, for further use, that
6.5.1 Truncated Part
First, note that by Hoeffding decomposition (6.1), we need to consider the terms of
We shall start with the linear term in this decomposition. Following the same reasoning as in the previous section, we can show that \(\pi _{1}\overline {G}^{(\ell )}_{g,h, \mathbf {t}}\) is a centered conditional expectation and that the first term of Eq. 6.1 can be written as an empirical process based on the sample (X1,Y1),…,(Xn,Yn) and indexed by the class of functions
where \(h\in {\mathscr{H}}_{n_{\ell }}^{\prime }\) was defined at the beginning of this section and where
To show that \( \mathcal {S}^{\prime }_{\ell }\) is a VC-class, introduce the class of functions of \((\mathbf {x}, \mathbf {y})\in \mathbb {R}^{dm}\times \mathbb {R}^{qm}\),
Since both \(\mathcal {G}^{\prime }\) as defined below
and the class \(\mathcal {I}\) of functions of \(\mathbf {y} \in \mathbb {R}^{qm}\) are of VC-type (note that \(\mathcal {I}\) has a bounded envelope function), we can apply Lemma A.1 in Einmahl and Mason (2000) to conclude that \(\mathcal {C}\) is also of VC-type. Therefore, so is the class of functions \(m \mathcal {C}^{(1)}\) on \(\mathbb {R}^{d+q}\), where \(\mathcal {C}^{(1)}\) consists of the π1-projections of the functions in the class \(\mathcal {C}\). Thus, we see that \( \mathcal {S}^{\prime }_{\ell } \subset m \mathcal {C}^{(1)}\) and hence \( \mathcal {S}_{\ell }^{\prime }\) is of VC-type with the same characteristics as \(m \mathcal {C}^{(1)}\). Now, to find an envelope function for \( \mathcal {S}_{\ell }^{\prime }\), set \(\mathbf {t}_{j}:=\left (t_{1}, \ldots , t_{j-1}, t_{j+1}, \ldots , t_{m}\right ) \in \mathbb {R}^{d(m-1)}\) and Zj(u) := (Z1,…,Zj− 1,u,Zj+ 1,…, \(Z_{m}) \in \mathbb {R}^{qm}\) for \(u \in \mathbb {R}^{q}\) and \(\mathbf {Z} \in \mathbb {R}^{qm}\). We can then rewrite the function \(S_{g, h, \mathbf {t}}^{(\ell )}(x, y) \in \mathcal {S}_{\ell }^{\prime }\) as
where \(\mathbf {X}^{*}=\left (X_{2}, \ldots , X_{m}\right ) \in \mathbb {R}^{d(m-1)}\) and where (with a little abuse of notation here) the product kernel in (K.iii) is now defined for d(m − 1) -dimensional vectors, that is, \(\widetilde {K}(\mathbf {u})={\prod }_{i=1}^{m-1} K\left (u_{i}\right )\), \(\mathbf {u} \in \mathbb {R}^{d(m-1)} \). Hence, we can bound \(S_{g, h, \mathbf {t}}^{(\ell )}(x, y) \in \mathcal {S}_{\ell }^{\prime }\) simply as
We shall now apply the moment bound in Theorem 7.3 to the subclasses
where \({\mathscr{H}}_{\ell , j}^{\prime }\) was defined in Eq. 6.29. Since \( \mathcal {S}_{\ell , j}^{\prime } \subset \mathcal {S}_{\ell }^{\prime }\) for j = 1,…,L(ℓ), all of these subclasses are of VC-type, with the same envelope function and characteristics as the class \(m \mathcal {C}^{(1)}\) (which is independent of ℓ), verifying (ii) in Theorem 7.3. For (i), recall that although all of the terms of the envelope function Gm(x,y) are different, their expectations are the same. Therefore, writing Y∗ for \(\left (Y_{2}, \ldots , Y_{m}\right )\) and applying Minkowski’s inequality followed by Jensen’s inequality, we obtain from assumption (2.10) the following upper bound for the second moment of the envelope function:
Note, further, that by the symmetry of \(\widetilde {F}\),
so that Jensen’s inequality, the change of variable u = (t −x)/h and the assumption in Eq. 2.10 give the following upper bound for the second moment of any function in \( \mathcal {S}_{\ell }^{\prime }\):
Therefore, with \(\beta \equiv m \mu _{p}^{1 / p}\left (\kappa ^{m} \vee C_{2,\mathbb {P},K,\varepsilon }^{1/2}\right )\), our previous calculations give us that
verifying condition (iii) as well. Finally, recall from Eq. 6.4 that since \(\mathcal {G}\) has envelope function \(\widetilde {F}(\mathbf {y})\), it holds for all \(x, y \in \mathbb {R}^{d+q}\) that
so that by taking ε > 0 small enough, Theorem 7.3 is now applicable. Thus, for an absolute constant \(A_{1}<\infty \), we have
where \(\epsilon _{1}, \ldots , \epsilon _{n_{\ell }}\) are independent Rademacher variables, independent of \(\left (X_{i}, Y_{i}\right ), 1 \leq i \leq n_{\ell }\). Consequently, applying the exponential inequality of Talagrand (1994) to the class \( \mathcal {S}_{\ell , j}^{\prime }\) (see Theorem 7.5 in the Appendix) with \(M=m \varepsilon \gamma _{\ell }^{1 / p}\), \(\sigma _{\mathcal {S}_{\ell , j}^{\prime }}^{2}=\beta ^{2} h_{\ell , j}^{\prime m(d_{\text {vol}}-\varepsilon )}\) and the moment bound in Eq. 6.34, we get, for an absolute constant \(A_{2}<\infty \) and all t > 0, that
Regarding the application of this inequality with \(t=\rho \lambda _{j}^{\prime }(\ell ), \rho >1\), note that it clearly follows from Eq. 6.31 and the definitions of \(h_{\ell , j}^{\prime }\) as in Eq. 6.29, γℓ as in Eq. 6.31 and \(\lambda _{j}^{\prime }(\ell )\) as in Eq. 6.34 that for all j ≥ 0,
Consequently, Eq. 6.35, when applied with \(t=\rho \lambda _{j}^{\prime }(\ell )\) and any ρ > 1 with ℓ large enough, yields, for suitable constants \(A_{2}^{\prime }, A_{2}^{\prime \prime }\) and A3, the inequality
Keeping in mind that \(m h^{dm} \sqrt {n} U_{n}^{(1)}\left (\pi _{1} \bar {G}_{g, h, \mathbf {t}}^{(\ell )}\right )\) is the empirical process \(\alpha _{n}\left (S_{g, h, \mathbf {t}}^{(\ell )}\right )\) indexed by the class \( \mathcal {S}_{\ell }^{\prime }\) and recalling Eq. 6.18, since dvol − ε < d, we obtain, for ℓ ≥ 1, that
Consequently, recalling once again that \(L(\ell ) \leq 2 \log n_{\ell }\), we can infer from Eq. 6.36 that for some constant \(C_{5}(\rho ) \geq 3 C_{1}\left (A_{1}+\rho \right )\),
The Borel-Cantelli lemma, when combined with this inequality for ρ ≥ (2 + δ)/A3, δ > 0, and with the choice \(n_{\ell } = 2^{\ell }\), establishes, for some \(C^{\prime }<\infty \) and with probability 1, that
This achieves the control of the first term in Eq. 6.1. We now treat the nonlinear terms. The purpose is to prove that, for k = 2,…,m and with probability 1, all of the other terms of Eq. 6.1 are asymptotically bounded or go to zero at the proper rate, that is
By following the same reasoning as in the bounded case, we define some classes of functions on \(\mathbb {R}^{dm} \times \mathbb {R}^{qm}\) and \(\mathbb {R}^{dk} \times \mathbb {R}^{qk}\),
It is then easily verified that these classes are of VC-type with characteristics that are independent of ℓ and with envelope functions \(\widetilde {F}\) and \(\left (2^{k} \varepsilon \gamma _{\ell }^{1 / p}\right )^{-1} F_{k}\), respectively. The function \(\widetilde {F}\) is defined as in Eq. 6.4 and Fk is determined just as in the proof of Theorem 1 of Giné and Mason (2007a). Note that just as in Eqs. 6.19 and 6.21, by setting
we see that for all k = 2,…,m and nℓ− 1 < n ≤ nℓ,
Consequently, applying Theorem 7.2 with c = 1/2 and r = 2 gives us precisely (6.20) with \(\mathcal {U}_{n}(j, k, \ell )\) and \(\mathcal {U}_{n_{\ell }}(j, k, \ell )\) replaced by \(\mathcal {U}_{n}^{\prime }(j, k, \ell )\) and \(\mathcal {U}_{n_{\ell }}^{\prime }(j, k, \ell )\), respectively. Therefore, the same methodology as in the bounded case will be applied. Note also that, as was the case for all the functions in \(\mathcal {G}_{\ell , j}^{(k)}\), the functions in \(\mathcal {G}_{\ell , j}^{\prime (k)}\) are bounded by 1 and have second moments that can be bounded by \(h^{m(d_{\text {vol}}-\varepsilon )} D_{m, k}\) for a suitable Dm,k (by arguing as in Eqs. 6.33 and 6.22). Hence, the expression in Eq. 6.22 is also satisfied for functions in \(\mathcal {G}_{\ell , j}^{\prime (k)}\), that is,
Thus, all the conditions for Theorems 7.4 and 7.6 are satisfied so that, after some obvious identifications and modifications, the second part of the proof of Theorem 2.5 (and Eq. 6.26 in particular) gives us, for some C7,k > 0, all j = 1,…,L(ℓ) and any ρ > 2,
with \(y^{\prime *}=C_{1, k}^{\prime } \lambda _{j, k}^{\prime }(\ell )\) for some \(C_{1, k}^{\prime }>0\) and where \(\lambda _{j, k}^{\prime }(\ell )\) is defined as in Eq. 6.24 with hℓ,j replaced by \(h_{\ell , j}^{\prime }\), that is,
Now, to finish the proof of Eq. 6.38, note that, similarly to Eq. 6.27, for some Ck > 0, for nℓ− 1 < n ≤ nℓ
This gives that for some ck > 0,
From Eq. 6.31, we now see that
Since \(\log n / n^{2-k}\) is monotone increasing in n ≥ 2 whenever k ≥ 2, we infer that, for some constant C8,k > 0,
Therefore, by choosing \(C_{8, k}>2^{k+1} c^{-m / 2} \varepsilon c_{k} C_{1, k}^{\prime }\left ((2+\delta ) / C_{7, k}\right )^{k / 2}\) and noting that by definition \(L(\ell ) \leq 2 \log n_{\ell }\) and \(h_{\ell , j}^{\prime }<2\) for all j = 1,…,L(ℓ), we can infer from Eq. 6.39 with
that
This immediately implies, via Borel-Cantelli, that for all k = 2,…,m and ℓ ≥ 1,
a.s., which obviously implies Eq. 6.38. Finally, recalling the Hoeffding decomposition (6.1), this implies, together with Eq. 6.37, that for some \(C^{\prime \prime }>0\) with probability 1,
6.6 Remainder Part
Consider now the remainder process \(\widetilde {u}_{n}^{(\ell )}(g, h, \mathbf {t})\) based on the unbounded (symmetric) U -kernel given by
where we defined γℓ as in Eq. 6.30. We shall show that this U-process is asymptotically negligible at the rate given in Theorem 2.6. More precisely, we shall prove that as \(\ell \rightarrow \infty \),
Recall that for all \(g \in \mathcal {F}\), \(h \in \left [a_{n}^{\prime }, b_{0}\right ]\) and \(\mathbf {t}, \mathbf {x} \in \mathbb {R}^{dm}\), \(\widetilde {F}(\mathbf {y}) \geq h^{m}\left |\bar {G}_{g, h, \mathbf {t}}(\mathbf {x}, \mathbf {y})\right |\), so from the symmetry of \(\widetilde {F}\), it holds that
where is a U-statistic based on the positive and symmetric kernel . Recalling that \(a_{n}^{\prime \, md}=c^{md}(\log n / n)^{1-2 / p}\), we obtain easily that for all \(g \in \mathcal {F}\), \(h \in \left [a_{n}^{\prime }, b_{0}\right ]\), \(\mathbf {t} \in \mathbb {R}^{dm}\) and some C > 0
Arguing in the same way, since a U -statistic is an unbiased estimator of its kernel, we get that, uniformly in \(g \in \mathcal {F}, h \in \left [a_{n}^{\prime }, b_{0}\right ]\) and \(\mathbf {t} \in \mathbb {R}^{m}\),
From Eq. 6.42, we see that as \(\ell \rightarrow \infty \),
Thus, to finish the proof of Eq. 6.41, it suffices to show that
First, note that from Chebyshev’s inequality and a well-known inequality for the variance of a U-statistic (see Theorem 5.2 of Hoeffding (1948)), we get, for any δ > 0,
Next, in order to establish the finite convergence of the series of the above probabilities, we split the indicator function into two distinct parts determined by whether \(\widetilde {F}(\mathbf {Y})> n_{\ell }^{1 / p}\) or \(\varepsilon \gamma _{\ell }^{1 / p}<\widetilde {F}(\mathbf {Y}) \leq n_{\ell }^{1 / p}\), and consider the corresponding second moments in Eq. 6.45 separately. In the first case, note that, from Eqs. 2.10 and 6.4, \(\mathbb {E} \widetilde {F}^{p}(\mathbf {Y}) \leq \mu _{p} \kappa ^{p m}(m !)^{p}\) and observe that since p > 2 and \(n_{\ell } = 2^{\ell }\),
To handle the second case, we shall need the following fact from Einmahl and Mason (2000).
Fact 6.1.
Let \(\left (c_{n}\right )_{n \geq 1}\) be a sequence of positive constants such that \(c_{n}/n^{1/s} \nearrow \infty \) for some s > 0 and let Z be a random variable satisfying
We then have, for any q > s,
Notice that for any p < r ≤ 2p,
Now, set \(Z=\widetilde {F}(\mathbf {Y}), c_{n}=n^{1 / p}\) and q = r in Fact 6.1 and note that \(c_{n} / n^{1 / s} \nearrow \infty \) for any s such that q = r > s > p. Since q = r > s, we can conclude from Fact 6.1 that this last bound is finite. Finally, note that the bound leading to Eq. 6.45 implies that
Consequently, the above results, together with Eq. 6.45, imply via Borel-Cantelli and the arbitrary choice of δ > 0 that Eq. 6.44 holds, which, when combined with Eqs. 6.43 and 6.45, completes the proof of Eq. 6.41. This also completes the proof of Theorem 2.6 since we have already established the result in Eq. 6.40.
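For reference, the variance inequality invoked at the beginning of this subsection (Theorem 5.2 of Hoeffding (1948)) is, in standard form,

$$\operatorname {Var}\left (U_{n}^{(m)}(L)\right )\leq \frac {m}{n}\operatorname {Var}\left (L(Y_{1},\ldots ,Y_{m})\right ),$$

so that Chebyshev's inequality yields, for any δ > 0,

$$\mathbb {P}\left (\left |U_{n}^{(m)}(L)-\mathbb {E}L\right |>\delta \right )\leq \frac {m\operatorname {Var}(L)}{n\delta ^{2}}.$$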
6.7 Proof of Theorem 2.7
Theorem 2.7 is essentially a consequence of Theorem 7.7; the details are similar to the proof in Dony and Mason (2008) and are therefore omitted.
6.8 Proof of Corollary 2.8
We now turn to the proof of Corollary 2.8. We observe the following standard inequalities
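One standard set of such inequalities, for a ratio estimator \(\widehat {a}/\widehat {b}\) of a/b (given here as an illustrative sketch with generic numerator and denominator), is

$$\left |\frac {\widehat {a}}{\widehat {b}}-\frac {a}{b}\right | \leq \frac {1}{|\widehat {b}|}\left |\widehat {a}-a\right | +\frac {|a|}{|b\,\widehat {b}|}\left |\widehat {b}-b\right |,$$

which separates the estimation errors of the numerator and the denominator.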
We can infer from Theorem 7.7 that
Then, from Theorem 2.5, Eq. 6.46 and the fact that fX(⋅) is bounded away from zero on J, we get, for some ξ1, ξ2 > 0 and c large enough in \(a_{n}=c(\log n / n)^{1 / dm}\),
and, for n large enough,
Further, for \(a_{n}^{\prime }\) equalling either ℓn or an, we readily obtain from the assumptions (2.8) or (2.10) on the envelope function that
Hence, we can now use Theorem 2.5 to handle (II), while for (I), depending on whether the class \({\mathscr{F}}_{q}\) satisfies Eq. 2.8 or 2.10, we apply Theorem 2.5 or Theorem 2.6, respectively. Taking everything together, we conclude that for c large enough and some \(\mathfrak {C}_{3}>0\), with probability 1,
We readily obtain the assertion of the corollary by choosing δ appropriately. \(\Box \)
6.9 Proof of Proposition 3.2
Recalling Definition 3.1 of
it is obvious that Φψ is uniformly bounded in \((y_{1},\ldots ,y_{m}, c_{1},\ldots ,c_{m})\in \mathbb {R}^{2m}\) and in \(\psi \in {\mathscr{F}}\), since \({\mathscr{F}}\) is uniformly bounded, ψ(t) = 0 for all t > τ, and G(τ) < 1. This property, combined with the VC property of \({\mathscr{F}}\), ensures that the class of functions
satisfies conditions (F.ii) and (F.iii). Similarly, it can be shown that \({\mathscr{F}}_{\Phi }\) is a pointwise measurable class of functions, i.e., it satisfies (F.i). Moreover, by (A.3) and Eq. 3.2, the class
is almost surely relatively compact with respect to the sup-norm topology on \(\mathcal I_{\alpha }\). Hence, we can apply Theorem 2.8 with Y = (Y,C) and Ψ = Φψ, and the result of Proposition 3.2 follows. \(\Box \)
Lemma 6.2.
Under assumptions of Theorem 3.1, we have with probability one,
6.10 Proof of Lemma 6.2
Recall the following useful lemma.
Lemma 6.3.
Let \(a_{i}\), i = 1,…,k, and \(b_{i}\), i = 1,…,k, be real numbers.
An application of the preceding lemma gives
Since
the kernel K(⋅) is uniformly bounded and
the law of the iterated logarithm for \(G_{n}^{*}(\cdot )\) established in Földes and Rejtő (1981) ensures that
By combining the results of Proposition 3.2 and Lemma 6.2, the result of Theorem 3.1 is immediate, noting that, under conditions (H.1)–(H.3), we have, for n sufficiently large,
Hence the proof is complete.
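For readers who wish to experiment numerically with the censored-data setting of Theorem 3.1, the product-limit (Kaplan-Meier) estimator whose law of the iterated logarithm is invoked above can be sketched in a few lines of Python. This is an illustrative sketch under the standard right-censoring model, not the authors' code; applied to the censoring indicators 1 − δᵢ, the same function yields the estimate of the censoring law that enters the I.P.C.W. weights \(\delta _{i}/(1-\widehat {G}_{n}(Y_{i}^{-}))\).

```python
import numpy as np


def product_limit_survival(t_eval, times, events):
    """Kaplan-Meier product-limit estimate of S(t) = P(T > t) from
    right-censored data: events[i] = 1 if times[i] is an observed
    failure, 0 if it is censored (ties ignored in this sketch)."""
    order = np.argsort(times)
    times, events = times[order], events[order]
    n = len(times)
    at_risk = n - np.arange(n)            # number still at risk at each time
    factors = 1.0 - events / at_risk      # 1 - d_i / n_i (only failures count)
    surv = np.cumprod(factors)
    # right-continuous step function: take the last jump at or before t
    idx = np.searchsorted(times, t_eval, side="right") - 1
    return np.where(idx < 0, 1.0, surv[np.clip(idx, 0, n - 1)])


# No censoring: the estimator reduces to the empirical survival function.
times = np.array([1.0, 2.0, 3.0, 4.0])
events = np.ones(4)
print(product_limit_survival(np.array([2.5]), times, events))  # [0.5]
```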
References
Abrevaya, J. and Jiang, W. (2005). A nonparametric approach to measuring and testing curvature. J. Bus. Econ. Stat. 23, 1–19.
Arcones, M. A. and Giné, E. (1993). Limit theorems for U-processes. Ann. Probab. 21, 1494–1542.
Arcones, M. A. and Giné, E. (1995). On the law of the iterated logarithm for canonical U-statistics and processes. Stoch. Process. Appl. 58, 217–245.
Arcones, M. A. and Wang, Y. (2006). Some new tests for normality based on U-processes. Stat. Probab. Lett. 76, 69–82.
Arcones, M. A., Chen, Z. and Giné, E. (1994). Estimators related to U-processes with applications to multivariate medians: asymptotic normality. Ann. Stat. 22, 1460–1477.
Borovkova, S., Burton, R. and Dehling, H. (2001). Limit theorems for functionals of mixing processes with applications to U-statistics and dimension estimation. Trans. Am. Math. Soc. 353, 4261–4318.
Borovskikh, Y. V. (1996). U-statistics in Banach spaces. VSP, Utrecht.
Bouzebda, S. (2012). On the strong approximation of bootstrapped empirical copula processes with applications. Math. Methods Stat. 21, 153–188.
Bouzebda, S. and El-hadjali, T. (2020). Uniform convergence rate of the kernel regression estimator adaptive to intrinsic dimension in presence of censored data. J. Nonparametr. Stat. 32, 864–914.
Bouzebda, S. and Elhattab, I. (2009). A strong consistency of a nonparametric estimate of entropy under random censorship. C. R. Math. Acad. Sci. Paris 347, 821–826.
Bouzebda, S. and Elhattab, I. (2010). Uniform in bandwidth consistency of the kernel-type estimator of the Shannon’s entropy. C. R. Math. Acad. Sci. Paris 348, 317–321.
Bouzebda, S. and Elhattab, I. (2011). Uniform-in-bandwidth consistency for kernel-type estimators of Shannon’s entropy. Electron. J. Stat. 5, 440–459.
Bouzebda, S. and Nemouchi, B. (2019). Central limit theorems for conditional empirical and conditional U-processes of stationary mixing sequences. Math. Methods Stat. 28, 169–207.
Bouzebda, S. and Nemouchi, B. (2020). Uniform consistency and uniform in bandwidth consistency for nonparametric regression estimates and conditional U-statistics involving functional data. J. Nonparametr. Stat. 32, 452–509.
Bouzebda, S. and Nemouchi, B. (2022). Weak-convergence of empirical conditional processes and conditional U-processes involving functional mixing data. Stat. Inference Stoch. Process. To appear, pp 1–56.
Bouzebda, S. and Nezzal, A. (2022). Uniform consistency and uniform in number of neighbors consistency for nonparametric regression estimates and conditional U-statistics involving functional data. Jpn. J. Stat. Data Sci. 5, 2, 431–533.
Bouzebda, S., Elhattab, I. and Seck, C. T. (2018). Uniform in bandwidth consistency of nonparametric regression based on copula representation. Statist. Probab. Lett. 137, 173–182.
Bouzebda, S., Elhattab, I. and Nemouchi, B. (2021). On the uniform-in-bandwidth consistency of the general conditional U-statistics based on the copula representation. J. Nonparametr. Stat. 33, 321–358.
Brunel, E. and Comte, F. (2006). Adaptive nonparametric regression estimation in presence of right censoring. Math. Methods Stat. 15, 233–255.
Carbonez, A., Györfi, L. and van der Meulen, E. C. (1995). Partitioning-estimates of a regression function under random censoring. Stat. Decis. 13, 21–37.
Chen, Y. and Datta, S. (2019). Adjustments of multi-sample U-statistics to right censored data and confounding covariates. Comput. Stat. Data Anal. 135, 1–14.
Datta, S., Bandyopadhyay, D. and Satten, G. A. (2010). Inverse probability of censoring weighted U-statistics for right-censored data with an application to testing hypotheses. Scand. J. Stat. 37, 680–700.
de la Peña, V. H. and Giné, E. (1999). Decoupling. Probability and its Applications (New York). Springer, New York. From dependence to independence. Randomly stopped processes. U-statistics and processes. Martingales and beyond.
Deheuvels, P. (2000). Uniform limit laws for kernel density estimators on possibly unbounded intervals. In Recent advances in reliability theory (Bordeaux, 2000), Stat. Ind. Technol., pp 477–492. Birkhäuser, Boston.
Deheuvels, P. and Mason, D. M. (2004). General asymptotic confidence bands based on kernel-type function estimators. Stat. Inference Stoch. Process. 7, 225–277.
Denker, M. and Keller, G. (1983). On U-statistics and v. Mises’ statistics for weakly dependent processes. Z. Wahrsch. Verw. Gebiete 64, 505–522.
Devroye, L. and Lugosi, G. (2001). Combinatorial methods in density estimation Springer Series in Statistics. Springer, New York.
Dony, J. and Mason, D. M. (2008). Uniform in bandwidth consistency of conditional U-statistics. Bernoulli 14, 1108–1133.
Einmahl, U. and Mason, D. M. (2000). An empirical process approach to the uniform consistency of kernel-type function estimators. J. Theor. Probab. 13, 1–37.
Einmahl, U. and Mason, D. M. (2005). Uniform in bandwidth consistency of kernel-type function estimators. Ann. Stat. 33, 1380–1403.
Farahmand, A. M., Szepesvári, C. and Audibert, J.-Y. (2007). Manifold-adaptive dimension estimation. In Proceedings of the 24th International Conference on Machine Learning, ICML ’07, pp 265–272. Association for Computing Machinery, New York.
Földes, A. and Rejtő, L. (1981). A LIL type result for the product limit estimator. Z. Wahrsch. Verw. Gebiete 56, 75–86.
Ghosal, S., Sen, A. and van der Vaart, A. W. (2000). Testing monotonicity of regression. Ann. Stat. 28, 1054–1082.
Giné, E. and Mason, D. M. (2007a). Laws of the iterated logarithm for the local U-statistic process. J. Theoret. Probab. 20, 457–485.
Giné, E. and Mason, D. M. (2007b). On local U-statistic processes and the estimation of densities of functions of several sample variables. Ann. Stat. 35, 1105–1145.
Györfi, L., Kohler, M., Krzyżak, A. and Walk, H. (2002). A distribution-free theory of nonparametric regression. Springer Series in Statistics. Springer, New York.
Hall, P. (1984). Asymptotic properties of integrated square error and cross-validation for kernel estimation of a regression function. Z. Wahrsch. Verw. Gebiete 67, 175–196.
Halmos, P. R. (1946). The theory of unbiased estimation. Ann. Math. Stat. 17, 34–43.
Härdle, W. and Marron, J. S. (1985). Optimal bandwidth selection in nonparametric regression function estimation. Ann. Stat. 13, 1465–1481.
Harel, M. and Puri, M. L. (1996). Conditional U-statistics for dependent random variables. J. Multivariate Anal. 57, 84–100.
Hein, M. and Audibert, J. -Y. (2005). Intrinsic dimensionality estimation of submanifolds in rd. In Proceedings of the 22nd International Conference on Machine Learning, ICML ’05, pp 289–296. Association for Computing Machinery, New York.
Hoeffding, W. (1948). A class of statistics with asymptotically normal distribution. Ann. Math. Stat. 19, 293–325.
Hollander, M. and Proschan, F. (1972). Testing whether new is better than used. Ann. Math. Stat. 43, 1136–1146.
Joly, E. and Lugosi, G. (2016). Robust estimation of U-statistics. Stoch. Process. Appl. 126, 3760–3773.
Kaplan, E. L. and Meier, P. (1958). Nonparametric estimation from incomplete observations. J. Am. Stat. Assoc. 53, 457–481.
Kégl, B. (2002). Intrinsic dimension estimation using packing numbers. In Proceedings of the 15th International Conference on Neural Information Processing Systems, NIPS’02, pp. 697–704. MIT Press, Cambridge.
Kim, J., Shin, J., Rinaldo, A. and Wasserman, L. (2018). Uniform convergence rate of the kernel density estimator adaptive to intrinsic volume dimension.
Kim, J., Shin, J., Rinaldo, A. and Wasserman, L. (2019). Uniform convergence rate of the kernel density estimator adaptive to intrinsic volume dimension. In Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, (K. Chaudhuri and R. Salakhutdinov, eds.), pp. 3398–3407. PMLR, Long Beach.
Kohler, M., Máthé, K. and Pintér, M. (2002). Prediction from randomly right censored data. J. Multivar. Anal. 80, 73–100.
Koroljuk, V.S. and Borovskich, Y.V. (1994). Theory of U-statistics, volume 273 of Mathematics and Its Applications. Kluwer Academic Publishers Group, Dordrecht. Translated from the 1989 Russian original by P. V. Malyshev and D. V. Malyshev and revised by the authors.
Kosorok, M. R. (2008). Introduction to empirical processes and semiparametric inference. Springer Series in Statistics. Springer, New York.
Lee, A. J. (1990). U-statistics, volume 110 of Statistics: Textbooks and Monographs. Marcel Dekker, Inc., New York. Theory and practice.
Lee, S., Linton, O. and Whang, Y. -J. (2009). Testing for stochastic monotonicity. Econometrica 77, 585–602.
Leucht, A. (2012). Degenerate U- and V-statistics under weak dependence: asymptotic theory and bootstrap consistency. Bernoulli 18, 552–585.
Leucht, A. and Neumann, M. H. (2013). Degenerate U- and V-statistics under ergodicity: asymptotics, bootstrap and applications in statistics. Ann. Inst. Stat. Math. 65, 349–386.
Levina, E. and Bickel, P. J. (2004). Maximum likelihood estimation of intrinsic dimension. In Proceedings of the 17th International Conference on Neural Information Processing Systems, NIPS’04, pp. 777–784. MIT Press, Cambridge.
Ling, N. and Vieu, P. (2018). Nonparametric modelling for functional data: selected survey and tracks for future. Statistics 52, 934–949.
Maillot, B. and Viallon, V. (2009). Uniform limit laws of the logarithm for nonparametric estimators of the regression function in presence of censored data. Math. Methods Stat. 18, 159–184.
Mason, D. M. (2012). Proving consistency of non-standard kernel estimators. Stat. Inference Stoch. Process. 15, 151–176.
Mason, D. M. and Swanepoel, J. W. H. (2011). A general result on the uniform in bandwidth consistency of kernel-type function estimators. TEST 20, 72–94.
Nadaraja, E. A. (1964). On a regression estimate. Teor. Verojatnost. i Primenen. 9, 157–159.
Nolan, D. and Pollard, D. (1987). U-processes: rates of convergence. Ann. Stat. 15, 780–799.
Prakasa Rao, B. L. S. and Sen, A. (1995). Limit distributions of conditional U-statistics. J. Theor. Probab. 8, 261–301.
Schick, A., Wang, Y. and Wefelmeyer, W. (2011). Tests for normality based on density estimators of convolutions. Stat. Probab. Lett. 81, 337–343.
Sen, A. (1994). Uniform strong consistency rates for conditional U-statistics. Sankhyā Ser. A 56, 179–194.
Shang, H. L. (2014). Bayesian bandwidth estimation for a functional nonparametric regression model with mixed types of regressors and unknown error density. J. Nonparametr. Stat. 26, 599–615.
Sherman, R. P. (1993). The limiting distribution of the maximum rank correlation estimator. Econometrica 61, 123–137.
Sherman, R. P. (1994). Maximal inequalities for degenerate U-processes with applications to optimization estimators. Ann. Stat. 22, 439–459.
Silverman, B. W. (1978). Distances on circles, toruses and spheres. J. Appl. Probab. 15, 136–143.
Stute, W. (1991). Conditional U-statistics. Ann. Probab. 19, 812–825.
Stute, W. (1993). Almost sure representations of the product-limit estimator for truncated data. Ann. Stat. 21, 146–156.
Stute, W. (1994a). Lp-convergence of conditional U-statistics. J. Multivar. Anal. 51, 71–82.
Stute, W. (1994b). Universally consistent conditional U-statistics. Ann. Stat. 22, 460–473.
Stute, W. (1996). Symmetrized NN-conditional U-statistics. In Research developments in probability and statistics, pp. 231–237. VSP, Utrecht.
Stute, W. and Wang, J. -L. (1993). Multi-sample U-statistics for censored data. Scand. J. Stat. 20, 369–374.
Talagrand, M. (1994). Sharper bounds for Gaussian and empirical processes. Ann. Probab. 22, 28–76.
van der Vaart, A. W. and Wellner, J. A. (1996). Weak convergence and empirical processes. Springer Series in Statistics. Springer, New York. With applications to statistics.
von Mises, R. (1947). On the asymptotic distribution of differentiable statistical functions. Ann. Math. Stat. 18, 309–348.
Watson, G. S. (1964). Smooth regression analysis. Sankhyā Ser. A 26, 359–372.
Yuan, A., Giurcanu, M., Luta, G. and Tan, M. T. (2017). U-statistics with conditional kernels for incomplete data models. Ann. Inst. Stat. Math. 69, 271–302.
Acknowledgments
The authors are indebted to the Editor-in-Chief, Associate Editor and referee for their very generous comments and suggestions on the first version of our article which helped us to improve the content, presentation, and layout of the manuscript.
Funding
Not applicable.
Availability of Data and Materials. Not applicable.
Ethics declarations
Conflict of Interest
The authors declare that they have no competing interests.
Appendix
Theorem 7.1.
(Kim et al. 2018) Let \((\mathbb {R}^{d},\mathbb {P})\) be a probability space and let X1,…,Xn be i.i.d. from \(\mathbb {P}\). Let \(\mathcal {F}\) be a uniformly bounded VC-class of functions from \(\mathbb {R}^{d}\) to \(\mathbb {R}\) with dimension ν, i.e., there exist positive numbers A and B such that, for all \(f\in \mathcal {F}\), \(\left \Vert f\right \Vert _{\infty }\leq B\), and, for every probability measure \(\mathbb {Q}\) on \(\mathbb {R}^{d}\) and every 𝜖 ∈ (0,B), the covering number \(\mathcal {N}(\mathcal {F},L_{2}(\mathbb {Q}),\epsilon )\) satisfies
Let σ > 0 be such that \(\mathbb {E}_{\mathbb {P}}f^{2}\leq \sigma ^{2}\) for all \(f\in \mathcal {F}\). Then there exists a universal constant C, not depending on any of the parameters, such that
is upper bounded, with probability at least 1 − δ, by
Theorem 7.2 (Theorem 4 of Giné and Mason (2007a)).
Let X1,X2,… be i.i.d. S-valued random variables with probability law \(\mathbb P \). Let \({\mathscr{H}}\) be a \(\mathbb P\)-separable collection of measurable functions \(f: S^{k} \rightarrow \mathbb {R}\) and assume that \({\mathscr{H}}\) is \(\mathbb P\)-canonical (which means that every f in \({\mathscr{H}}\) is \(\mathbb P\)-canonical). Further, assume that
for some r > 1, and let s be the conjugate of r. Then, with Sn defined as
we have, for all x > 0 and 0 < c < 1,
Theorem 7.3 (Proposition 1 of Einmahl and Mason (2005)).
Let \(\mathcal {G}\) be a pointwise measurable class of bounded functions with envelope function G such that for some constants C,v ≥ 1 and 0 < σ ≤ β, the following conditions hold:
(i) \(\mathbb {E} G^{2}(X) \leq \beta ^{2}\);
(ii) \(\mathcal {N}(\epsilon , \mathcal {G}) \leq C \epsilon ^{-v}, \quad 0<\epsilon <1\);
(iii) \({\sigma _{0}^{2}}:=\sup _{g \in \mathcal {G}} \mathbb {E} g^{2}(X) \leq \sigma ^{2}\);
(iv) \(\sup _{g \in \mathcal {G}}\|g\|_{\infty } \leq \frac {1}{4 \sqrt {v}} \sqrt {n \sigma ^{2} / \log \left (C_{1} \beta / \sigma \right )}\), where \(C_{1}=C^{1 / v} \vee e\).
We then have, for some absolute constant A,
where ε1,…,εn are i.i.d. Rademacher variables, independent of X1,…,Xn.
Theorem 7.4 (Corollary 1 of Giné and Mason (2007a)).
Let \(\mathcal {F}\) be a collection of measurable functions \(f: S^{m} \rightarrow \mathbb {R}\), symmetric in their entries, with absolute values bounded by M > 0, and let P be any probability measure on \((S, \mathcal {S})\) (with the Xi i.i.d. P). Assume that \(\mathcal {F}\) is of VC-type with envelope function F ≡ M and with characteristics A and v. Then, for every \(m \in \mathbb {N}\), \(A \geq e^{m}\), v ≥ 1, there exist constants C1 := C1(m,A,v,M) and C2 := C2(m,A,v,M) such that, for k = 1,…,m,
assuming
where σ2 is any number satisfying
Theorem 7.5.
(Talagrand, 1994) Let \(\mathcal {G}\) be a pointwise measurable class of functions satisfying
We then have, for all t > 0,
where
and A1,A2 are universal constants.
We now state the exponential inequality that will permit us to control the probability term in (4.6) and which is stated as Theorem 5.3.14 in de la Peña and Giné (1999).
Theorem 7.6 (Theorem 5.3.14 of de la Peña and Giné (1999)).
Let \({\mathscr{H}}\) be a VC-subgraph class of uniformly bounded measurable real-valued kernels H on \(\left (S^{m}, \mathcal {S}^{m}\right )\), symmetric in their entries. Then, for each 1 ≤ k ≤ m, there exist constants \(c_{k}, d_{k} \in \left ] 0, \infty \right [\) such that, for all n ≥ m and t > 0,
Theorem 7.7.
(Dony and Mason, 2008) Let I = [a,b] be a compact interval. Suppose that \({\mathscr{H}}\) is a uniformly equicontinuous family of real-valued functions φ on J = [a − η,b + η]d for some d ≥ 1 and η > 0. Further assume that K is an L1-kernel with support in [−B,B]d, with B > 0 satisfying \({\int \limits }_{\mathbb {R}^{d}} K(\mathbf {u}) \mathrm {d} \mathbf {u}=1\). Then, uniformly in \(\varphi \in {\mathscr{H}}\) and for any sequence of positive constants \(b_{n} \rightarrow 0\),
where Kh(z) = h−dK(z/h) and
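To make the role of the rescaled kernel Kh(z) = h⁻ᵈK(z/h) concrete, here is a minimal Python sketch (our illustration, not the paper's code; the Epanechnikov kernel and the bandwidth grid are assumed choices) of a Nadaraya-Watson estimate in dimension d = 1 evaluated over a whole grid of bandwidths, mirroring the uniform-in-bandwidth range h ∈ [aₙ, bₙ] of Theorem 7.7.

```python
import numpy as np


def nadaraya_watson(t, X, Y, h):
    """Nadaraya-Watson estimate r_hat(t; h) = sum_i Y_i K_h(t - X_i)
    / sum_i K_h(t - X_i), with the Epanechnikov kernel
    K(u) = 0.75 (1 - u^2) on [-1, 1] (d = 1)."""
    u = (np.asarray(t)[:, None] - X[None, :]) / h
    K = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)
    num = K @ Y
    den = K.sum(axis=1)
    # NaN where no observation falls in the bandwidth window
    return np.divide(num, den, out=np.full_like(num, np.nan), where=den > 0)


rng = np.random.default_rng(1)
X = rng.uniform(0, 1, 200)
Y = np.sin(2 * np.pi * X) + 0.1 * rng.normal(size=200)
t = np.linspace(0.1, 0.9, 9)

# Evaluate the estimator simultaneously over a bandwidth grid [a_n, b_n].
estimates = {h: nadaraya_watson(t, X, Y, h) for h in np.linspace(0.05, 0.2, 4)}
```

Since the estimator is a weighted average of the responses, a constant response Y ≡ c is reproduced exactly wherever the denominator is positive, which gives a quick sanity check.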
Bouzebda, S., El-hadjali, T. & Ferfache, A.A. Uniform in Bandwidth Consistency of Conditional U-statistics Adaptive to Intrinsic Dimension in Presence of Censored Data. Sankhya A 85, 1548–1606 (2023). https://doi.org/10.1007/s13171-022-00301-7
Keywords
- Non-parametric estimation
- regression
- conditional empirical processes
- conditional U-processes
- kernel estimation
- functional estimation
- VC-classes.