Abstract
U-statistics represent a fundamental class of statistics for modelling quantities of interest defined by multi-subject responses. U-statistics generalize the empirical mean of a random variable X to sums over every m-tuple of distinct observations. Stute (Ann. Probab. 19, 812–825, 1991) introduced a class of so-called conditional U-statistics, which may be viewed as a generalization of the Nadaraya-Watson estimates of a regression function, and proved their strong pointwise consistency to r(t).
We apply the methods developed in Dony and Mason (Bernoulli 14(4), 1108–1133 2008) to establish uniform in t and in bandwidth consistency (i.e., h ∈ [an,bn], where \(0<a_{n}<b_{n}\rightarrow 0\) at some specific rate) to r(t) of the estimator proposed by Stute, under weaker conditions on the kernel than previously used in the literature. We extend existing uniform bounds on the kernel conditional U-statistic estimator and make it adaptive to the intrinsic dimension of the underlying distribution of X, which is characterized by the so-called volume dimension. In addition, uniform consistency is also established over \(\varphi \in {\mathscr{F}}\) for a suitably restricted class \({\mathscr{F}}\), in both the bounded and unbounded cases, satisfying some moment conditions. Our theorems allow data-driven local bandwidths for these statistics. Moreover, in the same context, we show the uniform in bandwidth consistency of the nonparametric inverse probability of censoring weighted (I.P.C.W.) estimators of the regression function under random censorship, which is of independent interest. The theoretical uniform consistency results established in this paper are (or will be) key tools for many further developments in regression analysis.
1 Introduction
Motivated by numerous applications, the theory of U-statistics (introduced in the seminal work by Hoeffding (1948)) and U-processes have received considerable attention in the past decades. U-processes are useful for solving complex statistical problems. Examples are density estimation, nonparametric regression tests and goodness-of-fit tests. More precisely, U-processes appear in statistics in many instances, e.g., as the components of higher order terms in von Mises expansions. In particular, U-statistics play a role in the analysis of estimators (including function estimators) with varying degrees of smoothness. For example, Stute (1993) applies a.s. uniform bounds for \(\mathbb {P}\)-canonical U-processes to the analysis of the product limit estimator for truncated data. Arcones and Wang (2006) present two new tests for normality based on U-processes. Making use of the results of Giné and Mason (2007a), Giné and Mason (2007b), Schick et al. (2011) introduced new tests for normality which used as test statistics weighted L1-distances between the standard normal density and local U-statistics based on standardized observations. Joly and Lugosi (2016) discussed the estimation of the mean of multivariate functions in case of possibly heavy-tailed distributions and introduced the median-of-means, which is based on U-statistics. U-processes are important tools for a broad range of statistical applications such as testing for qualitative features of functions in nonparametric statistics (Lee et al. 2009; Ghosal et al. 2000; Abrevaya and Jiang, 2005) and establishing limiting distributions of M-estimators (see, e.g., Arcones and Giné 1993; Sherman 1993; Sherman 1994; de la Peña and Giné 1999). The theory goes back to Halmos (1946), von Mises (1947) and Hoeffding (1948), who provided (amongst others) the first asymptotic results for the case where the underlying random variables are independent and identically distributed.
Under weak dependency assumptions, asymptotic results are for instance shown in Borovkova et al. (2001) and Denker and Keller (1983), more recently in Leucht (2012), and in a more general setting in Leucht and Neumann (2013), Bouzebda and Nemouchi (2019) and Bouzebda and Nemouchi (2022). For an excellent collection of references on U-statistics and U-processes, the interested reader may refer to Borovskikh (1996), Koroljuk and Borovskich (1994), Lee (1990), Arcones and Giné (1995), Arcones et al. (1994) and Arcones and Giné (1993). A profound insight into the theory of U-processes is given by de la Peña and Giné (1999). In this paper, we consider the so-called conditional U-statistics introduced by Stute (1991). These statistics may be viewed as generalizations of the Nadaraya-Watson (Nadaraja, 1964; Watson, 1964) estimates of a regression function.
To be more precise, let us consider a sequence of independent and identically distributed random vectors \(\{(\mathbf { X}_{i},\mathbf { Y}_{i}), i\in \mathbb {N}^{*}\}\) with \(\mathbf { X}_{i} \in \mathbb {R}^{d}\) and \(\mathbf { Y}_{i} \in \mathbb {R}^{q}\), d,q ≥ 1. Let \(\mathbb {P}_{\mathbf {X}}=\mathbb {P}\) be an unknown marginal Borel probability distribution in \(\mathbb {R}^{d}\). Let \( \varphi : \mathbb {R}^{qm}\rightarrow \mathbb {R}\) be a measurable function. In this paper, we are primarily concerned with the estimation of the conditional expectation, or regression function of φ(Y1,…,Ym) evaluated at (X1,…,Xm) = t, given by

$$ r^{(m)}(\varphi,\mathbf{t}) := \mathbb{E}\left(\varphi(\mathbf{Y}_{1},\ldots,\mathbf{Y}_{m}) \mid (\mathbf{X}_{1},\ldots,\mathbf{X}_{m})=\mathbf{t}\right), $$
whenever it exists, i.e., \(\mathbb {E}\left (\left |\varphi (\mathbf { Y}_{1},\ldots ,\mathbf { Y}_{m})\right |\right )<\infty \). We now introduce a kernel function \(K:\mathbb {R}^{dm}\rightarrow \mathbb {R}\). Stute (1991) presented a class of estimators for r(m)(φ,t), called the conditional U-statistics, which is defined for each \(\mathbf { t}\in \mathbb {R}^{dm}\) to be:

$$ \widehat{r}_{n}^{(m)}(\varphi,\mathbf{t};h) := \frac{{\sum}_{(i_{1},\ldots,i_{m})\in I(m,n)}\varphi(\mathbf{Y}_{i_{1}},\ldots,\mathbf{Y}_{i_{m}})\,K\left(\frac{\mathbf{t}-(\mathbf{X}_{i_{1}},\ldots,\mathbf{X}_{i_{m}})}{h}\right)}{{\sum}_{(i_{1},\ldots,i_{m})\in I(m,n)}K\left(\frac{\mathbf{t}-(\mathbf{X}_{i_{1}},\ldots,\mathbf{X}_{i_{m}})}{h}\right)}, $$
where

$$ I(m,n) := \left\{(i_{1},\ldots,i_{m}) : 1\leq i_{j}\leq n \text{ and } i_{j}\neq i_{k} \text{ if } j\neq k\right\} $$
is the set of all m-tuples of different integers between 1 and n and {hn}n≥ 1 denotes a sequence of positive constants converging to zero and \(n{h^{m}_{n}} \rightarrow \infty \). For notational simplicity, we let h = hn. In the particular case m = 1, r(m)(φ,t) reduces to \(r^{(1)}(\varphi ,\mathbf { t})=\mathbb {E}(\varphi (\mathbf { Y})|\mathbf { X}=\mathbf { t})\) and Stute’s estimator becomes the Nadaraya-Watson estimator of r(1)(φ,t) given by:

$$ \widehat{r}_{n}^{(1)}(\varphi,\mathbf{t};h) = \frac{{\sum}_{i=1}^{n}\varphi(\mathbf{Y}_{i})K\left(\frac{\mathbf{t}-\mathbf{X}_{i}}{h}\right)}{{\sum}_{i=1}^{n}K\left(\frac{\mathbf{t}-\mathbf{X}_{i}}{h}\right)}. $$
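As an illustration of how such estimators are computed in practice, the following is a minimal numerical sketch. It is not from the paper: the Gaussian kernel, the product form over the m components, the toy function φ(y1, y2) = y1y2 and the simulated data are all our own assumptions for illustration.

```python
import numpy as np
from itertools import permutations

def conditional_u_stat(X, Y, t, h, phi, kernel, m=2):
    """Stute-type conditional U-statistic: kernel-weighted average of
    phi(Y_{i1}, ..., Y_{im}) over all m-tuples of distinct indices I(m, n)."""
    n = X.shape[0]
    num = den = 0.0
    for idx in permutations(range(n), m):  # I(m, n)
        w = np.prod([kernel((t[j] - X[i]) / h) for j, i in enumerate(idx)])
        num += phi(*(Y[i] for i in idx)) * w
        den += w
    return num / den

gauss = lambda u: float(np.exp(-0.5 * np.dot(u, u)))  # Gaussian kernel on R^d

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
Y = X[:, 0] + 0.05 * rng.normal(size=200)  # so E[Y1*Y2 | X1=a, X2=b] ~ a*b
est = conditional_u_stat(X, Y, t=np.array([[0.5], [-0.5]]), h=0.3,
                         phi=lambda y1, y2: y1 * y2, kernel=gauss)
# population target here: 0.5 * (-0.5) = -0.25 (up to smoothing bias)
```

The double sum over I(m, n) makes the cost O(n^m); this is purely illustrative and not how one would implement the estimator at scale.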
The work of Sen (1994) was devoted to estimating the rate of the uniform convergence in t of \(\widehat {r}_{n}^{(m)}(\varphi ,\mathbf {t};h)\) to r(m)(φ,t). In the paper of Prakasa Rao and Sen (1995), the limit distributions of \(\widehat {r}_{n}^{(m)}(\varphi ,\mathbf { t};h)\) are discussed and compared with those obtained by Stute. Harel and Puri (1996) extended the results of Stute (1991), under appropriate mixing conditions, to weakly dependent data and applied their findings to verify the Bayes risk consistency of the corresponding discrimination rules. Stute (1996) proposed symmetrized nearest neighbour conditional U-statistics as alternatives to the usual kernel-type estimators. An important contribution is given in the paper Dony and Mason (2008), where a much stronger form of consistency holds, namely, uniform in t and in bandwidth consistency (i.e., h ∈ [an,bn] where \(a_{n}<b_{n}\rightarrow 0\) at some specific rate) of \(\widehat {r}_{n}^{(m)}(\varphi ,\mathbf { t};h)\). In addition, uniform consistency is also established over \(\varphi \in {\mathscr{F}}\) for a suitably restricted class \({\mathscr{F}}\). The main tool in their result is the use of the local conditional U-process investigated in Giné and Mason (2007a). In the last decades, empirical process theory has provided very useful and powerful tools to analyze the large sample properties of several nonparametric estimators of functionals of the distribution, such as the regression function and the density function; refer to van der Vaart and Wellner (1996) and Kosorok (2008). Nolan and Pollard (1987) were the first to introduce the notion of uniform in bandwidth consistency for kernel density estimators, and they applied empirical process methods in their study.
In a series of papers, Deheuvels (2000), Deheuvels and Mason (2004), Einmahl and Mason (2005), Dony and Mason (2008), Maillot and Viallon (2009), Mason and Swanepoel (2011), Bouzebda and Elhattab (2009, 2010), Bouzebda (2012), Bouzebda et al. (2018, 2021), Bouzebda and Nemouchi (2020), Bouzebda and El-hadjali (2020), Bouzebda and Nezzal (2022), the authors established uniform consistency results for such kernel estimators, where h varies within suitably chosen intervals indexed by n. More precisely, we will consider one of the most commonly used classes of estimators, formed by the so-called kernel-type estimators. There are basically no restrictions on the choice of the kernel function in our setup, apart from satisfying some mild conditions that we will give later. The selection of the bandwidth, however, is more problematic. It is worth noticing that the choice of the bandwidth is crucial to obtain a good rate of consistency; for example, it has a big influence on the size of the estimate’s bias. In general, we are interested in the selection of a bandwidth that produces an estimator with a good balance between the bias and the variance of the considered estimators. It is then more appropriate to let the bandwidth vary according to the criteria applied and the available data and location, which cannot be achieved by using classical methods. The interested reader may refer to Mason (2012) for more details and discussion on the subject. In the present paper, we develop methods that permit the study of kernel-type estimators under nonrestrictive conditions. It is worth noticing that high-dimensional data sets have several unfortunate properties that make them hard to analyze. The phenomenon that the computational and statistical efficiency of statistical techniques degrade rapidly with the dimension is often referred to as the “curse of dimensionality”.
Density and regression estimation on manifolds has received much less attention than the “full-dimensional” counterpart. However, understanding density estimation in situations where the intrinsic dimension can be much lower than the ambient dimension is becoming ever more important: modern systems can capture data at an increasing resolution while the number of degrees of freedom stays relatively constant. One of the limiting aspects of density (regression)-based approaches is their performance in high dimensions.
We know that the notion of the intrinsic dimension, say dM, has been studied in the statistical machine learning literature so as to establish fast estimation rates in high-dimensional kernel regression settings. There are numerous known techniques for doing so, e.g., Kégl (2002), Levina and Bickel (2004), Hein and Audibert (2005), Farahmand et al. (2007). We first introduce a concept proposed by Kim et al. (2018, 2019), the so-called volume dimension, to characterize the intrinsic dimension of the underlying distribution. More specifically, the volume dimension dvol is the decay rate of the probability of vanishing Euclidean balls. Let ∥⋅∥ be the Euclidean 2-norm. For \(\mathbf { x}\in \mathbb {R}^{d}\) and r > 0, we use the notation \(\mathbb {B}_{\mathbb {R}^{d}}(\mathbf { x},r)\) for the open Euclidean ball centered at x with radius r, i.e.,

$$ \mathbb{B}_{\mathbb{R}^{d}}(\mathbf{x},r) := \left\{\mathbf{y}\in\mathbb{R}^{d} : \left\Vert\mathbf{y}-\mathbf{x}\right\Vert < r\right\}. $$
When a probability distribution \(\mathbb {P}\) has a bounded density fX(⋅) supported on a well-behaved manifold M of dimension dM, it is known that, for any point x ∈ M, the measure on the ball \(\mathbb {B}_{\mathbb {R}^{d}}(\mathbf { x},r)\) centered at x and radius r decays as

$$ \mathbb{P}\left(\mathbb{B}_{\mathbb{R}^{d}}(\mathbf{x},r)\right) \asymp r^{d_{M}} $$
when r is small enough. From this, Kim et al. (2018) define the volume dimension of a probability distribution \(\mathbb {P}\) to be the maximum possible exponent rate that can dominate the probability volume decay on balls, i.e., fix a subset \(\mathbb {X}\subset \mathbb {R}^{d}\), then

$$ d_{vol} := \sup\left\{\nu\geq 0 : \limsup_{r\downarrow 0}\,\sup_{\mathbf{x}\in\mathbb{X}}\frac{\mathbb{P}\left(\mathbb{B}_{\mathbb{R}^{d}}(\mathbf{x},r)\right)}{r^{\nu}} < \infty\right\}. $$
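The decay-rate idea behind the volume dimension can be illustrated numerically. The following is a hypothetical toy example of ours (not from the paper): for data supported on a 1-dimensional segment embedded in \(\mathbb{R}^{2}\), the empirical mass of small balls decays like r to the power 1, so a log-log slope recovers the intrinsic rate rather than the ambient dimension d = 2.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000
t = rng.uniform(-1.0, 1.0, size=n)
pts = np.stack([t, np.zeros(n)], axis=1)   # sample on a 1-d segment in R^2

x0 = np.array([0.0, 0.0])
radii = np.array([0.05, 0.1, 0.2, 0.4])
# empirical P(B(x0, r)): fraction of sample points inside each ball
frac = np.array([(np.linalg.norm(pts - x0, axis=1) < r).mean() for r in radii])
# slope of log P_hat vs log r estimates the decay exponent
slope = np.polyfit(np.log(radii), np.log(frac), 1)[0]   # close to 1, not d = 2
```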
The primary purpose of the present work is to extend the work of Kim et al. (2018) to more general estimators, including the kernel density estimator studied in Kim et al. (2018) as a particular case; this generalization is far from trivial, since it requires controlling some complex classes of functions, which is a basically unsolved open problem in the literature. We aim to fill this gap in the literature by combining the results of Kim et al. (2018) with techniques developed in Einmahl and Mason (2005) and Dony and Mason (2008). The present paper extends our previous work in Bouzebda and El-hadjali (2020) in several directions; the main one is that the present paper considers conditional U-statistics, which include the regression treated in the last mentioned paper. More precisely, uniform in bandwidth consistency results for some general kernel-type estimators are established. However, as will be seen later, the problem requires much more than “simply” combining ideas from existing results. Delicate mathematical derivations will be required to cope with the empirical processes that we consider in this extended setting. In addition, we will consider the nonparametric Inverse Probability of Censoring Weighted (I.P.C.W.) estimators of the multivariate regression function under random censorship and obtain uniform in bandwidth consistency results that are of independent interest.
An outline of the remainder of our paper is as follows. In the forthcoming section, we introduce the mathematical framework and provide our main results concerning the uniform in bandwidth consistency of conditional U-statistics adaptive to the intrinsic dimension, extending the setting of the work of Dony and Mason (2008). In Section 3, we consider the conditional U-statistics in the right censored data framework. Examples of U-statistic kernels are provided in Section 3.1. In Section 4, we present how to select the bandwidth through cross-validation procedures. An application to the nonparametric discrimination problem is discussed in Section 4.1. Some concluding remarks are given in Section 5. To prevent interrupting the presentation flow, all proofs are gathered in Section 6. A few relevant technical results are given in the Appendix for easy reference.
2 Main Results
For a fixed integer m ≤ n, consider a class \({\mathscr{F}}_{q}\) of measurable functions \(g : \mathbb {R}^{qm} \rightarrow \mathbb {R}\) defined on \(\mathbb {R}^{qm}\), such that \(\mathbb {E}g^{2}(\mathbf {Y}_{1},\ldots , \mathbf {Y}_{m}) <\infty \), which satisfies conditions (F.i)–(F.iii) given below. First, to avoid measurability problems, we assume that
that is, there exists a countable subclass \({\mathscr{F}}_{0}\) of \({\mathscr{F}}_{q}\) such that we can find, for any function \(g \in {\mathscr{F}}_{q}\), a sequence of functions \(g_{m}\in {\mathscr{F}}_{0}\) for which
This condition is discussed in van der Vaart and Wellner (1996). We also assume that \({\mathscr{F}}_{q}\) has a measurable envelope function

$$ F(\mathbf{y}) \geq \sup_{g\in{\mathscr{F}}_{q}}|g(\mathbf{y})|, \quad \mathbf{y}\in\mathbb{R}^{qm}. $$
Notice that condition (F.i) implies that the supremum in Eq. F.ii is measurable. Finally, we assume that \({\mathscr{F}}_{q}\) is of VC-type, with characteristics A and v1 (“VC” for Vapnik and Červonenkis), meaning that for some A ≥ 3 and v1 ≥ 1,

$$ \mathcal{N}\left({\mathscr{F}}_{q},L_{2}(Q),\varepsilon\|F\|_{L_{2}(Q)}\right) \leq \left(\frac{A}{\varepsilon}\right)^{v_{1}}, \quad 0<\varepsilon<1, $$
where Q is any probability measure on \((\mathbb {R}^{qm},{\mathscr{B}})\), where \({\mathscr{B}}\) represents the σ-field of Borel sets of \(\mathbb {R}^{qm}\), such that \(\|F\|_{L_{2}(Q)}< \infty \), and where for ε > 0, \(\mathcal {N}({\mathscr{F}}_{q},L_{2}(Q),\varepsilon )\) is defined as the smallest number of L2(Q)-open balls of radius ε required to cover \({\mathscr{F}}_{q}\). (If Eq. F.iii holds for \({\mathscr{F}}_{q}\), then we say that the VC-type class \({\mathscr{F}}_{q}\) admits the characteristics A and v1.) In this section, we follow Kim et al. (2018) in weakening the conditions on the kernel and making it adaptive to the intrinsic dimension of the underlying distribution, without assumptions on the distribution. It is worth noticing that for general distributions, such as those whose support is a lower-dimensional manifold, the usual change of variables argument is no longer directly applicable. However, we can provide a bound based on the volume dimension under an integrability condition on the kernel, given below. Let K(⋅) be a kernel function defined on \(\mathbb {R}^{dm}\), that is, a measurable function satisfying

$$ \int_{\mathbb{R}^{dm}} K(\mathbf{x})\,d\mathbf{x} = 1. $$
Assumption 1.
(Integrability condition) Let \(\|K\|_{\infty }=\sup _{\mathbf {x}\in \mathbb {R}^{d}}|K(\mathbf {x})|=:\kappa <\infty ,\) and fix k > 0. We have: either dvol = 0 or

$$ \int_{0}^{\infty} t^{d_{vol}-1}\sup_{\left\Vert\mathbf{x}\right\Vert\geq t}\left|K(\mathbf{x})\right|^{k}\,dt < \infty. $$
Remark 2.1.
(Kim et al. 2018) It is important to emphasize that Assumption 1 is weak, as it is satisfied by commonly used kernels. For instance, if the kernel function K(x) decays at a polynomial rate strictly faster than dvol/k (which is at most d/k) as \(\mathbf { x}\to \infty \), that is, if

$$ \limsup_{\left\Vert\mathbf{x}\right\Vert\to\infty}\left\Vert\mathbf{x}\right\Vert^{d_{vol}/k+\epsilon}\left|K(\mathbf{x})\right| < \infty $$
for any 𝜖 > 0, the integrability condition Eq. K.ii is satisfied. Also, if the kernel function K(x) is spherically symmetric, that is, if there exists \(\widetilde {K}:[0,\infty )\to \mathbb {R}\) with \(K(\mathbf { x})=\widetilde {K}(\left \Vert \mathbf { x}\right \Vert _{2})\), then the integrability condition Eq. K.ii is satisfied provided \(\left \Vert K\right \Vert _{k} <\infty \). Kernels with bounded support also satisfy the condition Eq. K.ii. Thus, most of the commonly used kernels including Uniform, Epanechnikov, and Gaussian kernels satisfy the above integrability condition.
Now, we consider the class of functions

$$ {\mathscr{K}} := \left\{K\left(\frac{\mathbf{x}-\cdot}{h}\right) : h>0,\ \mathbf{x}\in\mathbb{R}^{d}\right\}, $$
with \(\|K\|_{2}<\infty \).
Assumption 2.
Assume that \({\mathscr{K}}\) is a bounded VC-class with envelope κ and dimension ν2, i.e., there exist positive numbers A2 ≥ 1 and ν2 ≥ 1 such that, for every probability measure \(\mathbb {Q}\) on \(\mathbb {R}^{d}\) and for every \(\varepsilon \in (0,\|K\|_{\infty })\), the covering number \(\mathcal {N}({\mathscr{K}},L_{2}(\mathbb {Q}),\varepsilon )\) satisfies

$$ \mathcal{N}\left({\mathscr{K}},L_{2}(\mathbb{Q}),\varepsilon\right) \leq \left(\frac{A_{2}\|K\|_{\infty}}{\varepsilon}\right)^{\nu_{2}}. $$
Furthermore, let
denote the product kernel. Next, if \((S, \mathcal {S})\) is a measurable space, define the general U-statistic with kernel \(H:S^{k}\rightarrow \mathbb {R}\) based on S-valued random variables Z1,⋯ ,Zn as

$$ U_{n}(H) := \frac{(n-k)!}{n!}\sum_{(i_{1},\ldots,i_{k})\in I(k,n)} H(Z_{i_{1}},\ldots,Z_{i_{k}}), $$
where I(k,n) is defined as in Eq. 1.3 with m = k. Note that we do not require H to be symmetric here. For a bandwidth 0 < h and \(g\in {\mathscr{F}}_{q}\), consider the U-kernel

$$ G_{g,h,\mathbf{t}}(\mathbf{x},\mathbf{y}) := g(\mathbf{y})\prod_{i=1}^{m}K_{h}(\mathbf{t}_{i}-\mathbf{x}_{i}), $$
where, as usual, Kh(z) = h−dK(z/h), \(z\in \mathbb {R}^{d}\), and for the sample (X1,Y1), …,(Xn,Yn), define
where, throughout this paper, we shall use the notation
Now, introduce the U-statistic process
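As a concrete reading aid, the general (possibly asymmetric) U-statistic \(U_{n}(H)\), an average of H over I(k,n), can be computed directly. The following sketch is our own illustration; the sanity check uses the classical identity that for H(z1, z2) = (z1 − z2)²/2 the U-statistic equals the unbiased sample variance.

```python
import numpy as np
from itertools import permutations
from math import perm

def u_statistic(H, Z, k):
    """Average of H over I(k, n), all k-tuples of distinct indices;
    H need not be symmetric."""
    n = len(Z)
    total = sum(H(*(Z[i] for i in idx)) for idx in permutations(range(n), k))
    return total / perm(n, k)  # perm(n, k) = n! / (n - k)! = |I(k, n)|

rng = np.random.default_rng(2)
z = rng.normal(size=50)
# H(a, b) = (a - b)^2 / 2  ->  U_n(H) is the unbiased sample variance
u = u_statistic(lambda a, b: (a - b) ** 2 / 2, z, 2)
```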
We denote by I and J two fixed subsets of \(\mathbb {R}^{d}\) such that
where
Introduce the class of functions defined on the compact subset Jm of \(\mathbb {R}^{dm}\),
where r(m)(φ,⋅) is defined in Eq. 1.1 and the function \(\widetilde {f}: \mathbb {R}^{dm} \rightarrow \mathbb {R}\) is defined as
where \(f\left (\cdot , \cdot \right )\) denotes the joint density of (X,Y). We fix a subset \(\mathbb {X}\subset \mathbb {R}^{d}\) on which we are considering the uniform convergence of the kernel regression estimator. We first characterize the intrinsic dimension of the distribution \(\mathbb {P}\), proposed by Kim et al. (2018), by its rate of the probability volume growth on balls. If a probability distribution has a positive measure on a manifold with a positive reach, then the volume dimension is always between 0 and the manifold’s dimension. In particular, the volume dimension of any probability distribution is between 0 and the ambient dimension d.
Lemma 2.2.
(Kim et al. 2018) Let \(\mathbb {P}\) be a probability distribution on \(\mathbb {R}^{d}\), and dvol be its volume dimension. Then for any ν ∈ [0,dvol), there exists a constant \(C_{\nu , \mathbb {P}}\) depending only on \(\mathbb {P}\) and ν such that for all \(\mathbf { x}\in \mathbb {X}\) and r > 0,

$$ \mathbb{P}\left(\mathbb{B}_{\mathbb{R}^{d}}(\mathbf{x},r)\right) \leq C_{\nu,\mathbb{P}}\, r^{\nu}. $$
For the exact optimal rate, we impose conditions on how the probability volume decays in Eq. 2.3.
Assumption 3.
Let \(\mathbb {P}\) be a probability distribution on \(\mathbb {R}^{d}\), and dvol be its volume dimension. For ν ∈ [0,dvol), we assume that
Assumption 4.
Let \(\mathbb {P}\) be a probability distribution on \(\mathbb {R}^{d}\), and dvol be its volume dimension. For ν ∈ [0,dvol), we assume that
These assumptions are in fact weak and hold for common probability distributions. In particular, Assumptions 3 and 4 hold when the probability distribution has a bounded density with respect to the d-dimensional Lebesgue measure. By combining Assumption 1 and Lemma (2.0.2) of Kim et al. (2018), we can bound \(\mathbb {E}_{\mathbb {P}}\left [K^{2}\right ]\) in terms of the volume dimension dvol.
Lemma 2.3.
Let \((\mathbb {R}^{d},\mathbb {P})\) be a probability space and let \(X \sim \mathbb {P}\). For any kernel K(⋅) satisfying Assumption 1 with k > 0, the expectation of the k-th moment of the kernel is upper bounded as

$$ \mathbb{E}_{\mathbb{P}}\left[\left|K\left(\frac{\mathbf{x}-X}{h}\right)\right|^{k}\right] \leq C_{k,\mathbb{P},K,\varepsilon}\, h^{d_{vol}-\varepsilon}, $$
for any ε ∈ (0,dvol), where \(C_{k,\mathbb {P},K,\varepsilon }\) is a constant depending only on \(k,\mathbb {P},K\) and ε. Further, if dvol = 0 or under Assumption 1 in Kim et al. (2018), ε can be 0 in Eq. 2.6.
We give an example from Kim et al. (2018) of an unbounded density. In this case, the volume dimension is strictly smaller than the dimension of the support, which illustrates why the dimension of the support is not enough to characterize the dimensionality of a distribution.
Example 2.4.
(Kim et al. 2018) Let \(\mathbb {P}\) be a distribution on \(\mathbb {R}^{d}\) having a density p with respect to the d-dimensional Lebesgue measure. Fix β < d, and suppose \(p:\mathbb {R}^{d}\to \mathbb {R}\) is defined as
Then, for each fixed r > 0,
Hence from definition in Eq. 1.4, the volume dimension is
In our setting, we will use
It is clear that \(\widehat {r}_{n}^{(m)}(\varphi ,\mathbf {t};h)\) can be rewritten, for all \(\varphi \in {\mathscr{F}}\), as

$$ \widehat{r}_{n}^{(m)}(\varphi,\mathbf{t};h) = \frac{U_{n}(\varphi,\mathbf{t};h)}{U_{n}(1,\mathbf{t};h)}, $$
where we denote by Un(1,t;h) the U-statistic Un(φ,t;h) with φ ≡ 1. To prove the uniform consistency of \(\widehat {r}_{n}^{(m)}(\varphi ,\mathbf {t};h)\) to r(m)(φ,t), we shall consider another, more appropriate, centering factor than the expectation \(\mathbb {E} \widehat {r}_{n}^{(m)}(\varphi ,\mathbf { t};h)\), which may not exist or may be difficult to compute. Define the centering

$$ \widehat{\mathbb{E}}\,\widehat{r}_{n}^{(m)}(\varphi,\mathbf{t};h) := \frac{\mathbb{E}U_{n}(\varphi,\mathbf{t};h)}{\mathbb{E}U_{n}(1,\mathbf{t};h)}. $$
This centering permits us to derive results on the convergence rates of the process
to zero and the consistency of \( \widehat {r}_{n}^{(m)}(\varphi ,\mathbf { t};h)\) uniform in t and in bandwidth.
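To see why this ratio centering is convenient, note the following routine algebraic identity (included here as a reading aid; writing \(\widehat{\mathbb{E}}\,\widehat{r}_{n}^{(m)}(\varphi,\mathbf{t};h):=\mathbb{E}U_{n}(\varphi,\mathbf{t};h)/\mathbb{E}U_{n}(1,\mathbf{t};h)\) for the centering factor):

$$ \widehat{r}_{n}^{(m)}(\varphi,\mathbf{t};h)-\widehat{\mathbb{E}}\,\widehat{r}_{n}^{(m)}(\varphi,\mathbf{t};h) = \frac{U_{n}(\varphi,\mathbf{t};h)-\mathbb{E}U_{n}(\varphi,\mathbf{t};h)}{U_{n}(1,\mathbf{t};h)} - \widehat{\mathbb{E}}\,\widehat{r}_{n}^{(m)}(\varphi,\mathbf{t};h)\,\frac{U_{n}(1,\mathbf{t};h)-\mathbb{E}U_{n}(1,\mathbf{t};h)}{U_{n}(1,\mathbf{t};h)}. $$

Hence uniform control of the two centered U-statistics in the numerators, together with a lower bound on the denominator Un(1,t;h), yields the rates stated in the theorems below.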
Theorem 2.5.
Let \(l_{n} = c(\log n/n)^{1/dm}\) for c > 0. If the class of functions \({\mathscr{F}}_{q}\) is bounded, in the sense that for some \(0 < M < \infty \),

$$ F(\mathbf{y}) \leq M, \quad \mathbf{y}\in\mathbb{R}^{qm}. $$
Fix ε ∈ (0,dvol). Further, if dvol = 0 or under Assumption 3, ε can be 0. Suppose
Then we can infer, under the above mentioned assumptions on \({\mathscr{F}}_{q}\) and Assumptions 1, 2 and 4, that for all δ > 0, there exists a constant \(0<\mathfrak {C}_{1}<\infty \) such that we have with probability at least 1 − δ
Theorem 2.6.
Let \(a_{n} = c((\log n/n)^{1-2/p})^{1/dm}\) for c > 0. If \({\mathscr{F}}_{q}\) is unbounded, but satisfies, for some p > 2,

$$ \mathbb{E}\left[F^{p}(\mathbf{Y}_{1},\ldots,\mathbf{Y}_{m})\right] < \infty, $$
then we can infer, under the above mentioned assumptions on \({\mathscr{F}}_{q}\) and Assumptions 1, 2 and 4, that for all c > 0 and 0 < b0 < 1, there exists a constant \(0<\mathfrak {C}_{2}<\infty \) such that
for any ε ∈ (0,dvol).
We mention that Kim et al. (2018) do not need continuity of the density for their results. (Of course, continuity of the density is crucial for controlling the bias.) Some related results on uniform convergence over compact subsets have been obtained by Bouzebda and El-hadjali (2020) for a much larger class of estimators including kernel estimators for regression functions among others. In this general setting, however, it is often not possible to obtain the convergence uniformly over \(\mathbb {R}^{d}\). Density estimators are in that sense somewhat exceptional.
Theorem 2.7.
Besides being bounded, suppose that the marginal density function fX of X is continuous and strictly positive on the interval I. Assume that the class of functions \({\mathscr{M}}\) is uniformly equicontinuous. It then follows that for all sequences 0 < bn < 1 with \(b_{n} \rightarrow 0\),
where Im = I ×… ×I.
Corollary 2.8.
Besides being bounded, suppose that the marginal density function fX of X is continuous and strictly positive on the interval I. It then follows, under the above mentioned assumptions on \({\mathscr{F}}_{q}\) and Assumptions 1, 2 and 4, that for all c > 0 and all sequences 0 < bn < 1 with \(a_{n}^{\prime }\leq b_{n} \rightarrow 0\), there exists a constant \(0<\mathfrak {C}_{3}<\infty \) such that
for any ε ∈ (0,dvol), where \(a_{n}^{\prime }\) is either ln or an, depending on whether the class \({\mathscr{F}}_{q}\) is bounded or unbounded.
We can now state the main result of this section which follows easily from Theorems 2.5 and 2.6.
Corollary 2.9.
Under the conditions of Theorems 2.5 and 2.6 on fX and the class of functions \({\mathscr{F}}_{q}\) and Assumptions 1, 2 and 4, it follows that for all sequences \(0<a_{n}^{\prime }\leq \widetilde a_{n}\leq b_{n}<1\) satisfying \(b_{n}\rightarrow 0\) and \(n\widetilde a_{n} /\log n\rightarrow \infty \),
Remark 2.10.
Under additional weak regularity conditions on \(\mathbb {P}\), the value of ε can be taken equal to 0 in Corollary 2.9. Under the assumption that the distribution has a bounded Lebesgue density, dvol = d, so our result recovers existing results in the literature in terms of rates of convergence, in particular the results presented in Dony and Mason (2008). Our results complement those of Dony and Mason (2008) by relaxing the condition on the kernel functions, as was done by Kim et al. (2018). At this point, we mention that our results are stated in the multivariate setting and, more importantly, are adaptive to the volume dimension. This alleviates the problem of the curse of dimensionality. To be more precise, it is well known that the estimation of a regression function is especially hard when the dimension of the explanatory variable X is large. One consequence of this is that the optimal minimax rate of convergence n−2k/(2k+d) for the estimation of a k times differentiable regression function converges to zero rather slowly if the dimension d of X is large compared to k. The only way to circumvent the so-called curse of dimensionality is to impose additional assumptions on the regression function. The simplest way is to consider linear models, but this rather restrictive parametric assumption can be extended in several ways. One idea is to consider additive models, which simplify the problem of regression estimation by fitting only functions to the data that have the same additive structure. In projection pursuit, one generalizes this further by assuming that the regression function is a sum of univariate functions applied to projections of x in various directions; we note that this includes single index models as particular cases. The interested reader may refer to Györfi et al. (2002, Chapter 22) for more rigorous developments of such techniques.
Other directions to be investigated are semi-parametric models, considered intermediary between linear and nonparametric ones, which aim to combine the flexibility of nonparametric approaches with the interpretability of parametric ones; for details on these methods for functional data, one can refer to Ling and Vieu (2018, Section 4.2) and the references therein.
Remark 2.11.
We note that the main problem in using an estimator such as in Eq. 1.2 is to choose properly the smoothing parameter h. The uniform in bandwidth consistency results given in Corollary 2.9 shows that any choice of h between an and bn ensures the consistency of \(\widehat {r}_{n}^{(m)}(\varphi ,\mathbf {t};h)\). Namely, the fluctuation of the bandwidth in a small interval does not affect the consistency of the nonparametric estimator \(\widehat {r}_{n}^{(m)}(\varphi ,\mathbf {t};h)\) of r(m)(φ,t).
Remark 2.12.
For notational convenience, we have chosen the same bandwidth sequence for each margin. This assumption can be dropped easily if one wants to make use of vector bandwidths (see, in particular, Chapter 12 of Devroye and Lugosi (2001)). With obvious changes of notation, our results and their proofs remain true when h is replaced by a vector bandwidth \(\mathbf { h}_{n} = (h^{(1)}_{n}, \ldots , h^{(d)}_{n})\), where \({\min \limits } h^{(i)}_{n} > 0\). In this situation we set \(h={\prod }_{i=1}^{d} h^{(i)}\), and for any vector v = (v1,…,vd) we replace v/h by (v1/h(1),…,vd/h(d)). For ease of presentation, we chose to use real-valued bandwidths throughout.
Remark 2.13.
In the sequel, we will need to symmetrize the functions Gg,h,t(x,y). To do this, we have

$$ \overline{G}_{g,h,\mathbf{t}}(\mathbf{x},\mathbf{y}) := \frac{1}{m!}\sum_{\sigma\in\mathfrak{S}_{m}} G_{g,h,\mathbf{t}}(\mathbf{x}_{\sigma},\mathbf{y}_{\sigma}), $$
where \(\mathbf {x}_{\sigma }:=(x_{\sigma _{1}}, \ldots , x_{\sigma _{m}})\) and \(\mathbf {y}_{\sigma }:=(y_{\sigma _{1}}, \ldots , y_{\sigma _{m}})\). Obviously, after symmetrization we have
So the U-statistic process in Eq. 2.2 may be redefined using the symmetrized kernels, hence we consider
For more details, consult for instance the book of de la Peña and Giné (1999).
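The symmetrization step of Remark 2.13 amounts to averaging a kernel over all permutations of its arguments. A small generic sketch of ours (the toy kernel G below is hypothetical):

```python
from itertools import permutations
from math import factorial

def symmetrize(G, m):
    """Return the symmetrized kernel: the average of G over all m! permutations
    of its m arguments (each argument playing the role of an (x_i, y_i) pair)."""
    def G_sym(*args):
        return sum(G(*(args[i] for i in p))
                   for p in permutations(range(m))) / factorial(m)
    return G_sym

G = lambda a, b: a * b ** 2          # asymmetric toy kernel
Gs = symmetrize(G, 2)                # Gs(a, b) = (a*b**2 + b*a**2) / 2
```

By construction the symmetrized kernel is invariant under permuting its arguments, which is what allows the U-statistic process to be rewritten with symmetric kernels.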
3 Extension to the Censored Case
Consider a triple (Y,C,X) of random variables defined in \(\mathbb {R} \times \mathbb {R} \times \mathbb {R}^{d}\). Here Y is the variable of interest, C is a censoring variable and X is a concomitant variable. Throughout, we will use the notation of Maillot and Viallon (2009), and we work with a sample {(Yi,Ci,Xi)1≤i≤n} of independent and identically distributed replications of (Y,C,X), n ≥ 1. Actually, in the right censorship model, the pairs (Yi,Ci), 1 ≤ i ≤ n, are not directly observed and the corresponding information is given by \(Z_{i} := \min \limits \{Y_{i},C_{i}\}\) and \(\delta _{i} := \mathbb {1}\{Y_{i}\leq C_{i}\}\), 1 ≤ i ≤ n. Accordingly, the observed sample is

$$ \{(Z_{i},\delta_{i},\mathbf{X}_{i}),\ 1\leq i\leq n\}. $$
Survival data in clinical trials or failure time data in reliability studies, for example, are often subject to such censoring. To be more specific, many statistical experiments result in incomplete samples, even under well-controlled conditions. For example, clinical data for surviving most types of disease are usually censored by other competing risks to life which result in death. In the sequel, we impose the following assumptions upon the distribution of (X,Y ). Denote by \(\mathcal { I}\) a given compact set in \(\mathbb {R}^{d}\) with nonempty interior and set, for any α > 0,
We will assume that, for a given α > 0, (X,Y ) [resp. X] has a density function fX,Y [resp. fX] with respect to the Lebesgue measure on \(\mathcal I_{\alpha } \times \mathbb {R}\) [resp. \(\mathcal I_{\alpha }\)]. For \(-\infty < t < \infty \), set

$$ F(t) = \mathbb{P}(Y\leq t), \quad G(t) = \mathbb{P}(C\leq t), \quad H(t) = \mathbb{P}(Z\leq t), $$
the right-continuous distribution functions of Y, C and Z respectively. For any right-continuous distribution function L defined on \(\mathbb {R}\), denote by

$$ T_{L} := \sup\{t\in\mathbb{R} : L(t) < 1\} $$
the upper point of the corresponding distribution. Now consider a pointwise measurable class \({\mathscr{F}}\) of real measurable functions defined on \(\mathbb {R}\), and assume that \({\mathscr{F}}\) is of VC-type. We recall the regression function of ψ(Y ) evaluated at X = x, for \(\psi \in {\mathscr{F}}\) and \(\mathbf { x} \in \mathcal I_{\alpha }\), given by

$$ r^{(1)}(\psi,\mathbf{x}) = \mathbb{E}\left(\psi(Y)\mid \mathbf{X}=\mathbf{x}\right), $$
when Y is right-censored. To estimate r(1)(ψ,⋅), we make use of the Inverse Probability of Censoring Weighted (I.P.C.W.) estimators, which have recently gained popularity in the censored data literature (see Kohler et al. (2002), Carbonez et al. (1995), Brunel and Comte (2006)). The key idea of I.P.C.W. estimators is as follows. Introduce the real-valued function Φψ(⋅,⋅) defined on \(\mathbb {R}^{2}\) by

$$ {\Phi}_{\psi}(y,c) := \frac{\mathbb{1}\{y\leq c\}\,\psi(y\wedge c)}{1-G(y\wedge c)}. $$
Assuming the function G(⋅) to be known, first note that Φψ(Yi,Ci) = δiψ (Zi)/(1 − G(Zi)) is observed for every 1 ≤ i ≤ n. Moreover, under the Assumption (\({\mathscr{I}}\)) below,
- (\({\mathscr{I}}\)):
-
C and (Y,X) are independent.
We have

$$ \mathbb{E}\left({\Phi}_{\psi}(Y,C)\mid \mathbf{X}=\mathbf{x}\right) = \mathbb{E}\left(\psi(Y)\mid \mathbf{X}=\mathbf{x}\right) = r^{(1)}(\psi,\mathbf{x}). $$
Therefore, any estimate of \(r^{(1)}({\Phi }_{\psi },\cdot )\), which can be built on fully observed data, turns out to be an estimate for r(1)(ψ,⋅) too. Thanks to this property, most statistical procedures known to provide estimates of the regression function in the uncensored case can be naturally extended to the censored case. For instance, kernel-type estimates are particularly easy to construct. Set, for \(\mathbf { x}\in \mathcal {I}\), h ≥ ln, 1 ≤ i ≤ n,
We assume that h satisfies (H.1). In view of Eqs. 3.1, 3.2, and 3.3, whenever G(⋅) is known, a kernel estimator of r(1)(ψ,⋅) is given by
The function G(⋅) is generally unknown and has to be estimated. We will denote by \(G^{*}_{n}(\cdot )\) the Kaplan-Meier estimator of the function G(⋅) (Kaplan and Meier, 1958). Namely, adopting the conventions
and \(0^{0} = 1\) and setting
we have
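A standard closed form for the Kaplan-Meier estimator of the censoring distribution, recorded here under the stated conventions (with \(Z_{(1)}\leq \cdots \leq Z_{(n)}\) the order statistics and \(\delta _{(i)}\) the concomitant indicators), is:

$$1-G^{*}_{n}(t)=\prod \limits _{i:\, Z_{(i)}\leq t}\left (\frac {n-i}{n-i+1}\right )^{1-\delta _{(i)}}.$$

The exponent 1 − δ(i) reflects that, when estimating G, the roles of censoring and failure are interchanged.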
Given this notation, we will investigate the following estimator of r(1)(ψ,⋅)
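A sketch of this estimator, obtained by substituting \(G^{*}_{n}\) for G in the kernel I.P.C.W. estimator above (stated in a standard form that we assume matches the intended notation), is:

$$\widetilde {m}^{*}_{\psi ,n,h}(\mathbf {x}) =\sum \limits _{i=1}^{n}\frac {\delta _{i}\psi (Z_{i})}{1-G^{*}_{n}(Z_{i})}\, \frac {K\left ((\mathbf {x}-X_{i})/h\right )}{\sum _{j=1}^{n}K\left ((\mathbf {x}-X_{j})/h\right )}.$$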
For further details, refer to Kohler et al. (2002) and Maillot and Viallon (2009). Adopting the convention 0/0 = 0, this quantity is well defined, since \(G_{n}^{*}(Z_{i})=1\) if and only if Zi = Z(n) and δ(n) = 0, where Z(k) is the k th order statistic associated with the sample (Z1,…,Zn) for k = 1,…,n and δ(k) is the δj corresponding to Z(k) = Zj. When the variable of interest is right-censored, functionals of the (conditional) law can generally not be estimated on the complete support (see Brunel and Comte 2006). To obtain our results, we will work under the following assumptions.
- (A.1):
-
, where τ < TH and \({\mathscr{F}}_{1}\) is a pointwise measurable class of real measurable functions defined on \(\mathbb {R}\) and of type VC.
- (A.2):
-
The class of functions \({\mathscr{F}}\) has a measurable and uniformly bounded envelope function Υ satisfying
$$ {\Upsilon}(y_{1},\ldots,y_{m})\geq \sup_{\psi \in \mathscr{F}}\left|\psi (y_{1},\ldots,y_{m})\right| , \quad y_{i}\leq T_{H}.$$
- (A.3):
-
The class of functions \({\mathscr{M}}\) is relatively compact with respect to the sup-norm topology on \(\mathcal I_{\alpha }^{m}\).
In what follows, we will study the uniform convergence of \(\widetilde {m}^{*}_{\psi ,n,h}(\mathbf {x})\) centred by the following centring factor
This choice is justified by the fact that, under hypothesis (\({\mathscr{I}}\)) we have
Let us assume the following conditions.
- (H.1):
-
h ↓ 0, 0 < h < 1, and \(n h^{d} \uparrow \infty \);
- (H.2):
-
\( n h^{d} / \log n \rightarrow \infty \) as \(n \rightarrow \infty ;\)
- (H.3):
-
\( \log \left (1 / h\right ) / \log \log n \rightarrow \infty \) as \(n \rightarrow \infty ;\)
We now have all the ingredients to state the result corresponding to the censored case. Let \(\left \{h_{n}^{\prime }\right \}_{n \geq 1}\) and \(\left \{h_{n}^{\prime \prime }\right \}_{n \geq 1}\) be two sequences of positive constants fulfilling Assumptions (H.1-H.3) with
Bouzebda and El-hadjali (2020) showed that, under Assumptions (A.1)–(A.3) with m = 1, Assumption (\({\mathscr{I}}\)) and Assumption 3, for any kernel K(⋅) satisfying Assumptions 1 and 2, with probability at least 1 − δ,
for some positive constant \(\mathfrak {C}_{4}\). A right-censored version of an unconditional U-statistic with a kernel of degree m ≥ 1 was introduced, via the principle of a mean-preserving reweighting scheme, in Datta et al. (2010). Stute and Wang (1993) proved almost sure convergence of multi-sample U-statistics under random censorship and provided an application by considering the consistency of a new class of tests designed for testing equality in distribution. To overcome potential biases arising from right-censoring of the outcomes and the presence of confounding covariates, Chen and Datta (2019) proposed adjustments to the classical U-statistics. Yuan et al. (2017) proposed a different estimation procedure for the U-statistic, using a substitution estimator of the conditional kernel given the observed data. To the best of our knowledge, the problem of estimating conditional U-statistics under censoring has remained open until now, and this provides the main motivation for this section. A natural extension of the function defined in Eq. 3.1 is given by
From this, we have an analogous relation to Eq. 3.2 given by
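Concretely, a natural m-variate extension and its unbiasedness relation can be sketched (with assumed notation) as:

$${\Phi }_{\psi }(y_{1},\ldots ,y_{m},c_{1},\ldots ,c_{m}) =\psi (y_{1},\ldots ,y_{m})\prod \limits _{i=1}^{m}\frac {\mathbb {1}_{\{y_{i}\leq c_{i}\}}}{1-G(y_{i})},$$

so that, under (\({\mathscr{I}}\)),

$$\mathbb {E}\left [{\Phi }_{\psi }(Y_{1},\ldots ,Y_{m},C_{1},\ldots ,C_{m})\mid (X_{1},\ldots ,X_{m})=\mathbf {t}\right ]=r^{(m)}(\psi ,\mathbf {t}).$$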
An estimator analogous to Eq. 1.2 in the censored case is given by
where, for i = (i1,…,im) ∈ I(m,n),
The estimator that we will investigate is given by
Theorem 3.1.
Let \(a_{n} = c((\log n/n)^{1-2/p})^{1/dm}\) for c > 0, and suppose that the class of functions \({\mathscr{F}}\) is bounded in the sense of (A.2). Fix ε ∈ (0,dvol); if dvol = 0 or under Assumption 3, ε may be taken equal to 0. Suppose further that
Then we can infer, under the above-mentioned assumptions on \({\mathscr{K}}\), (A.1)–(A.2) and (\({\mathscr{I}}\)), that for all c > 0 and 0 < b0 < 1, there exists a constant \(0<\mathfrak {C}_{5}<\infty \) such that
Proposition 3.2.
Assume that (A.1)–(A.3) and (\({\mathscr{I}}\)) hold, and let K(⋅) be any kernel satisfying Assumptions 1, 2 and 3. Let \(\left \{h_{n}^{\prime }\right \}_{n \geq 1}\) and \(\left \{h_{n}^{\prime \prime }\right \}_{n \geq 1}\) be two sequences of positive constants fulfilling Assumptions (H.1-H.3) with
With probability at least 1 − δ, there exists a constant \(0<\mathfrak {C}_{6}<\infty \) such that
3.1 Examples of U-statistics
Example 3.3.
Let \(\widehat {{Y}_{1}{Y}_{2}}\) denote the oriented angle between Y1,Y2 ∈ T, where T is the circle of radius 1 centred at 0 in \(\mathbb {R}^{2}\). Let:
Silverman (1978) has used this kernel to propose a U-process to test uniformity on the circle.
Example 3.4.
Hoeffding (1948) introduced the parameter
where \(D(y_{1}, y_{2})=F(y_{1}, y_{2})-F(y_{1}, \infty ) F(\infty , y_{2})\) and F(⋅,⋅) is the joint distribution function of (Y1, Y2). The parameter △ has the property that △ = 0 if and only if Y1 and Y2 are independent. Following Lee (1990), an alternative expression for △ can be developed by introducing the functions
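Explicitly, the parameter in question is, in Hoeffding's standard form,

$$\triangle =\int _{\mathbb {R}^{2}} D^{2}(y_{1},y_{2})\, \mathrm {d}F(y_{1},y_{2}).$$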
and
We have
We have
The corresponding U-statistics may be used to test conditional independence.
Example 3.5.
For m = 3, let the kernel be chosen so that the corresponding U-statistic is the Hollander-Proschan test statistic (Hollander and Proschan, 1972).
Example 3.6.
For
we obtain the variance of Y.
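A standard choice realizing this example is the symmetric kernel of degree m = 2 given by

$$\psi (y_{1},y_{2})=\frac {1}{2}(y_{1}-y_{2})^{2},\qquad \mathbb {E}\,\psi (Y_{1},Y_{2})=\operatorname {Var}(Y),$$

since expanding the square gives \(\mathbb {E}(Y_{1}-Y_{2})^{2}=2\operatorname {Var}(Y)\) for i.i.d. Y1, Y2.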
Example 3.7.
For
we obtain:
Example 3.8.
Let Y = (Y1,Y2) such that Y2 is a smooth curve, Y2 ∈ L2([0,1]) and \(Y_{1} \in \mathbb {R}\) has a continuous distribution. For
which can be used to treat the problem of testing for conditional association between a functional variable belonging to Hilbert space and a scalar variable. More precisely, this gives the conditional Kendall’s Tau type statistics.
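For orientation, the classical Kendall-type kernel of degree m = 2 for bivariate observations \(Y_{i}=(Y_{i1},Y_{i2})\) (stated here as an illustrative standard form, not necessarily the exact display intended) is

$$\psi \left (Y_{1},Y_{2}\right ) =\mathbb {1}_{\{(Y_{11}-Y_{21})(Y_{12}-Y_{22})>0\}} -\mathbb {1}_{\{(Y_{11}-Y_{21})(Y_{12}-Y_{22})<0\}},$$

whose conditional U-statistic estimates a conditional Kendall's τ.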
4 The Bandwidth Selection Criterion
Many methods have been established and developed to construct, in asymptotically optimal ways, bandwidth selection rules for nonparametric kernel estimators, especially for the Nadaraya-Watson regression estimator; among them, we cite Hall (1984) and Härdle and Marron (1985). This parameter has to be selected suitably, whether in the standard finite-dimensional case or in the infinite-dimensional framework, in order to ensure good practical performance. Following Dony and Mason (2008), the leave-one-out cross-validation procedure allows one to define, for any fixed i = (i1,…,im) ∈ I(m,n):
where
In order to minimize the quadratic loss function, we introduce the following criterion, defined for some (known) non-negative weight function \(\mathcal {W}(\cdot ):\)
where
A natural way to choose the bandwidth is to minimize the preceding criterion, so we choose \(\widehat {h}_{n} \in [a_{n},b_{n}]\) minimizing, among h ∈ [an,bn]:
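A cross-validation criterion of the type described above can be sketched (with assumed notation \(\widehat {r}^{(m)}_{n,h,-\mathbf {i}}\) for the leave-out estimator) as

$$\widehat {h}_{n}:=\arg \min _{h\in [a_{n},b_{n}]} \frac {1}{|I(m,n)|}\sum \limits _{\mathbf {i}\in I(m,n)} \left [\varphi \left (Y_{i_{1}},\ldots ,Y_{i_{m}}\right ) -\widehat {r}^{(m)}_{n,h,-\mathbf {i}}\left (\mathbf {X}_{\mathbf {i}}\right )\right ]^{2} \mathcal {W}\left (\mathbf {X}_{\mathbf {i}}\right ).$$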
we can conclude, by Corollary 2.9, that :
The main interest of our results is the possibility of deriving the asymptotic properties of our estimate even if the bandwidth parameter is a random variable, as in the last equation. One can replace (4.2) by
where
In practice, one takes for i ∈ I(m,n), the uniform global weights \(\widetilde {\mathcal {W}}\left (\mathbf {X}_{\mathbf {i}}\right )= 1\), and the local weights
For the sake of brevity, we have considered only the most popular method, namely the cross-validated bandwidth. This may be extended to any other bandwidth selector, such as the bandwidth based on Bayesian ideas (Shang, 2014).
4.1 Discrimination
Now, we apply the results to the problem of discrimination described in Section 3 of Stute (1994b); refer also to Stute (1994a). We will use a similar notation and setting. Let φ(⋅) be any function taking at most finitely many values, say 1,…,M. The sets
then yield a partition of the feature space. Predicting the value of φ(Y1,…, Ym) is tantamount to predicting the set in the partition to which (Y1,…,Ym) belongs. For any discrimination rule g, we have
where
The above inequality becomes equality if
g0(⋅) is called the Bayes rule, and the pertaining probability of error
is called the Bayes risk. Each of the above unknown functions mj can be consistently estimated by one of the methods discussed in the preceding sections. Let, for 1 ≤ j ≤ M,
Set
Let us introduce
Then, one can show that the discrimination rule g0,n(⋅) is asymptotically Bayes’ risk consistent
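In the standard formulation (our assumed notation), the Bayes rule, its empirical version and the consistency statement read:

$$g_{0}(\mathbf {t})=\arg \max _{1\leq j\leq M} m_{j}(\mathbf {t}),\qquad g_{0,n}(\mathbf {t})=\arg \max _{1\leq j\leq M} \widehat {m}_{j,n}(\mathbf {t}),$$

and

$$\mathbb {P}\left (g_{0,n}(\mathbf {X})\neq \varphi (\mathbf {Y})\right )\longrightarrow L^{*}:=\mathbb {P}\left (g_{0}(\mathbf {X})\neq \varphi (\mathbf {Y})\right )\quad \text {a.s.}$$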
5 Concluding Remarks and Future Works
In the present work, we have used general methods based upon empirical process techniques to prove uniform in bandwidth consistency for kernel-type estimators of conditional U-statistics. We have considered an extended setting in which the intrinsic dimension can be lower than the ambient dimension. In addition, our work complements the paper of Kim et al. (2018) by considering other examples of kernel estimates. Our proof relies on the work of Dony and Mason (2008). Our results extend and complement the last cited reference by establishing a convergence rate adaptive to the volume dimension. Our results are especially useful for establishing the uniform consistency of data-driven bandwidth kernel-type function estimators. A natural next step would be to extend our work to k-nearest-neighbour estimators; at present, it is beyond reasonable hope to achieve this program without new technical arguments. We will not treat the uniform consistency of such estimators in the present paper, and leave this for future investigation.
6 Mathematical Developments
This section is devoted to the proof of our results. The previously defined notation continues to be used in what follows.
Our main tool to analyze \(u_{n}^{(m)}(g,h,\mathbf { t})\) will be the Hoeffding decomposition, which we recall here for the reader’s convenience.
6.1 Hoeffding Decomposition
The Hoeffding decomposition, Hoeffding (1948), states the following, which is easy to check,
where the k th Hoeffding projection for a (symmetric) function \(L:S^{m}\times S^{m} \rightarrow \mathbb {R}\) with respect to \(\mathbb {P}\) is defined for xk = (x1,…,xk) ∈ Sk and yk = (y1,…,yk) ∈ Sk as
where \(\mathbb {P}\) is any probability measure on \((S, \mathcal {S})\) and for measures \(\mathbb {Q}_{i}\) on S we have
Considering (Xi,Yi),i ≥ 1, i.i.d.-\(\mathbb {P}\) and assuming \(L \in L_{2}(\mathbb {P}^{m})\), this is an orthogonal decomposition and
where we denote Xk and Yk for (X1,…,Xk) and (Y1,…,Yk), respectively. Thus the kernels πkL are canonical for \(\mathbb {P}\). Also, πk,k ≥ 1, are nested projections, that is, πk ∘ πl = πk if k ≤ l, and
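For concreteness, the decomposition we rely on can be written in its standard form as

$$U_{n}^{(m)}(L)=\mathbb {E}L+\sum \limits _{k=1}^{m}\binom {m}{k}U_{n}^{(k)}\left (\pi _{k}L\right ),\qquad \pi _{k}L(x_{1},\ldots ,x_{k}) =\left (\delta _{x_{1}}-\mathbb {P}\right )\times \cdots \times \left (\delta _{x_{k}}-\mathbb {P}\right )\times \mathbb {P}^{m-k}L,$$

where δx denotes the Dirac measure at x.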
For more details, consult de la Peña and Giné (1999). The proofs of our results are largely inspired from Dony and Mason (2008), Kim et al. (2018) and Bouzebda and El-hadjali (2020).
6.2 Proof of Theorem 2.5: the Bounded Case
6.3 Linear Term
To establish the relation (2.9), we need to study the linear term (the first term) of Eq. 6.1, given by
Keeping in mind the fact that the class \({\mathscr{F}}_{q}\) is a VC-type class of functions with an envelope function F and the class \({\mathscr{K}}\) is a VC-type with envelope κ, which implies that the class of functions on \(\mathbb {R}^{qm}\times \mathbb {R}^{dm}\) given by \(\{h^{dm}G_{g,h,\mathbf { t}}(\cdot ,\cdot ):g \in {\mathscr{F}}_{q},\mathbf {t} \in \mathbb {R}^{dm}\}\) is of VC-type (via Lemma A.1 in Einmahl and Mason (2000)), as well as the class
for which we denote the VC-type characteristics by A and v, and the envelope function by
By considering the following class of functions on \(\mathbb {R}^{dk}\times \mathbb {R}^{qk}\), for k = 1,…,m,
and following Giné and Mason (2007a) one can show that each class \(\mathcal {G}^{(k)}\) is of VC-type with characteristics A and v and envelope function
Recall that the sample (Xi,Yi),1 ≤ i ≤ n is i.i.d. and from the definition of the Hoeffding projections, for all \((x,y) \in \mathbb {R}^{d} \times \mathbb {R}^{q}\), we get
Introduce the following function on \(\mathbb {R}^{d} \times \mathbb {R}^{q}\):
Making use of this notation, we can write
For all \(g \in {\mathscr{F}}_{q}\), h ≥ ln and \(\mathbf {t} \in \mathbb {R}^{dm}\), the linear term of the decomposition in Eq. 6.1 times hdm is given by
where we recall that the last expression is the empirical process αn(⋅) based on the sample (X1,Y1),…,(Xn,Yn) and we set for \(\mathbf {t}\in \mathbf {I}\subset \mathbb {X}\), \(g \in {\mathscr{F}}_{q}\) and h ≥ ln the class of normalised functions on \(\mathbb {R}^{dm} \times \mathbb {R}^{qm}\),
Now, we have to bound Sg,h,t. From Eq. 2.8, we get
In order to bound the VC dimension of \( \mathcal {S}_{n}\), we remark that \( \mathcal {S}_{n}=m\mathcal {G}^{(1)}\) is of VC-type with characteristics A and v as defined in Eq. 6.5 for k = 1. For the reader's convenience, we give more details. Let us give the bound for the VC dimension of \(\mathcal {S}_{n}\). Fix \(\eta <l_{n}^{-dm}\left \| S_{g,h,\mathbf {t}}\right \|_{\infty }\) and a probability measure \(\mathbb {Q}\) on \(\mathbb {R}^{d}\). Suppose
is covered by balls of the form
and \(\left (\mathcal {S}_{n}, \mathbb {L}_{2}(\mathbb {Q})\right )\) is covered by
where
For 1 ≤ i ≤ N1,1 ≤ j ≤ N2 and 1 ≤ k ≤ N3, we let
Also, choose \(b_{0}>\left (\frac {\sqrt {n}\eta }{2\left \|S_{g,h,\mathbf {t}}\right \|_{\infty }}\right )^{-1/dm},\) \( \mathbf {t}_{0}\in \mathbf {I}, g_{0}\in {\mathscr{F}}_{q}\) and let
We will show that
For the first case when \(h\leq \left (\frac {\sqrt {n}\eta }{2\left \|S_{g,h,\mathbf {t}}\right \|_{\infty }}\right )^{-1/dm}\), find hi, Kj and gk with
Then the distance between \(\frac {1}{\sqrt {n}h^{dm}}S_{g,h,\mathbf {t}}\) and \(\frac {1}{\sqrt {n}h_{i}^{dm}}S_{j,k}\) is upper bounded as follows
Now the first term of Eq. 6.9 is upper bounded as
Also, the second term of Eq. 6.9 is upper bounded as
The last term of Eq. 6.9 is upper bounded as
Combining Eqs. 6.10, 6.11 and 6.12 with Eq. 6.9, we readily obtain the following bound
For the second case when \(h>\left (\frac {\sqrt {n}\eta }{2\left \|S_{g,h,\mathbf {t}}\right \|_{\infty }}\right )^{-1/dm}\), we have
holds, and hence
Therefore Eq. 6.8 is shown. Hence, combining Eqs. F.ii, F.iii and 2.8 with Lemma 9.9, p. 160, of Kosorok (2008) gives that, for every probability measure \(\mathbb {Q}\) on \(\mathbb {R}^{d}\) and for every \(\eta \in \left (0, \sqrt {n}h^{-dm}\left \|S_{g,h,\mathbf {t}}\right \|_{\infty }\right )\), the covering number \(\mathcal {N}(\mathcal {S}_{n},\eta )\) is upper bounded as
for some finite constant \(0<A<\infty \). For p > 2, note that assumption (2.8) implies that
From Lemma 2.3 and using Jensen’s inequality, we observe that, for p ≥ k
where
Then, we have
By Hölder inequality and using once more Lemma 2.3 we obtain
where
Hence, from Lemma 2.3, we have
where
Then, we readily obtain
Now from Eq. 6.15, applying Theorem 7.1 to the class \(\mathcal {S}_{n}\) defined in Eq. 6.7 gives that
is upper bounded with probability at least 1 − δ as
Using the condition
we conclude that
6.4 The Other Terms of Eq. 6.1
We will follow the steps of the proof of Dony and Mason (2008). Now we consider the other terms of the Hoeffding decomposition (6.1) and show that each of them is almost surely upper bounded, that is, for each k = 2,…,m,
Since \(nl_{n}^{dm}=c^{dm}\log n\), this will be established if we can show that, for each k = 2,…,m,
To establish the uniform in bandwidth convergence rates, we have to use a blocking argument and a decomposition of the interval [ln,b0], for b0 large enough, into smaller intervals. For this, set \(n_{\ell } = 2^{\ell }\), ℓ ≥ 0, and consider the intervals \({\mathscr{H}}_{\ell , j}:=[h_{\ell ,j-1},h_{\ell ,j}]\) where the boundaries are given by \(h^{m}_{\ell , j}:=2^{j}l_{n_{\ell }}^{m}\). By setting
remark that
implying, in particular, that \(L(\ell )\leq 2\log n_{\ell }\). This fact will be used tacitly to conclude some crucial steps of the proofs. Next, for 1 ≤ j ≤ L(ℓ), consider the class of functions on \(\mathbb {R}^{dm}\times \mathbb {R}^{qm}\),
as well the class on \(\mathbb {R}^{dm}\times \mathbb {R}^{qm}\),
where Mk = 2kκmM. Clearly, each class \(\mathcal {G}_{\ell ,j}\) is of VC-type with the same characteristics as \(\mathcal {G}^{k}\) (and thus as \(\mathcal {G}\)) with envelope function \(M_{k}^{-1}F_{k}\), where Fk is the envelope function of \(\mathcal {G}^{k}\). Notice that from Eqs. 2.8 and 6.6,
and hence each function in \(\mathcal {G}_{\ell ,j}^{(k)}\) is bounded by 1. Define now for nℓ− 1 ≤ n ≤ nℓ,ℓ = 1,2,…,
From Theorem 4 of Giné and Mason (2007b) as in Dony and Mason (2008), we get for c = 1/2,r = 2 and all x > 0 that for any ℓ ≥ 1,
We shall apply an exponential inequality and a moment bound for U-statistics, due to, respectively, de la Peña and Giné (1999) and Giné and Mason (2007b), to the class \(\mathcal {G}_{\ell ,j}^{(k)}\) to bound (6.20). To use these results, we must first derive some bounds. First, it is readily checked that
for all nℓ− 1 < n ≤ nℓ. Second, notice that in Assumption 2, the kernel K(⋅) is assumed to be bounded by κ and, for notational convenience in the proofs, to have support in [− 1/2,1/2], so that by assumption (2.8) and Mk = 2kκmM, for \(H\in \mathcal {G}_{\ell ,j}^{(k)}\), we have by Eqs. 6.2 and 6.14,
For \(D_{m,k}= \widetilde { C}_{k,\mathbb {P},K,\varepsilon }^{m}4^{-k}\kappa ^{-2m}\), this gives us that
Since πkπkL = πkL for all k ≥ 1, we can now apply Corollary 1 of Giné and Mason (2007b) to the class \(\mathcal {G}_{\ell ,j}^{(k)}\) with σ2 as in Eq. 6.22 and easily obtain that for some constant Ak,
To control the probability term in Eq. 6.20, we shall apply an exponential inequality to the same class \(\mathcal {G}_{\ell ,j}^{(k)}\); recall that each \(H \in \mathcal {G}_{\ell ,j}^{(k)}\) is bounded by 1. Setting
where \(C_{1,k}< \infty \), Theorem 5.3.14 of de la Peña and Giné (1999) gives us constants C2,k,C3,k and C4,k such that for j = 1,…,L(ℓ) and for any ρ > 1,
Plugging the bounds of Eqs. 6.23 and 6.25 into Eq. 6.20, we then get, for some C5,k > 0, ρ ≥ 2 and ℓ large enough,
Finally, note also that
for some Ck > 0. Therefore, by Eq. 6.18, for each k = 2,…,m and ℓ large enough,
where λn,k was defined as in Eq. 6.24. Now, recall that \(L(\ell )\leq 2\log (n_{\ell })\). Then Eq. 6.26 applied with ρ ≥ (2 + γ)/C5,k, γ > 0 and in combination with the above inequality and the obvious bound
valid for all nℓ− 1 < n ≤ nℓ, implies for C6,k ≥ 2ρk/2CkMkC1,k and for the choice \(\delta = 2^{-\ell +1}\) that for k = 2,…,m,
This proves, via Borel-Cantelli, that Eq. 6.17 holds, which obviously implies Eq. 6.16 and hence completes the proof of Theorem 2.5.
6.5 Proof of Theorem 2.6: the Unbounded Case
To prove this theorem, we will need to truncate the conditional U-statistic un(g,h,t). If condition (2.8) is not satisfied, we consider bandwidths lying in the smaller interval \({\mathscr{H}}_{n_{\ell }}^{\prime }=[a_{n_{\ell }}^{\prime }, b_{0}]\), that may be divided into subintervals as follows
where the boundaries are given by \(h^{'dm}_{\ell , j}:=2^{j}a_{n_{\ell }}^{'dm}\). Note that it is straightforward to show that Eq. 6.18 remains valid if we replace hℓ,j by \(h^{\prime }_{\ell ,j}\). In particular, we still have \(L(\ell )\leq 2\log {n_{\ell }}\), where L(ℓ) is now defined as
Recall that \(n_{\ell } = 2^{\ell }\), ℓ ≥ 0, and set, for ℓ ≥ 1,
For an arbitrary 𝜖 > 0, we truncate each function in \(\mathcal {G}\), as well as the envelope function, as follows
where \(\tilde {F}\) is the symmetric envelope function of the class \(\mathcal {G}\) as defined in Eq. 6.4. The process un(g,h,t) can then be decomposed for any nℓ− 1 < n ≤ nℓ since, from Eq. 2.11,
The term \(u_{n}^{(\ell )}(g,h,\mathbf {t})\) will be called the truncated part and \(\tilde {u}_{n}^{(\ell )}(g,h,\mathbf {t})\) the remainder part. To prove Theorem 2.6, we shall apply the Hoeffding decomposition to the truncated part and analyze each of the terms separately, while the remainder part can be treated directly using simple arguments based on standard inequalities. Note, for further use, that
6.5.1 Truncated Part
First, note that by Hoeffding decomposition (6.1), we need to consider the terms of
We shall start with the linear term in this decomposition. Following the same reasoning as in the previous section, we can show that \(\pi _{1}\overline {G}^{(\ell )}_{g,h, \mathbf {t}}\) is a centered conditional expectation and that the first term of Eq. 6.1 can be written as an empirical process based on the sample (X1,Y1),…,(Xn,Yn) and indexed by the class of functions
where \(h\in {\mathscr{H}}_{n_{\ell }}^{\prime }\) was defined at the beginning of this section and where
To show that \( \mathcal {S}^{\prime }_{\ell }\) is a VC-class, introduce the class of functions of \((\mathbf {x}, \mathbf {y})\in \mathbb {R}^{dm}\times \mathbb {R}^{qm}\),
Since both \(\mathcal {G}^{\prime }\) as defined below
and the class \(\mathcal {I}\) of functions of \(\mathbf {y} \in \mathbb {R}^{qm}\) are of VC-type (note that \(\mathcal {I}\) has a bounded envelope function), we can apply Lemma A.1 in Einmahl and Mason (2000) to conclude that \(\mathcal {C}\) is also of VC-type. Therefore, so is the class of functions \(m \mathcal {C}^{(1)}\) on \(\mathbb {R}^{d+q}\), where \(\mathcal {C}^{(1)}\) consists of the π1-projections of the functions in the class \(\mathcal {C}\). Thus, we see that \( \mathcal {S}^{\prime }_{\ell } \subset m \mathcal {C}^{(1)}\) and hence \( \mathcal {S}_{\ell }^{\prime }\) is of VC-type with the same characteristics as \(m \mathcal {C}^{(1)}\). Now, to find an envelope function for \( \mathcal {S}_{\ell }^{\prime }\), set \(\mathbf {t}_{j}:=\left (t_{1}, \ldots , t_{j-1}, t_{j+1}, \ldots , t_{m}\right ) \in \mathbb {R}^{d(m-1)}\) and Zj(u) := (Z1,…,Zj− 1,u,Zj+ 1,…, \(Z_{m}) \in \mathbb {R}^{qm}\) for \(u \in \mathbb {R}^{q}\) and \(\mathbf {Z} \in \mathbb {R}^{qm}\). We can then rewrite the function \(S_{g, h, \mathbf {t}}^{(\ell )}(x, y) \in \mathcal {S}_{\ell }^{\prime }\) as
where \(\mathbf {X}^{*}=\left (X_{2}, \ldots , X_{m}\right ) \in \mathbb {R}^{d(m-1)}\) and where (with a little abuse of notation here) the product kernel in (K.iii) is now defined for d(m − 1) -dimensional vectors, that is, \(\widetilde {K}(\mathbf {u})={\prod }_{i=1}^{m-1} K\left (u_{i}\right )\), \(\mathbf {u} \in \mathbb {R}^{d(m-1)} \). Hence, we can bound \(S_{g, h, \mathbf {t}}^{(\ell )}(x, y) \in \mathcal {S}_{\ell }^{\prime }\) simply as
We shall now apply the moment bound in Theorem 7.3 to the subclasses
where \({\mathscr{H}}_{\ell , j}^{\prime }\) was defined in Eq. 6.29. Since \( \mathcal {S}_{\ell , j}^{\prime } \subset \mathcal {S}_{\ell }^{\prime }\) for j = 1,…,L(ℓ), all of these subclasses are of VC-type, with the same envelope function and characteristics as the class \(m \mathcal {C}^{(1)}\) (which is independent of ℓ), verifying (ii) in Theorem 7.3. For (i), recall that although all of the terms of the envelope function Gm(x,y) are different, their expectations are the same. Therefore, writing Y∗ for \(\left (Y_{2}, \ldots , Y_{m}\right )\) and applying Minkowski’s inequality followed by Jensen’s inequality, we obtain from assumption (2.10) the following upper bound for the second moment of the envelope function:
Note, further, that by the symmetry of \(\widetilde {F}\),
so that Jensen’s inequality, the change of variable u = (t −x)/h and the assumption in Eq. 2.10 give the following upper bound for the second moment of any function in \( \mathcal {S}_{\ell }^{\prime }\):
Therefore, with \(\beta \equiv m \mu _{p}^{1 / p}\left (\kappa ^{m} \vee C_{2,\mathbb {P},K,\varepsilon }^{1/2}\right )\), our previous calculations give us that
verifying condition (iii) as well. Finally, recall from Eq. 6.4 that since \(\mathcal {G}\) has envelope function \(\widetilde {F}(\mathbf {y})\), it holds for all \(x, y \in \mathbb {R}^{d+q}\) that
so that by taking ε > 0 small enough, Theorem 7.3 is now applicable. Thus, for an absolute constant \(A_{1}<\infty \), we have
where \(\epsilon _{1}, \ldots , \epsilon _{n_{\ell }}\) are independent Rademacher variables, independent of \(\left (X_{i}, Y_{i}\right ), 1 \leq i \leq n_{\ell }\). Consequently, applying the exponential inequality of Talagrand (1994) to the class \( \mathcal {S}_{\ell , j}^{\prime }\) (see Theorem 7.5 in the Appendix) with \(M=m \varepsilon \gamma _{\ell }^{1 / p}\), \(\sigma _{\mathcal {S}_{\ell , j}^{\prime }}^{2}=\beta ^{2} h_{\ell , j}^{\prime m(d_{\text {vol}}-\varepsilon )}\) and the moment bound in Eq. 6.34, we get, for an absolute constant \(A_{2}<\infty \) and all t > 0, that
Regarding the application of this inequality with \(t=\rho \lambda _{j}^{\prime }(\ell ), \rho >1\), note that it clearly follows from Eq. 6.31 and the definitions of \(h_{\ell , j}^{\prime }\) as in Eq. 6.29, γℓ as in Eq. 6.31 and \(\lambda _{j}^{\prime }(\ell )\) as in Eq. 6.34 that for all j ≥ 0,
Consequently, Eq. 6.35, when applied with \(t=\rho \lambda _{j}^{\prime }(\ell )\) and any ρ > 1 with ℓ large enough, yields, for suitable constants \(A_{2}^{\prime }, A_{2}^{\prime \prime }\) and A3, the inequality
Keeping in mind that \(m h^{dm} \sqrt {n} U_{n}^{(1)}\left (\pi _{1} \bar {G}_{g, h, \mathbf {t}}^{(\ell )}\right )\) is the empirical process \(\alpha _{n}\left (S_{g, h, \mathbf {t}}^{(\ell )}\right )\) indexed by the class \( \mathcal {S}_{\ell }^{\prime }\) and recalling Eq. 6.18, since dvol − ε < d, we obtain, for ℓ ≥ 1, that
Consequently, recalling once again that \(L(\ell ) \leq 2 \log n_{\ell }\), we can infer from Eq. 6.36 that for some constant \(C_{5}(\rho ) \geq 3 C_{1}\left (A_{1}+\rho \right )\),
The Borel-Cantelli lemma, when combined with this inequality for ρ ≥ (2 + δ)/A3, δ > 0, and with the choice \(n_{\ell } = 2^{\ell }\), establishes, for some \(C^{\prime }<\infty \) and with probability 1, that
This achieves the control of the first term in Eq. 6.1. We now treat the nonlinear terms. The purpose is to prove that, for k = 2,…,m and with probability 1, all of the other terms of Eq. 6.1 are asymptotically bounded or go to zero at the proper rate, that is
By following the same reasoning as in the bounded case, we define some classes of functions on \(\mathbb {R}^{dm} \times \mathbb {R}^{qm}\) and \(\mathbb {R}^{dk} \times \mathbb {R}^{qk}\),
It is then easily verified that these classes are of VC-type with characteristics that are independent of ℓ and with envelope functions \(\widetilde {F}\) and \(\left (2^{k} \varepsilon \gamma _{\ell }^{1 / p}\right )^{-1} F_{k}\), respectively. The function \(\widetilde {F}\) is defined as in Eq. 6.4 and Fk is determined just as in the proof of Theorem 1 of Giné and Mason (2007a). Note that just as in Eqs. 6.19 and 6.21, by setting
we see that for all k = 2,…,m and nℓ− 1 < n ≤ nℓ,
Consequently, applying Theorem 7.2 with c = 1/2 and r = 2 gives us precisely (6.20) with \(\mathcal {U}_{n}(j, k, \ell )\) and \(\mathcal {U}_{n_{\ell }}(j, k, \ell )\) replaced by \(\mathcal {U}_{n}^{\prime }(j, k, \ell )\) and \(\mathcal {U}_{n_{\ell }}^{\prime }(j, k, \ell )\), respectively. Therefore, the same methodology as in the bounded case will be applied. Note also that, as was the case for all the functions in \(\mathcal {G}_{\ell , j}^{(k)}\), the functions in \(\mathcal {G}_{\ell , j}^{\prime (k)}\) are bounded by 1 and have second moments that can be bounded by \(h^{m(d_{\text {vol}}-\varepsilon )} D_{m, k}\) for a suitable Dm,k (by arguing as in Eqs. 6.33 and 6.22). Hence, the expression in Eq. 6.22 is also satisfied for functions in \(\mathcal {G}_{\ell , j}^{\prime (k)}\), that is,
Thus, all the conditions for Theorems 7.4 and 7.6 are satisfied so that, after some obvious identifications and modifications, the second part of the proof of Theorem 2.5 (and Eq. 6.26 in particular) gives us, for some C7,k > 0, all j = 1,…,L(ℓ) and any ρ > 2,
with \(y^{\prime *}=C_{1, k}^{\prime } \lambda _{j, k}^{\prime }(\ell )\) for some \(C_{1, k}^{\prime }>0\) and where \(\lambda _{j, k}^{\prime }(\ell )\) is defined as in Eq. 6.24 with hℓ,j replaced by \(h_{\ell , j}^{\prime }\), that is,
Now, to finish the proof of Eq. 6.38, note that, similarly to Eq. 6.27, for some Ck > 0, for nℓ− 1 < n ≤ nℓ
This gives that for some ck > 0,
From Eq. 6.31, we now see that
Since \(\log n / n^{2-k}\) is monotone increasing in n ≥ 2 whenever k ≥ 2, we infer that, for some constant C8,k > 0,
Therefore, by choosing \(C_{8, k}>2^{k+1} c^{-m / 2} \varepsilon c_{k} C_{1, k}^{\prime }\left ((2+\delta ) / C_{7, k}\right )^{k / 2}\) and noting that by definition \(L(\ell ) \leq 2 \log n_{\ell }\) and \(h_{\ell , j}^{\prime }<2\) for all j = 1,…,L(ℓ), we can infer from Eq. 6.39 with
that
This immediately implies, via Borel-Cantelli, that for all k = 2,…,m and ℓ ≥ 1,
a.s., which obviously implies Eq. 6.38. Finally, recalling the Hoeffding decomposition (6.1), this implies, together with Eq. 6.37, that for some \(C^{\prime \prime }>0\) with probability 1,
6.6 Remainder Part
Consider now the remainder process \(\widetilde {u}_{n}^{(\ell )}(g, h, \mathbf {t})\) based on the unbounded (symmetric) U -kernel given by
where we defined γℓ as in Eq. 6.30. We shall show that this U-process is asymptotically negligible at the rate given in Theorem 2.6. More precisely, we shall prove that as \(\ell \rightarrow \infty \),
Recall that for all \(g \in \mathcal {F}\), \(h \in \left [a_{n}^{\prime }, b_{0}\right ]\) and \(\mathbf {t}, \mathbf {x} \in \mathbb {R}^{dm}\), \(\widetilde {F}(\mathbf {y}) \geq h^{m}\left |\bar {G}_{g, h, \mathbf {t}}(\mathbf {x}, \mathbf {y})\right |\), so from the symmetry of \(\widetilde {F}\), it holds that
where is a U-statistic based on the positive and symmetric kernel . Recalling that \(a_{n}^{\prime \, md}=c^{md}(\log n / n)^{1-2 / p}\), we obtain easily that for all \(g \in \mathcal {F}\), \(h \in \left [a_{n}^{\prime }, b_{0}\right ]\), \(\mathbf {t} \in \mathbb {R}^{dm}\) and some C > 0
Arguing in the same way, since a U -statistic is an unbiased estimator of its kernel, we get that, uniformly in \(g \in \mathcal {F}, h \in \left [a_{n}^{\prime }, b_{0}\right ]\) and \(\mathbf {t} \in \mathbb {R}^{m}\),
From Eq. 6.42, we see that as \(\ell \rightarrow \infty \),
Thus, to finish the proof of Eq. 6.41, it suffices to show that
First, note that from Chebyshev’s inequality and a well-known inequality for the variance of a U-statistic (see Theorem 5.2 of Hoeffding (1948)), we get, for any δ > 0,
Next, in order to establish the finite convergence of the series of the above probabilities, we split the indicator function into two distinct parts determined by whether \(\widetilde {F}(\mathbf {Y})> n_{\ell }^{1 / p}\) or \(\varepsilon \gamma _{\ell }^{1 / p}<\widetilde {F}(\mathbf {Y}) \leq n_{\ell }^{1 / p}\), and consider the corresponding second moments in Eq. 6.45 separately. In the first case, note that, from Eqs. 2.10 and 6.4, \(\mathbb {E} \widetilde {F}^{p}(\mathbf {Y}) \leq \mu _{p} \kappa ^{p m}(m !)^{p}\) and observe that since p > 2 and \(n_{\ell } = 2^{\ell }\),
To handle the second case, we shall need the following fact from Einmahl and Mason (2000).
Fact 6.1.
Let \(\left (c_{n}\right )_{n \geq 1}\) be a sequence of positive constants such that \(c_{n}/n^{1/s} \nearrow \infty \) for some s > 0 and let Z be a random variable satisfying
We then have, for any q > s,
Notice that for any p < r ≤ 2p,
Now, set \(Z=\widetilde {F}(\mathbf {Y}), c_{n}=n^{1 / p}\) and q = r in Fact 6.1 and note that \(c_{n} / n^{1 / s} \nearrow \infty \) for any s such that q = r > s > p. Since q = r > s, we can conclude from Fact 6.1 that this last bound is finite. Finally, note that the bound leading to Eq. 6.45 implies that
Consequently, the above results, together with Eq. 6.45, imply via Borel-Cantelli and the arbitrary choice of δ > 0 that Eq. 6.44 holds, which, when combined with Eqs. 6.43 and 6.45, completes the proof of Eq. 6.41. This also completes the proof of Theorem 2.6 since we have already established the result in Eq. 6.40.
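For reference, the variance inequality invoked at the beginning of this subsection (Theorem 5.2 of Hoeffding (1948)) is, in standard form,

$$\operatorname {Var}\left (U_{n}^{(m)}(L)\right )\leq \frac {m}{n}\operatorname {Var}\left (L(Y_{1},\ldots ,Y_{m})\right ),$$

so that Chebyshev's inequality yields, for any δ > 0,

$$\mathbb {P}\left (\left |U_{n}^{(m)}(L)-\mathbb {E}L\right |>\delta \right )\leq \frac {m\operatorname {Var}(L)}{n\delta ^{2}}.$$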
6.7 Proof of Theorem 2.7
Theorem 2.7 is essentially a consequence of Theorem 7.7; the details are similar to the proof in Dony and Mason (2008) and are therefore omitted.
6.8 Proof of Corollary 2.8
We now turn to the proof of Corollary 2.8. We observe the following standard inequalities
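One standard set of such inequalities, for a ratio estimator \(\widehat {a}/\widehat {b}\) of a/b (given here as an illustrative sketch with generic numerator and denominator), is

$$\left |\frac {\widehat {a}}{\widehat {b}}-\frac {a}{b}\right | \leq \frac {1}{|\widehat {b}|}\left |\widehat {a}-a\right | +\frac {|a|}{|b\,\widehat {b}|}\left |\widehat {b}-b\right |,$$

which separates the estimation errors of the numerator and the denominator.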
We can infer from Theorem 7.7 that
Then, from Theorem 2.5, Eq. 6.46 and the fact that fX(⋅) is bounded away from zero on J, we get, for some ξ1, ξ2 > 0 and c large enough in \(a_{n}=c(\log n / n)^{1 / dm}\),
and, for n large enough,
Further, for \(a_{n}^{\prime }\) equalling either ℓn or an, we readily obtain from the assumptions (2.8) or (2.10) on the envelope function that
Hence, we can now use Theorem 2.5 to handle (II), while for (I), depending on whether the class \({\mathscr{F}}_{q}\) satisfies Eq. 2.8 or 2.10, we apply Theorem 2.5 or Theorem 2.6, respectively. Taking everything together, we conclude that for c large enough and some \(\mathfrak {C}_{3}>0\), with probability 1,
We readily obtain the assertion of the corollary by choosing δ appropriately. \(\Box \)
6.9 Proof of Proposition 3.2
Recalling Definition 3.1 of
it is obvious that Φψ is uniformly bounded in \((y_{1},\ldots ,y_{m}, c_{1},\ldots ,c_{m})\in \mathbb {R}^{2m}\) and in \(\psi \in {\mathscr{F}}\), since \({\mathscr{F}}\) is uniformly bounded, ψ(t) = 0 for all t > τ, and G(τ) < 1. This property, combined with the VC property of \({\mathscr{F}}\), ensures that the class of functions
satisfies conditions (F.ii) and (F.iii). Similarly, it can be shown that \({\mathscr{F}}_{\Phi }\) is a pointwise measurable class of functions, i.e., it satisfies (F.i). Moreover, by (A.3) and Eq. 3.2, the class
is almost surely relatively compact with respect to the sup-norm topology on \(\mathcal I_{\alpha }\). Hence, we can apply Theorem 2.8 with Y = (Y,C) and Ψ = Φψ, and the result of Proposition 3.2 follows. \(\Box \)
Lemma 6.2.
Under assumptions of Theorem 3.1, we have with probability one,
6.10 Proof of Lemma 6.2
Recall the following useful lemma.
Lemma 6.3.
Let \(a_{i}\), i = 1,…,k, and \(b_{i}\), i = 1,…,k, be real numbers.
An application of the preceding lemma gives
Since
the kernel K(⋅) is uniformly bounded and
the law of the iterated logarithm for \(G_{n}^{*}(\cdot )\) established in Földes and Rejtő (1981) ensures that
By combining the results of Proposition 3.2 and Lemma 6.2, the result of Theorem 3.1 is immediate, noting that, under conditions (H.1)–(H.3), we have, for n sufficiently large,
Hence the proof is complete.
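For readers who wish to experiment numerically with the censored-data setting of Theorem 3.1, the product-limit (Kaplan-Meier) estimator whose law of the iterated logarithm is invoked above can be sketched in a few lines of Python. This is an illustrative sketch under the standard right-censoring model, not the authors' code; applied to the censoring indicators 1 − δᵢ, the same function yields the estimate of the censoring law that enters the I.P.C.W. weights \(\delta _{i}/(1-\widehat {G}_{n}(Y_{i}^{-}))\).

```python
import numpy as np


def product_limit_survival(t_eval, times, events):
    """Kaplan-Meier product-limit estimate of S(t) = P(T > t) from
    right-censored data: events[i] = 1 if times[i] is an observed
    failure, 0 if it is censored (ties ignored in this sketch)."""
    order = np.argsort(times)
    times, events = times[order], events[order]
    n = len(times)
    at_risk = n - np.arange(n)            # number still at risk at each time
    factors = 1.0 - events / at_risk      # 1 - d_i / n_i (only failures count)
    surv = np.cumprod(factors)
    # right-continuous step function: take the last jump at or before t
    idx = np.searchsorted(times, t_eval, side="right") - 1
    return np.where(idx < 0, 1.0, surv[np.clip(idx, 0, n - 1)])


# No censoring: the estimator reduces to the empirical survival function.
times = np.array([1.0, 2.0, 3.0, 4.0])
events = np.ones(4)
print(product_limit_survival(np.array([2.5]), times, events))  # [0.5]
```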
References
Abrevaya, J. and Jiang, W. (2005). A nonparametric approach to measuring and testing curvature. J. Bus. Econ. Stat. 23, 1–19.
Arcones, M. A. and Giné, E. (1993). Limit theorems for U-processes. Ann. Probab. 21, 1494–1542.
Arcones, M. A. and Giné, E. (1995). On the law of the iterated logarithm for canonical U-statistics and processes. Stoch. Process. Appl. 58, 217–245.
Arcones, M. A. and Wang, Y. (2006). Some new tests for normality based on U-processes. Stat. Probab. Lett. 76, 69–82.
Arcones, M. A., Chen, Z. and Giné, E. (1994). Estimators related to U-processes with applications to multivariate medians: asymptotic normality. Ann. Stat. 22, 1460–1477.
Borovkova, S., Burton, R. and Dehling, H. (2001). Limit theorems for functionals of mixing processes with applications to U-statistics and dimension estimation. Trans. Am. Math. Soc. 353, 4261–4318.
Borovskikh, Y. V. (1996). U-statistics in Banach spaces. VSP, Utrecht.
Bouzebda, S. (2012). On the strong approximation of bootstrapped empirical copula processes with applications. Math. Methods Stat. 21, 153–188.
Bouzebda, S. and El-hadjali, T. (2020). Uniform convergence rate of the kernel regression estimator adaptive to intrinsic dimension in presence of censored data. J. Nonparametr. Stat. 32, 864–914.
Bouzebda, S. and Elhattab, I. (2009). A strong consistency of a nonparametric estimate of entropy under random censorship. C. R. Math. Acad. Sci. Paris 347, 821–826.
Bouzebda, S. and Elhattab, I. (2010). Uniform in bandwidth consistency of the kernel-type estimator of the Shannon’s entropy. C. R. Math. Acad. Sci. Paris 348, 317–321.
Bouzebda, S. and Elhattab, I. (2011). Uniform-in-bandwidth consistency for kernel-type estimators of Shannon’s entropy. Electron. J. Stat. 5, 440–459.
Bouzebda, S. and Nemouchi, B. (2019). Central limit theorems for conditional empirical and conditional U-processes of stationary mixing sequences. Math. Methods Stat. 28, 169–207.
Bouzebda, S. and Nemouchi, B. (2020). Uniform consistency and uniform in bandwidth consistency for nonparametric regression estimates and conditional U-statistics involving functional data. J. Nonparametr. Stat. 32, 452–509.
Bouzebda, S. and Nemouchi, B. (2022). Weak-convergence of empirical conditional processes and conditional U-processes involving functional mixing data. Stat. Inference Stoch. Process. To appear, pp 1–56.
Bouzebda, S. and Nezzal, A. (2022). Uniform consistency and uniform in number of neighbors consistency for nonparametric regression estimates and conditional U-statistics involving functional data. Jpn. J. Stat. Data Sci. 5, 2, 431–533.
Bouzebda, S., Elhattab, I. and Seck, C. T. (2018). Uniform in bandwidth consistency of nonparametric regression based on copula representation. Statist. Probab. Lett. 137, 173–182.
Bouzebda, S., Elhattab, I. and Nemouchi, B. (2021). On the uniform-in-bandwidth consistency of the general conditional U-statistics based on the copula representation. J. Nonparametr. Stat. 33, 321–358.
Brunel, E. and Comte, F. (2006). Adaptive nonparametric regression estimation in presence of right censoring. Math. Methods Stat. 15, 233–255.
Carbonez, A., Györfi, L. and van der Meulen, E. C. (1995). Partitioning-estimates of a regression function under random censoring. Stat. Decis. 13, 21–37.
Chen, Y. and Datta, S. (2019). Adjustments of multi-sample U-statistics to right censored data and confounding covariates. Comput. Stat. Data Anal. 135, 1–14.
Datta, S., Bandyopadhyay, D. and Satten, G. A. (2010). Inverse probability of censoring weighted U-statistics for right-censored data with an application to testing hypotheses. Scand. J. Stat. 37, 680–700.
de la Peña, V. H. and Giné, E. (1999). Decoupling. Probability and its Applications (New York). Springer, New York. From dependence to independence. Randomly stopped processes. U-statistics and processes. Martingales and beyond.
Deheuvels, P. (2000). Uniform limit laws for kernel density estimators on possibly unbounded intervals. In Recent advances in reliability theory (Bordeaux, 2000), Stat. Ind. Technol., pp 477–492. Birkhäuser, Boston.
Deheuvels, P. and Mason, D. M. (2004). General asymptotic confidence bands based on kernel-type function estimators. Stat. Inference Stoch. Process. 7, 225–277.
Denker, M. and Keller, G. (1983). On U-statistics and v. Mises’ statistics for weakly dependent processes. Z. Wahrsch. Verw. Gebiete 64, 505–522.
Devroye, L. and Lugosi, G. (2001). Combinatorial methods in density estimation Springer Series in Statistics. Springer, New York.
Dony, J. and Mason, D. M. (2008). Uniform in bandwidth consistency of conditional U-statistics. Bernoulli 14, 1108–1133.
Einmahl, U. and Mason, D. M. (2000). An empirical process approach to the uniform consistency of kernel-type function estimators. J. Theor. Probab. 13, 1–37.
Einmahl, U. and Mason, D. M. (2005). Uniform in bandwidth consistency of kernel-type function estimators. Ann. Stat. 33, 1380–1403.
Farahmand, A. M., Szepesvári, C. and Audibert, J.-Y. (2007). Manifold-adaptive dimension estimation. In Proceedings of the 24th International Conference on Machine Learning, ICML ’07, pp 265–272. Association for Computing Machinery, New York.
Földes, A. and Rejtő, L. (1981). A LIL type result for the product limit estimator. Z. Wahrsch. Verw. Gebiete 56, 75–86.
Ghosal, S., Sen, A. and van der Vaart, A. W. (2000). Testing monotonicity of regression. Ann. Stat. 28, 1054–1082.
Giné, E. and Mason, D. M. (2007a). Laws of the iterated logarithm for the local U-statistic process. J. Theoret. Probab. 20, 457–485.
Giné, E. and Mason, D. M. (2007b). On local U-statistic processes and the estimation of densities of functions of several sample variables. Ann. Stat. 35, 1105–1145.
Györfi, L., Kohler, M., Krzyżak, A. and Walk, H. (2002). A distribution-free theory of nonparametric regression. Springer Series in Statistics. Springer, New York.
Hall, P. (1984). Asymptotic properties of integrated square error and cross-validation for kernel estimation of a regression function. Z. Wahrsch. Verw. Gebiete 67, 175–196.
Halmos, P. R. (1946). The theory of unbiased estimation. Ann. Math. Stat. 17, 34–43.
Härdle, W. and Marron, J. S. (1985). Optimal bandwidth selection in nonparametric regression function estimation. Ann. Stat. 13, 1465–1481.
Harel, M. and Puri, M. L. (1996). Conditional U-statistics for dependent random variables. J. Multivariate Anal. 57, 84–100.
Hein, M. and Audibert, J. -Y. (2005). Intrinsic dimensionality estimation of submanifolds in rd. In Proceedings of the 22nd International Conference on Machine Learning, ICML ’05, pp 289–296. Association for Computing Machinery, New York.
Hoeffding, W. (1948). A class of statistics with asymptotically normal distribution. Ann. Math. Stat. 19, 293–325.
Hollander, M. and Proschan, F. (1972). Testing whether new is better than used. Ann. Math. Stat. 43, 1136–1146.
Joly, E. and Lugosi, G. (2016). Robust estimation of U-statistics. Stoch. Process. Appl. 126, 3760–3773.
Kaplan, E. L. and Meier, P. (1958). Nonparametric estimation from incomplete observations. J. Am. Stat. Assoc. 53, 457–481.
Kégl, B. (2002). Intrinsic dimension estimation using packing numbers. In Proceedings of the 15th International Conference on Neural Information Processing Systems, NIPS’02, pp. 697–704. MIT Press, Cambridge.
Kim, J., Shin, J., Rinaldo, A. and Wasserman, L. (2018). Uniform convergence rate of the kernel density estimator adaptive to intrinsic volume dimension.
Kim, J., Shin, J., Rinaldo, A. and Wasserman, L. (2019). Uniform convergence rate of the kernel density estimator adaptive to intrinsic volume dimension. In Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, (K. Chaudhuri and R. Salakhutdinov, eds.), pp. 3398–3407. PMLR, Long Beach.
Kohler, M., Máthé, K. and Pintér, M. (2002). Prediction from randomly right censored data. J. Multivar. Anal. 80, 73–100.
Koroljuk, V.S. and Borovskich, Y.V. (1994). Theory of U-statistics, volume 273 of Mathematics and Its Applications. Kluwer Academic Publishers Group, Dordrecht. Translated from the 1989 Russian original by P. V. Malyshev and D. V. Malyshev and revised by the authors.
Kosorok, M. R. (2008). Introduction to empirical processes and semiparametric inference. Springer Series in Statistics. Springer, New York.
Lee, A. J. (1990). U-statistics, volume 110 of Statistics: Textbooks and Monographs. Marcel Dekker, Inc., New York. Theory and practice.
Lee, S., Linton, O. and Whang, Y. -J. (2009). Testing for stochastic monotonicity. Econometrica 77, 585–602.
Leucht, A. (2012). Degenerate U- and V-statistics under weak dependence: asymptotic theory and bootstrap consistency. Bernoulli 18, 552–585.
Leucht, A. and Neumann, M. H. (2013). Degenerate U- and V-statistics under ergodicity: asymptotics, bootstrap and applications in statistics. Ann. Inst. Stat. Math. 65, 349–386.
Levina, E. and Bickel, P. J. (2004). Maximum likelihood estimation of intrinsic dimension. In Proceedings of the 17th International Conference on Neural Information Processing Systems, NIPS’04, pp. 777–784. MIT Press, Cambridge.
Ling, N. and Vieu, P. (2018). Nonparametric modelling for functional data: selected survey and tracks for future. Statistics 52, 934–949.
Maillot, B. and Viallon, V. (2009). Uniform limit laws of the logarithm for nonparametric estimators of the regression function in presence of censored data. Math. Methods Stat. 18, 159–184.
Mason, D. M. (2012). Proving consistency of non-standard kernel estimators. Stat. Inference Stoch. Process. 15, 151–176.
Mason, D. M. and Swanepoel, J. W. H. (2011). A general result on the uniform in bandwidth consistency of kernel-type function estimators. TEST 20, 72–94.
Nadaraja, E. A. (1964). On a regression estimate. Teor. Verojatnost. i Primenen. 9, 157–159.
Nolan, D. and Pollard, D. (1987). U-processes: rates of convergence. Ann. Stat. 15, 780–799.
Prakasa Rao, B. L. S. and Sen, A. (1995). Limit distributions of conditional U-statistics. J. Theor. Probab. 8, 261–301.
Schick, A., Wang, Y. and Wefelmeyer, W. (2011). Tests for normality based on density estimators of convolutions. Stat. Probab. Lett. 81, 337–343.
Sen, A. (1994). Uniform strong consistency rates for conditional U-statistics. Sankhyā Ser. A 56, 179–194.
Shang, H. L. (2014). Bayesian bandwidth estimation for a functional nonparametric regression model with mixed types of regressors and unknown error density. J. Nonparametr. Stat. 26, 599–615.
Sherman, R. P. (1993). The limiting distribution of the maximum rank correlation estimator. Econometrica 61, 123–137.
Sherman, R. P. (1994). Maximal inequalities for degenerate U-processes with applications to optimization estimators. Ann. Stat. 22, 439–459.
Silverman, B. W. (1978). Distances on circles, toruses and spheres. J. Appl. Probab. 15, 136–143.
Stute, W. (1991). Conditional U-statistics. Ann. Probab. 19, 812–825.
Stute, W. (1993). Almost sure representations of the product-limit estimator for truncated data. Ann. Stat. 21, 146–156.
Stute, W. (1994a). Lp-convergence of conditional U-statistics. J. Multivar. Anal. 51, 71–82.
Stute, W. (1994b). Universally consistent conditional U-statistics. Ann. Stat. 22, 460–473.
Stute, W. (1996). Symmetrized NN-conditional U-statistics. In Research developments in probability and statistics, pp. 231–237. VSP, Utrecht.
Stute, W. and Wang, J. -L. (1993). Multi-sample U-statistics for censored data. Scand. J. Stat. 20, 369–374.
Talagrand, M. (1994). Sharper bounds for Gaussian and empirical processes. Ann. Probab. 22, 28–76.
van der Vaart, A. W. and Wellner, J. A. (1996). Weak convergence and empirical processes. Springer Series in Statistics. Springer, New York. With applications to statistics.
von Mises, R. (1947). On the asymptotic distribution of differentiable statistical functions. Ann. Math. Stat. 18, 309–348.
Watson, G. S. (1964). Smooth regression analysis. Sankhyā Ser. A 26, 359–372.
Yuan, A., Giurcanu, M., Luta, G. and Tan, M. T. (2017). U-statistics with conditional kernels for incomplete data models. Ann. Inst. Stat. Math. 69, 271–302.
Acknowledgments
The authors are indebted to the Editor-in-Chief, Associate Editor and referee for their very generous comments and suggestions on the first version of our article which helped us to improve the content, presentation, and layout of the manuscript.
Funding
Not applicable.
Availability of Data and Materials. Not applicable.
Ethics declarations
Conflict of Interest
The authors declare that they have no competing interests.
Appendix
Theorem 7.1.
(Kim et al. 2018) Let \((\mathbb {R}^{d},\mathbb {P})\) be a probability space and let X1,…,Xn be i.i.d. from \(\mathbb {P}\). Let \(\mathcal {F}\) be a uniformly bounded VC-class of functions from \(\mathbb {R}^{d}\) to \(\mathbb {R}\) with dimension ν, i.e., there exist positive numbers A and B such that, for all \(f\in \mathcal {F}\), \(\left \Vert f\right \Vert _{\infty }\leq B\), and, for every probability measure \(\mathbb {Q}\) on \(\mathbb {R}^{d}\) and every 𝜖 ∈ (0,B), the covering number \(\mathcal {N}(\mathcal {F},L_{2}(\mathbb {Q}),\epsilon )\) satisfies
Let σ > 0 be such that \(\mathbb {E}_{\mathbb {P}}f^{2}\leq \sigma ^{2}\) for all \(f\in \mathcal {F}\). Then there exists a universal constant C, not depending on any of the parameters, such that
is upper bounded, with probability at least 1 − δ, by
Theorem 7.2 (Theorem 4 of Giné and Mason (2007a)).
Let X1,X2,… be i.i.d. S-valued random variables with probability law \(\mathbb P \). Let \({\mathscr{H}}\) be a \(\mathbb P\)-separable collection of measurable functions \(f: S^{k} \rightarrow \mathbb {R}\) and assume that \({\mathscr{H}}\) is \(\mathbb P\)-canonical (which means that every f in \({\mathscr{H}}\) is \(\mathbb P\)-canonical). Further, assume that
for some r > 1, and let s be the conjugate of r. Then, with Sn defined as
we have, for all x > 0 and 0 < c < 1,
Theorem 7.3 (Proposition 1 of Einmahl and Mason (2005)).
Let \(\mathcal {G}\) be a pointwise measurable class of bounded functions with envelope function G such that for some constants C,v ≥ 1 and 0 < σ ≤ β, the following conditions hold:
(i) \(\mathbb {E} G^{2}(X) \leq \beta ^{2}\);
(ii) \(\mathcal {N}(\epsilon , \mathcal {G}) \leq C \epsilon ^{-v}, \quad 0<\epsilon <1\);
(iii) \({\sigma _{0}^{2}}:=\sup _{g \in \mathcal {G}} \mathbb {E} g^{2}(X) \leq \sigma ^{2}\);
(iv) \(\sup _{g \in \mathcal {G}}\|g\|_{\infty } \leq \frac {1}{4 \sqrt {v}} \sqrt {n \sigma ^{2} / \log \left (C_{1} \beta / \sigma \right )}\), where \(C_{1}=C^{1 / v} \vee e\).
We then have, for some absolute constant A,
where ε1,…,εn are i.i.d. Rademacher variables, independent of X1,…,Xn.
Theorem 7.4 (Corollary 1 of Giné and Mason (2007a)).
Let \(\mathcal {F}\) be a collection of measurable functions \(f: S^{m} \rightarrow \mathbb {R}\), symmetric in their entries, with absolute values bounded by M > 0, and let P be any probability measure on \((S, \mathcal {S})\) (with the Xi i.i.d. P). Assume that \(\mathcal {F}\) is of VC-type with envelope function F ≡ M and with characteristics A and v. Then, for every \(m \in \mathbb {N}\), \(A \geq e^{m}\), v ≥ 1, there exist constants C1 := C1(m,A,v,M) and C2 := C2(m,A,v,M) such that, for k = 1,…,m,
assuming
where σ2 is any number satisfying
Theorem 7.5.
(Talagrand, 1994) Let \(\mathcal {G}\) be a pointwise measurable class of functions satisfying
We then have, for all t > 0,
where
and A1,A2 are universal constants.
We now state the exponential inequality that will permit us to control the probability term in (4.6) and which is stated as Theorem 5.3.14 in de la Peña and Giné (1999).
Theorem 7.6 (Theorem 5.3.14 of de la Peña and Giné (1999)).
Let \({\mathscr{H}}\) be a VC-subgraph class of uniformly bounded measurable real-valued kernels H on \(\left (S^{m}, \mathcal {S}^{m}\right )\), symmetric in their entries. Then, for each 1 ≤ k ≤ m, there exist constants \(c_{k}, d_{k} \in \left ] 0, \infty \right [\) such that, for all n ≥ m and t > 0,
Theorem 7.7.
(Dony and Mason, 2008) Let I = [a,b] be a compact interval. Suppose that \({\mathscr{H}}\) is a uniformly equicontinuous family of real-valued functions φ on J = [a − η,b + η]d for some d ≥ 1 and η > 0. Further assume that K is an L1-kernel with support in [−B,B]d, with B > 0 satisfying \({\int \limits }_{\mathbb {R}^{d}} K(\mathbf {u}) \mathrm {d} \mathbf {u}=1\). Then, uniformly in \(\varphi \in {\mathscr{H}}\) and for any sequence of positive constants \(b_{n} \rightarrow 0\),
where Kh(z) = h−dK(z/h) and
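To make the role of the rescaled kernel Kh(z) = h⁻ᵈK(z/h) concrete, here is a minimal Python sketch (our illustration, not the paper's code; the Epanechnikov kernel and the bandwidth grid are assumed choices) of a Nadaraya-Watson estimate in dimension d = 1 evaluated over a whole grid of bandwidths, mirroring the uniform-in-bandwidth range h ∈ [aₙ, bₙ] of Theorem 7.7.

```python
import numpy as np


def nadaraya_watson(t, X, Y, h):
    """Nadaraya-Watson estimate r_hat(t; h) = sum_i Y_i K_h(t - X_i)
    / sum_i K_h(t - X_i), with the Epanechnikov kernel
    K(u) = 0.75 (1 - u^2) on [-1, 1] (d = 1)."""
    u = (np.asarray(t)[:, None] - X[None, :]) / h
    K = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)
    num = K @ Y
    den = K.sum(axis=1)
    # NaN where no observation falls in the bandwidth window
    return np.divide(num, den, out=np.full_like(num, np.nan), where=den > 0)


rng = np.random.default_rng(1)
X = rng.uniform(0, 1, 200)
Y = np.sin(2 * np.pi * X) + 0.1 * rng.normal(size=200)
t = np.linspace(0.1, 0.9, 9)

# Evaluate the estimator simultaneously over a bandwidth grid [a_n, b_n].
estimates = {h: nadaraya_watson(t, X, Y, h) for h in np.linspace(0.05, 0.2, 4)}
```

Since the estimator is a weighted average of the responses, a constant response Y ≡ c is reproduced exactly wherever the denominator is positive, which gives a quick sanity check.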
Bouzebda, S., El-hadjali, T. & Ferfache, A.A. Uniform in Bandwidth Consistency of Conditional U-statistics Adaptive to Intrinsic Dimension in Presence of Censored Data. Sankhya A 85, 1548–1606 (2023). https://doi.org/10.1007/s13171-022-00301-7
Keywords
- Non-parametric estimation
- regression
- conditional empirical processes
- conditional U-processes
- kernel estimation
- functional estimation
- VC-classes.