Kernel Selection in Nonparametric Regression

Halconruy, H.; Marie, N.

doi:10.3103/S1066530720010044

Kernel Selection in Nonparametric Regression

Published: 31 August 2021

Volume 29, pages 32–56, (2020)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

Mathematical Methods of Statistics Aims and scope Submit manuscript

Kernel Selection in Nonparametric Regression

Download PDF

H. Halconruy^1,2 &
N. Marie^3,2

92 Accesses
2 Citations
1 Altmetric
Explore all metrics

Abstract

In the regression model $Y=b(X)+\sigma(X)\varepsilon$, where $X$ has a density $f$, this paper deals with an oracle inequality for an estimator of $bf$, involving a kernel in the sense of Lerasle et al. [13], selected via the PCO method. In addition to the bandwidth selection for kernel-based estimators already studied in Lacour et al. [12] and Comte and Marie [3], the dimension selection for anisotropic projection estimators of $f$ and $bf$ is covered.

Optimal Kernel Selection for Density Estimation

Nonparametric relative regression for associated random variables

Article 18 March 2016

Kernel regression estimation for incomplete data with applications

Article 02 July 2015

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 INTRODUCTION

Consider $n\in\mathbb{N}^{*}$ independent $\mathbb{R}^{d}\times\mathbb{R}$-valued ($d\in\mathbb{N}^{*}$) random variables $(X_{1},Y_{1}),\dots,(X_{n},Y_{n})$, having the same probability distribution assumed to be absolutely continuous with respect to Lebesgue’s measure, and

$$\widehat{s}_{K,\ell}(n;x):=\frac{1}{n}\sum_{i=1}^{n}K(X_{i},x)\ell(Y_{i});\quad x\in\mathbb{R}^{d},$$

where $\ell:\mathbb{R}\rightarrow\mathbb{R}$ is a Borel function and $K$ is a symmetric continuous map from $\mathbb{R}^{d}\times\mathbb{R}^{d}$ into $\mathbb{R}$. This is an estimator of the function $s:\mathbb{R}^{d}\rightarrow\mathbb{R}$ defined by

$$s(x):=\mathbb{E}(\ell(Y_{1})|X_{1}=x)f(x);\quad\forall x\in\mathbb{R}^{d},$$

where $f$ is a density of $X_{1}$. For $\ell=1$, $\widehat{s}_{K,\ell}(n;.)$ coincides with the estimator of $f$ studied in Lerasle et al. [13], covering Parzen-Rosenblatt’s and projection estimators already deeply studied in the literature (see Parzen [16], Rosenblatt [17], Tsybakov [18], etc.), but for $\ell\not=1$, it covers estimators involved in nonparametric regression. Assume that for every $i\in\{1,\dots,n\}$,

$$Y_{i}=b(X_{i})+\sigma(X_{i})\varepsilon_{i},$$

(1)

where $\varepsilon_{i}$ is a centered random variable of variance $1$, independent of $X_{i}$, and $b,\sigma:\mathbb{R}^{d}\rightarrow\mathbb{R}$ are Borel functions.

If $\ell=\textrm{Id}_{\mathbb{R}}$, $k$ is a symmetric kernel and

$$K(x^{\prime},x)=\prod_{q=1}^{d}\frac{1}{h_{q}}k\left(\frac{x_{q}^{\prime}-x_{q}}{h_{q}}\right)\textrm{\ with\ }h_{1},\dots,h_{d}>0$$
(2)

for every $x,x^{\prime}\in\mathbb{R}^{d}$, then $\widehat{s}_{K,\ell}(n;.)$ is the numerator of the well-known Nadaraya-Watson estimator of the regression function $b$ (see Nadaraya [15] and Watson [20]). Precisely, $\widehat{s}_{K,\ell}(n;.)$ is an estimator of $s=bf$ because $\varepsilon_{1}$ is independent to $X_{1}$ and $\mathbb{E}(\varepsilon_{1})=0$. If $\ell\not=\textrm{Id}_{\mathbb{R}}$, then $\widehat{s}_{K,\ell}(n;.)$ is the numerator of the estimator studied in Einmahl and Mason [7, 8].
If $\ell=\textrm{Id}_{\mathbb{R}}$, $\mathcal{B}_{m_{q}}=\{\varphi_{1}^{m_{q}},\dots,\varphi_{m_{q}}^{m_{q}}\}$ ($m_{q}\in\mathbb{N}^{*}$ and $q\in\{1,\dots,d\}$) is an orthonormal family of $\mathbb{L}^{2}(\mathbb{R})$ and

$$K(x^{\prime},x)=\prod_{q=1}^{d}\sum_{j=1}^{m_{q}}\varphi_{j}^{m_{q}}(x_{q})\varphi_{j}^{m_{q}}(x_{q}^{\prime})$$
(3)

for every $x,x^{\prime}\in\mathbb{R}^{d}$, then $\widehat{s}_{K,\ell}(n;.)$ is the projection estimator on $\mathcal{S}=\textrm{span}(\mathcal{B}_{m_{1}}\otimes\dots\otimes\mathcal{B}_{m_{d}})$ of $s=bf$.

Now, assume that $b=0$ in Model (1): for every $i\in\{1,\dots,n\}$,

$$Y_{i}=\sigma(X_{i})\varepsilon_{i}.$$

(4)

If $\ell(x)=x^{2}$ for every $x\in\mathbb{R}$, then $\widehat{s}_{K,\ell}(n;.)$ is an estimator of $s=\sigma^{2}f$.

These ten last years, several data-driven procedures have been proposed in order to select the bandwidth of Parzen–Rosenblatt’s estimator ($\ell=1$ and $K$ defined by (2)). First, Goldenshluger–Lepski’s method, introduced in [10], which reaches the adequate bias-variance compromise, but is not completely satisfactory on the numerical side (see Comte and Rebafka [5]). More recently, in [12], Lacour, Massart and Rivoirard proposed the PCO (Penalized Comparison to Overfitting) method and proved an oracle inequality for the associated adaptive Parzen-Rosenblatt’s estimator by using a concentration inequality for the U-statistics due to Houdré and Reynaud-Bouret [11]. Together with Varet, they established the numerical efficiency of the PCO method in Varet et al. [19]. Still in the density estimation framework, the PCO method has been extended to bandwidths selection for the recursive Wolverton-Wagner estimator in Comte and Marie [3].

Comte and Marie [4] deal with an oracle inequality and numerical experiments for an adaptive Nadaraya-Watson’s estimator with a numerator and a denominator having distinct bandwidths, both selected via the PCO method. Since the output variable in a regression model has no reason to be bounded, there were significant additional difficulties, bypassed in [4], to establish an oracle inequality for the numerator’s adaptive estimator. Via similar arguments, the present article deals with an oracle inequality for $\widehat{s}_{\widehat{K},\ell}(n;.)$, where $\widehat{K}$ is selected via the PCO method in the spirit of Lerasle et al. [13]. As in Comte and Marie [4], one can deduce an oracle inequality for the adaptive quotient estimator $\widehat{s}_{\widehat{K},\ell}(n;.)/\widehat{s}_{\widehat{L},1}(n;.)$ of $\mathbb{E}(\ell(Y_{1})|X_{1}=\cdot)$, where $\widehat{K}$ and $\widehat{L}$ are both selected via the PCO method.

In addition to the bandwidth selection for kernel-based estimators already studied in [12, 4], the present paper covers the dimension selection for projection estimators of $f$, $bf$ when $Y_{1},\dots,Y_{n}$ are defined by Model (1) with $\ell=\textrm{Id}_{\mathbb{R}}$, and $\sigma^{2}f$ when $Y_{1},\dots,Y_{n}$ are defined by Model (4) with $\ell(x)=x^{2}$ for every $x\in\mathbb{R}$. For projection estimators, when $d=1$, the usual model selection method (see Comte [2, Chapter 2, Section 5]) seems hard to beat. However, when $d>1$ and $K$ is defined by (3), $m_{1},\dots,m_{d}$ are selected via a Goldenshluger–Lepski type method (see Chagny [1]), which has the same numerical weakness than the Goldenshluger–Lepski method for bandwidth selection when $K$ is defined by (2). So, for the dimension selection for anisotropic projection estimators, the PCO method is interesting.

In Section 2, some examples of kernels sets are provided and a risk bound on $\widehat{s}_{K,\ell}(n;.)$ is established. Section 3 deals with an oracle inequality for $\widehat{s}_{\widehat{K},\ell}(n;.)$, where $\widehat{K}$ is selected via the PCO method.

2 RISK BOUND

Throughout the paper, $s\in\mathbb{L}^{2}(\mathbb{R}^{d})$. Let $\mathcal{K}_{n}$ be a set of symmetric continuous maps from $\mathbb{R}^{d}\times\mathbb{R}^{d}$ into $\mathbb{R}$, of cardinality less or equal than $n$, fulfilling the following assumption.

Assumption 2.1. There exists a deterministic constant $\mathfrak{m}_{\mathcal{K},\ell}>0$, not depending on $n$, such that

(1) for every $K\in\mathcal{K}_{n}$,

$$\sup_{x^{\prime}\in\mathbb{R}^{d}}||K(x^{\prime},.)||_{2}^{2}\leqslant\mathfrak{m}_{\mathcal{K},\ell}n;$$

(2) for every $K\in\mathcal{K}_{n}$,

$$||s_{K,\ell}||_{2}^{2}\leqslant\mathfrak{m}_{\mathcal{K},\ell}$$

with

$$s_{K,\ell}:=\mathbb{E}(\widehat{s}_{K,\ell}(n;.))=\mathbb{E}(K(X_{1},.)\ell(Y_{1}));$$

(3) for every $K,K^{\prime}\in\mathcal{K}_{n}$,

$$\mathbb{E}(\langle K(X_{1},.),K^{\prime}(X_{2},.)\ell(Y_{2})\rangle_{2}^{2})\leqslant\mathfrak{m}_{\mathcal{K},\ell}\overline{s}_{K^{\prime},\ell}$$

with

$$\overline{s}_{K^{\prime},\ell}:=\mathbb{E}(||K^{\prime}(X_{1},.)\ell(Y_{1})||_{2}^{2}).$$

(4) for every $K\in\mathcal{K}_{n}$ and $\psi\in\mathbb{L}^{2}(\mathbb{R}^{d})$,

$$\mathbb{E}(\langle K(X_{1},.),\psi\rangle_{2}^{2})\leqslant\mathfrak{m}_{\mathcal{K},\ell}||\psi||_{2}^{2}.$$

The elements of $\mathcal{K}_{n}$ are called kernels. Let us provide two natural examples of kernels sets.

Proposition 2.2. Consider

$$\mathcal{K}_{k}(h_{\textrm{min}}):=\left\{(x^{\prime},x)\mapsto\prod_{q=1}^{d}\frac{1}{h_{q}}k\left(\frac{x_{q}^{\prime}-x_{q}}{h_{q}}\right);\ h_{1},\dots,h_{d}\in\mathcal{H}(h_{\textrm{min}})\right\},$$

where $k$ is a symmetric kernel (in the usual sense), $h_{\textrm{min}}\in[n^{-1/d},1]$ and $\mathcal{H}(h_{\textrm{min}})$ is a finite subset of $[h_{\textrm{min}},1]$. The kernels set $\mathcal{K}_{k}(h_{\textrm{min}})$ fulfills Assumption 2.1 and, for any $K\in\mathcal{K}_{k}(h_{\textrm{min}})$ (i.e., defined by (2) with $h_{1},\dots,h_{d}\in\mathcal{H}(h_{\textrm{min}})$),

$$\overline{s}_{K,\ell}=||k||_{2}^{2d}\mathbb{E}(\ell(Y_{1})^{2})\prod_{q=1}^{d}\frac{1}{h_{q}}.$$

Proposition 2.3. Consider

$$\mathcal{K}_{\mathcal{B}_{1},\dots,\mathcal{B}_{n}}(m_{\textrm{max}}):=\left\{(x^{\prime},x)\mapsto\prod_{q=1}^{d}\sum_{j=1}^{m_{q}}\varphi_{j}^{m_{q}}(x_{q})\varphi_{j}^{m_{q}}(x_{q}^{\prime});\ m_{1},\dots,m_{d}\in\{1,\dots,m_{\textrm{max}}\}\right\},$$

where $m_{\textrm{max}}^{d}\in\{1,\dots,n\}$ and, for every $m\in\{1,\dots,n\}$, $\mathcal{B}_{m}=\{\varphi_{1}^{m},\dots,\varphi_{m}^{m}\}$ is an orthonormal family of $\mathbb{L}^{2}(\mathbb{R})$ such that

$$\sup_{x^{\prime}\in\mathbb{R}}\sum_{j=1}^{m}\varphi_{j}^{m}(x^{\prime})^{2}\leqslant\mathfrak{m}_{\mathcal{B}}m,$$

with $\mathfrak{m}_{\mathcal{B}}>0$ not depending on $m$ and $n$, and such that one of the two following conditions is satisfied:

$$\mathcal{B}_{m}\subset\mathcal{B}_{m+1};\quad\forall m\in\{1,\dots,n-1\}$$

(5)

or

$$\overline{\mathfrak{m}}_{\mathcal{B}}:=\sup\{|\mathbb{E}(K(X_{1},x))|;\ K\in\mathcal{K}_{\mathcal{B}_{1},\dots,\mathcal{B}_{n}}(m_{\textrm{max}})\quad\textit{and}\quad x\in\mathbb{R}^{d}\}\textit{ is finite and doesn't depend on $n$.}$$

(6)

The kernels set $\mathcal{K}_{\mathcal{B}_{1},\dots,\mathcal{B}_{n}}(m_{\textrm{max}})$ fulfills Assumption 2.1 and, for any $K\in\mathcal{K}_{\mathcal{B}_{1},\dots,\mathcal{B}_{n}}(m_{\textrm{max}})$ (i.e., defined by (3) with $m_{1},\dots,m_{n}\in\{1,\dots,m_{\textrm{max}}\}$),

$$\overline{s}_{K,\ell}\leqslant\mathfrak{m}_{\mathcal{B}}^{d}\mathbb{E}(\ell(Y_{1})^{2})\prod_{q=1}^{d}m_{q}.$$

Remark 2.4. For the sake of simplicity, the present paper focuses on $\mathcal{K}_{\mathcal{B}_{1},\dots,\mathcal{B}_{n}}(m_{\textrm{max}})$, but Proposition 2.3 is still true for the weighted projection kernels set

$$\mathcal{K}_{\mathcal{B}_{1},\dots,\mathcal{B}_{n}}(w_{1},\dots,w_{n};m_{\textrm{max}})$$

$${}:=\left\{(x^{\prime},x)\mapsto\prod_{q=1}^{d}\sum_{j=1}^{m_{q}}w_{j}\varphi_{j}^{m_{q}}(x_{q})\varphi_{j}^{m_{q}}(x_{q}^{\prime});\ m_{1},\dots,m_{d}\in\{1,\dots,m_{\textrm{max}}\}\right\},$$

where $w_{1},\dots,w_{n}\in[0,1]$.

Remark 2.5. Note that Condition (5) is close, but more restrictive than Condition (19) of Lerasle et al. [13], Proposition 3.2, which is that the spaces ${\textrm{span}}(\mathcal{B}_{m})$, $m\in\mathbb{N}$ are nested. See Massart [14], Subsection 7.5.2 for examples of nested spaces. Our condition (5) is fulfilled by the trigonometric basis, Hermite’s basis or Laguerre’s basis.

Note also that in the same proposition of Lerasle et al. [13], Condition (20) coincides with our condition (6). The regular histograms basis satisfies condition (6). Indeed, by taking $\varphi_{j}^{m}=\psi_{j}^{m}:=\sqrt{m}\mathbf{1}_{[(j-1)/m,j/m[}$ for every $m\in\{1,\dots,n\}$ and $j\in\{1,\dots,m\}$,

$$\left|\mathbb{E}\left[\prod_{q=1}^{d}\sum_{j=1}^{m_{q}}\psi_{j}^{m_{q}}(X_{1,q})\psi_{j}^{m_{q}}(x_{q})\right]\right|=\sum_{j_{1}=1}^{m_{1}}\cdots\sum_{j_{d}=1}^{m_{d}}\left(\prod_{q=1}^{d}m_{q}\mathbf{1}_{[(j_{q}-1)/m_{q},j_{q}/m_{q}[}(x_{q})\right)$$

$${}\times\int\limits_{(j_{1}-1)/m_{1}}^{j_{1}/m_{1}}\cdots\int\limits_{(j_{d}-1)/m_{d}}^{j_{d}/m_{d}}f(x_{1}^{\prime},\dots,x_{d}^{\prime})dx_{1}^{\prime}\cdots dx_{d}^{\prime}$$

$${}\leqslant||f||_{\infty}\prod_{q=1}^{d}\sum_{j=1}^{m_{q}}\mathbf{1}_{[(j-1)/m_{q},j/m_{q}[}(x_{q})\leqslant||f||_{\infty}$$

for every $m_{1},\dots,m_{d}\in\{1,\dots,n\}$ and $x\in\mathbb{R}^{d}$.

The following proposition shows that Legendre’s basis also fulfills condition (6).

Proposition 2.6. For every $m\in\{1,\dots,n\}$ and $j\in\{1,\dots,m\}$, let $\xi_{j}^{m}$ be the function defined on $[-1,1]$ by

$$\xi_{j}^{m}(x):=\sqrt{\frac{2j+1}{2}}Q_{j}(x);\quad\forall x\in[-1,1],$$

where

$$Q_{j}:x\in[-1,1]\longmapsto\frac{1}{2^{j}j!}\cdot\frac{d^{j}}{dx^{j}}(x^{2}-1)^{j}$$

is the $j$th Legendre’s polynomial. If $f\in C^{2d}([0,1]^{d})$ and $\mathcal{B}_{m}=\{\xi_{1}^{m},\dots,\xi_{m}^{m}\}$ for every $m\in\{1,\cdots,n\}$, then $\mathcal{K}_{\mathcal{B}_{1},\dots,\mathcal{B}_{m}}(m_{\textrm{max}})$ fulfills condition (6).

The following proposition provides a suitable control of the variance of $\widehat{s}_{K,\ell}(n;.)$.

Proposition 2.7. Under Assumptions 2.1.(1)–2.1.(3), if $s\in\mathbb{L}^{2}(\mathbb{R}^{d})$ and if there exists $\alpha>0$ such that $\mathbb{E}(\exp(\alpha|\ell(Y_{1})|))<\infty$, then there exists a deterministic constant $\mathfrak{c}_{2.7}>0$, not depending on $n$, such that for every $\theta\in]0,1[$,

$$\mathbb{E}\left(\sup_{K\in\mathcal{K}_{n}}\left\{\left|||\widehat{s}_{K,\ell}(n;.)-s_{K,\ell}||_{2}^{2}-\frac{\overline{s}_{K,\ell}}{n}\right|-\frac{\theta}{n}\overline{s}_{K,\ell}\right\}\right)\leqslant\mathfrak{c}_{2.7}\frac{\log(n)^{5}}{\theta n}.$$

Finally, let us state the main result of this section.

Theorem 2.8. Under Assumption 2.1, if $s\in\mathbb{L}^{2}(\mathbb{R}^{d})$ and if there exists $\alpha>0$ such that $\mathbb{E}(\exp(\alpha|\ell(Y_{1})|))<\infty$, then there exist deterministic constants $\mathfrak{c}_{2.8},\overline{\mathfrak{c}}_{2.8}>0$, not depending on $n$, such that for every $\theta\in]0,1[$,

$$\mathbb{E}\left(\sup_{K\in\mathcal{K}_{n}}\left\{||\widehat{s}_{K,\ell}(n;.)-s||_{2}^{2}-(1+\theta)\left(||s_{K,\ell}-s||_{2}^{2}+\frac{\overline{s}_{K,\ell}}{n}\right)\right\}\right)\leqslant\mathfrak{c}_{2.8}\frac{\log(n)^{5}}{\theta n}$$

and

$$\mathbb{E}\left(\sup_{K\in\mathcal{K}_{n}}\left\{||s_{K,\ell}-s||_{2}^{2}+\frac{\overline{s}_{K,\ell}}{n}-\frac{1}{1-\theta}||\widehat{s}_{K,\ell}(n;.)-s||_{2}^{2}\right\}\right)\leqslant\overline{\mathfrak{c}}_{2.8}\frac{\log(n)^{5}}{\theta(1-\theta)n}.$$

Remark 2.9. Note that the first inequality in Theorem 2.8 gives a risk bound on the estimator $\widehat{s}_{K,\ell}(n;.)$:

$$\mathbb{E}(||\widehat{s}_{K,\ell}(n;.)-s||_{2}^{2})\leqslant(1+\theta)\left(||s_{K,\ell}-s||_{2}^{2}+\frac{\overline{s}_{K,\ell}}{n}\right)+\mathfrak{c}_{2.8}\frac{\log(n)^{5}}{\theta n}$$

for every $\theta\in]0,1[$. The second inequality is useful in order to establish a risk bound on the adaptive estimator defined in the next section (see Theorem 3.2).

Remark 2.10. In Proposition 2.7 and Theorem 2.8, the exponential moment condition may appear too strong. Nevertheless, this is de facto satisfied when

$$\ell(Y_{1}),\dots,\ell(Y_{n})\textit{ have a compactly supported distribution.}$$

(7)

This last condition is satisfied in the density estimation framework because $\ell=1$, but even in the nonparametric regression framework, where $\ell$ is not bounded, when $Y_{1},\dots,Y_{n}$ have a compactly supported distribution. Moreover, note that under condition (7), the risk bounds of Theorem 2.8 can be stated in deviation, without additional steps in the proof. Precisely, under Assumption 2.1 and condition (7), if $s\in\mathbb{L}^{2}(\mathbb{R}^{d})$, then there exists a deterministic constant $\mathfrak{c}_{L}>0$, depending on $L=\sup_{z\in{\textrm{supp}}(\mathbb{P}_{Y_{1}})}|\ell(z)|$ but not on $n$, such that for every $\vartheta\in]0,1[$ and $\lambda>0$,

$$\sup_{K\in\mathcal{K}_{n}}\left|||s_{K,\ell}-s||_{2}^{2}+\frac{\overline{s}_{K,\ell}}{n}-\frac{1}{1-\vartheta}||\widehat{s}_{K,\ell}(n;.)-s||_{2}^{2}\right|\leqslant\frac{\mathfrak{c}_{L}}{\vartheta n}(1+\lambda)^{3}$$

with probability larger than $1-9.4|\mathcal{K}_{n}|e^{-\lambda}$.

When condition (7) doesn’t hold true, one can replace the exponential moment condition of Proposition 2.7 and Theorem 2.8 by a $q$th order moment condition on $\ell(Y_{1})$ ($q\in\mathbb{N}^{*}$), but with a damaging effect on the rate of convergence of $\widehat{s}_{K,\ell}(n;.)$. For instance, at Remark B.5, it is established that under a $(12-4\varepsilon)/\beta$th moment condition ($\varepsilon\in]0,1[$ and $0<\beta<\varepsilon/2$), the rate of convergence is of order $O(1/n^{1-\varepsilon})$ (instead of $1/n$) in Lemma B.2. This holds true for the three technical lemmas of Appendix B.1, and then for Proposition 2.7 and Theorem 2.8.

3 KERNEL SELECTION

This section deals with a risk bound on the adaptive estimator $\widehat{s}_{\widehat{K},\ell}(n;.)$, where

$$\widehat{K}\in\arg\min_{K\in\mathcal{K}_{n}}\{||\widehat{s}_{K,\ell}(n;\cdot)-\widehat{s}_{K_{0},\ell}(n;\cdot)||_{2}^{2}+\textrm{pen}_{\ell}(K)\},$$

$K_{0}$ is an overfitting proposal for $K$ in the sense that

$$K_{0}\in\arg\max_{K\in\mathcal{K}_{n}}\left\{\sup_{x\in\mathbb{R}^{d}}|K(x,x)|\right\},$$

and

$$\textrm{pen}_{\ell}(K):=\frac{2}{n^{2}}\sum_{i=1}^{n}\langle K(.,X_{i}),K_{0}(.,X_{i})\rangle_{2}\ell(Y_{i})^{2};\quad\forall K\in\mathcal{K}_{n}.$$

(8)

Example. On the one hand, for any $K\in\mathcal{K}_{k}(h_{\textrm{min}})$ (i.e., defined by (2) with $h_{1},\dots,h_{d}\in\mathcal{H}(h_{\textrm{min}})$),

$$\sup_{x\in\mathbb{R}^{d}}|K(x,x)|=|k(0)|^{d}\prod_{q=1}^{d}\frac{1}{h_{q}}.$$

Then, for $\mathcal{K}_{n}=\mathcal{K}_{k}(h_{\textrm{min}})$,

$$K_{0}(x^{\prime},x)=\frac{1}{h_{\textrm{min}}^{d}}\prod_{q=1}^{d}k\left(\frac{x_{q}^{\prime}-x_{q}}{h_{\textrm{min}}}\right);\quad\forall x,x^{\prime}\in\mathbb{R}^{d}.$$

On the other hand, for any $K\in\mathcal{K}_{\mathcal{B}_{1},\dots,\mathcal{B}_{n}}(m_{\textrm{max}})$ (i.e., defined by (3) with $m_{1},\dots,m_{n}\in\{1,\dots,m_{\textrm{max}}\}$),

$$\sup_{x\in\mathbb{R}^{d}}|K(x,x)|=\sup_{x\in\mathbb{R}^{d}}\prod_{q=1}^{d}\sum_{j=1}^{m_{q}}\varphi_{j}^{m_{q}}(x_{q})^{2}.$$

Then, for $\mathcal{K}_{n}=\mathcal{K}_{\mathcal{B}_{1},\dots,\mathcal{B}_{n}}(m_{\textrm{max}})$, at least for the usual bases mentioned at Remark 2.5,

$$K_{0}(x^{\prime},x)=\prod_{q=1}^{d}\sum_{j=1}^{m_{\textrm{max}}}\varphi_{j}^{m_{\textrm{max}}}(x_{q})\varphi_{j}^{m_{\textrm{max}}}(x_{q}^{\prime});\quad\forall x,x^{\prime}\in\mathbb{R}^{d}.$$

In the sequel, in addition to Assumption 2.1, the kernels set $\mathcal{K}_{n}$ fulfills the following assumption.

Assumption 3.1. There exists a deterministic constant $\overline{\mathfrak{m}}_{\mathcal{K},\ell}>0$, not depending on $n$, such that

$$\mathbb{E}\left(\sup_{K,K^{\prime}\in\mathcal{K}_{n}}\langle K(X_{1},.),s_{K^{\prime},\ell}\rangle_{2}^{2}\right)\leqslant\overline{\mathfrak{m}}_{\mathcal{K},\ell}.$$

The following theorem provides an oracle inequality for the adaptive estimator $\widehat{s}_{\widehat{K},\ell}(n;.)$.

Theorem 3.2. Under Assumptions 2.1 and 3.1, if $s\in\mathbb{L}^{2}(\mathbb{R}^{d})$ and if there exists $\alpha>0$ such that $\mathbb{E}(\exp(\alpha|\ell(Y_{1})|))<\infty$, then there exists a deterministic constant $\mathfrak{c}_{3.2}>0$, not depending on $n$, such that for every $\vartheta\in]0,1[$,

$$\mathbb{E}(||\widehat{s}_{\widehat{K},\ell}(n;.)-s||_{2}^{2})\leqslant(1+\vartheta)\min_{K\in\mathcal{K}_{n}}\mathbb{E}(||\widehat{s}_{K,\ell}(n;.)-s||_{2}^{2})+\frac{\mathfrak{c}_{3.2}}{\vartheta}\left(||s_{K_{0},\ell}-s||_{2}^{2}+\frac{\log(n)^{5}}{n}\right).$$

Remark 3.3. As mentioned in Comte and Marie [4, p. 6], when $\mathcal{K}_{n}=\mathcal{K}_{k}(h_{\textrm{min}})$, if $s$ belongs to a Nikol’skii ball and $h_{\textrm{min}}=1/n$, then Theorem 3.2 says that the PCO estimator has a performance of same order than $O_{n}:=\min_{K\in\mathcal{K}_{n}}\mathbb{E}(||\widehat{s}_{K,\ell}(n;.)-s||_{2}^{2})$ up to a factor $1+\vartheta$. When $\mathcal{K}_{n}=\mathcal{K}_{\mathcal{B}_{1},\dots,\mathcal{B}_{n}}(m_{\textrm{max}})$, it depends on the bases $\mathcal{B}_{1},\dots,\mathcal{B}_{n}$. For instance, with the same ideas than in Comte and Marie [4], thanks to DeVore and Lorentz [6, Theorem 2.3 p. 205], if $s$ belongs to a Sobolev space and $m_{\textrm{max}}=n$, then our Theorem 3.2 also says that the PCO estimator has a performance of same order than $O_{n}$.

Notation. For any $B\in\mathcal{B}(\mathbb{R}^{d})$, $||.||_{2,f,B}$ is the norm on $\mathbb{L}^{2}(B,f(x)\lambda_{d}(dx))$ defined by

$$||\varphi||_{2,f,B}:=\left(\int\limits_{B}\varphi(x)^{2}f(x)\lambda_{d}(dx)\right)^{1/2}\textrm{; }\forall\varphi\in\mathbb{L}^{2}(B,f(x)\lambda_{d}(dx)).$$

The following corollary provides an oracle inequality for $\widehat{s}_{\widehat{K},\ell}(n;.)/\widehat{s}_{\widehat{L},1}(n;.)$, where $\widehat{K}$ and $\widehat{L}$ are both selected via the PCO method.

Corollary 3.4. Let $(\beta_{j})_{j\in\mathbb{N}}$ be a decreasing sequence of elements of $]0,\infty[$ such that $\lim_{\infty}\beta_{j}=0$ and, for every $j\in\mathbb{N}$, consider

$$B_{j}:=\{x\in\mathbb{R}^{d}:f(x)\geqslant\beta_{j}\}.$$

Under Assumptions 2.1 and 3.1 for $\ell$ and $1$, if $s,f\in\mathbb{L}^{2}(\mathbb{R}^{d})$ and if there exists $\alpha>0$ such that $\mathbb{E}(\exp(\alpha|\ell(Y_{1})|))<\infty$, then there exists a deterministic constant $\mathfrak{c}_{3.2}>0$, not depending on $n$, such that for every $\vartheta\in]0,1[$,

$$\mathbb{E}\left[\left|\left|\frac{\widehat{s}_{\widehat{K},\ell}(n;.)}{\widehat{s}_{\widehat{L},1}(n;.)}-\frac{s}{f}\right|\right|_{2,f,B_{n}}^{2}\right]\leqslant\frac{\mathfrak{c}_{3.2}}{\beta_{n}^{2}}\left[(1+\vartheta)\min_{(K,L)\in\mathcal{K}_{n}^{2}}\{\mathbb{E}(||\widehat{s}_{K,\ell}(n;.)-s||_{2}^{2})+\mathbb{E}(||\widehat{s}_{L,1}(n;.)-f||_{2}^{2})\}\right.$$

$${}+\left.\frac{1}{\vartheta}\left(||s_{K_{0},\ell}-s||_{2}^{2}+||s_{K_{0},1}-f||_{2}^{2}+\frac{\log(n)^{5}}{n}\right)\right],$$

where

$$\widehat{K}\in\arg\min_{K\in\mathcal{K}_{n}}\{||\widehat{s}_{K,\ell}(n;\cdot)-\widehat{s}_{K_{0},\ell}(n;\cdot)||_{2}^{2}+{\textrm{pen}}_{\ell}(K)\}$$

and

$$\widehat{L}\in\arg\min_{L\in\mathcal{K}_{n}}\{||\widehat{s}_{L,1}(n;\cdot)-\widehat{s}_{K_{0},1}(n;\cdot)||_{2}^{2}+{\textrm{pen}}_{1}(L)\}.$$

The proof of Corollary 3.4 is the same than the proof of Comte and Marie [4], Corollary 4.3.

Finally, let us discuss about Assumption 3.1. This assumption is difficult to check in practice, then let us provide a sufficient condition.

Assumption 3.5. The function $s$ is bounded and

$$\mathfrak{m}_{\mathcal{K}}:=\sup\{||K(x^{\prime},.)||_{1}^{2};\ K\in\mathcal{K}_{n}\quad\textrm{and}\quad x^{\prime}\in\mathbb{R}^{d}\}$$

doesn’t depend on $n$.

Under Assumption 3.5, $\mathcal{K}_{n}$ fulfills Assumption 3.1. Indeed,

$$\mathbb{E}\left(\sup_{K,K^{\prime}\in\mathcal{K}_{n}}\langle K(X_{1},.),s_{K^{\prime},\ell}\rangle_{2}^{2}\right)\leqslant\left(\sup_{K^{\prime}\in\mathcal{K}_{n}}||s_{K^{\prime},\ell}||_{\infty}^{2}\right)\mathbb{E}\left(\sup_{K\in\mathcal{K}_{n}}||K(X_{1},.)||_{1}^{2}\right)$$

$${}\leqslant\mathfrak{m}_{\mathcal{K}}\sup\left\{\left(\int\limits_{-\infty}^{\infty}|K^{\prime}(x^{\prime},x)s(x)|dx\right)^{2};\ K^{\prime}\in\mathcal{K}_{n}\quad\textrm{and}\quad x^{\prime}\in\mathbb{R}\right\}\leqslant\mathfrak{m}_{\mathcal{K}}^{2}||s||_{\infty}^{2}.$$

Note that in the nonparametric regression framework (see Model (1)), to assume $s$ bounded means that $bf$ is bounded. For instance, this condition is fulfilled by the linear regression models with Gaussian inputs.

Let us provide two examples of kernels sets fulfilling Assumption 3.5, the sufficient condition for Assumption 3.1:

Consider $K\in\mathcal{K}_{k}(h_{\textrm{min}})$. Then, there exist $h_{1},\dots,h_{d}\in\mathcal{H}(h_{\textrm{min}})$ such that

$$K(x^{\prime},x)=\prod_{q=1}^{d}\frac{1}{h_{q}}k\left(\frac{x_{q}^{\prime}-x_{q}}{h_{q}}\right);\quad\forall x,x^{\prime}\in\mathbb{R}^{d}.$$

Clearly, $||K(x^{\prime},.)||_{1}=||k||_{1}^{d}$ for every $x^{\prime}\in\mathbb{R}^{d}$. So, for $\mathcal{K}_{n}=\mathcal{K}_{k}(h_{\textrm{min}})$, $\mathfrak{m}_{\mathcal{K}}\leqslant||k||_{1}^{2d}$.
For $\mathcal{K}_{n}=\mathcal{K}_{\mathcal{B}_{1},\dots,\mathcal{B}_{n}}(m_{\textrm{max}})$, the condition on $\mathfrak{m}_{\mathcal{K}}$ seems harder to check in general. Let us show that it is satisfied for the regular histograms basis defined in Section 2. For every $m_{1},\dots,m_{d}\in\{1,\dots,n\}$,

$$\left|\left|\prod_{q=1}^{d}\sum_{j=1}^{m_{q}}\psi_{j}^{m_{q}}(x_{q}^{\prime})\psi_{j}^{m_{q}}(.)\right|\right|_{1}\leqslant\prod_{q=1}^{d}\left(m_{q}\sum_{j=1}^{m_{q}}\mathbf{1}_{[(j-1)/m_{q},j/m_{q}[}(x_{q}^{\prime})\int\limits_{(j-1)/m_{q}}^{j/m_{q}}dx\right)\leqslant 1.$$

Now, let us show that even if it doesn’t fulfill Assumption 3.5, the trigonometric basis fulfills Assumption 3.1.

Proposition 3.6. Consider $\chi_{1}:=\mathbf{1}_{[0,1]}$ and, for every $j\in\mathbb{N}^{*}$, the functions $\chi_{2j}$ and $\chi_{2j+1}$ defined on $\mathbb{R}$ by

$$\chi_{2j}(x):=\sqrt{2}\cos(2\pi jx)\mathbf{1}_{[0,1]}(x)\quad{and}\quad\chi_{2j+1}(x):=\sqrt{2}\sin(2\pi jx)\mathbf{1}_{[0,1]}(x);\quad\forall x\in\mathbb{R}.$$

If $s\in C^{2}(\mathbb{R}^{d})$ and $\mathcal{B}_{m}=\{\chi_{1},\dots,\chi_{m}\}$ for every $m\in\{1,\dots,n\}$, then $\mathcal{K}_{\mathcal{B}_{1},\dots,\mathcal{B}_{n}}(m_{\textrm{max}})$ fulfills Assumption 3.1.

REFERENCES

G. Chagny, ‘‘Warped Bases for Conditional Density Estimation,’’ Math. Methods Statist. 22, 253–282 (2013).
Article MathSciNet Google Scholar
F. Comte, ‘‘Estimation non-paramétrique,’’ Spartacus IDH (2014).
MATH Google Scholar
F. Comte and N. Marie, ‘‘Bandwidth selection for the Wolverton–Wagner estimator,’’ Journal of Statistical Planning and Inference 207, 198–214 (2020).
Article MathSciNet Google Scholar
F. Comte and N. Marie, ‘‘On a Nadaraya–Watson estimator with two bandwidths,’’ Submitted (2020).
F. Comte and T. Rebafka, ‘‘Nonparametric weighted estimators for Biased data,’’ Journal of Statistical Planning and Inference 174, 104–128 (2016).
Article MathSciNet Google Scholar
R. A. DeVore and G. G. Lorentz, Constructive Approximation (Springer-Verlag, 1993).
U. Einmahl and D. M. Mason, ‘‘An empirical process approach to the uniform consistency of kernel-type function estimators,’’ Journal of Theoretical Probability 13, 1–37 (2000).
Article MathSciNet Google Scholar
U. Einmahl and D. M. Mason, ‘‘Uniform in Bandwidth Consistency of Kernel-Type Function Estimators,’’ Annals of Statistics 33, 1380–1403 (2005).
Article MathSciNet Google Scholar
E. Giné and R. Nickl, Mathematical Foundations of Infinite-Dimensional Statistical Models (Cambridge University Press, 2015).
MATH Google Scholar
A. Goldenshluger and O. Lepski, ‘‘Bandwidth selection in kernel density estimation: Oracle inequalities and adaptive minimax optimality,’’ The Annals of Statistics 39, 1608–1632 (2011).
Article MathSciNet Google Scholar
C. Houdré and P. Reynaud-Bouret, Exponential Inequalities, with Constants, for U-statistics of Order Two. Stochastic Inequalities and Applications, vol. 56 of Progr. Proba., Birkhauser, 2003, pp. 55–69.
C. Lacour, P. Massart, and V. Rivoirard, ‘‘Estimator Selection: a New Method with Applications to Kernel Density Estimation,’’ Sankhya A 79 (2), 298–335 (2017).
Article MathSciNet Google Scholar
M. Lerasle, N.M. Magalhaes, and P. Reynaud-Bouret. Optimal Kernel Selection for Density Estimation. High Dimensional Probabilities VII: The Cargese Volume, vol. 71 of Prog. Proba., Birkhauser, 2016, pp. 435–460.
P. Massart, Concentration Inequalities and Model Selection. Lecture Notes in Mathematics 1896 (Springer, 2007).
E.A. Nadaraya, ‘‘On a regression estimate,’’ (Russian) Verojatnost. i Primenen. 9, 157–159 (1964).
MathSciNet MATH Google Scholar
E. Parzen, ‘‘On the estimation of a probability density function and the mode,’’ The Annals of Mathematical Statistics 33, 1065–1076 (1962).
Article MathSciNet Google Scholar
M. Rosenblatt, ‘‘Remarks on some nonparametric estimates of a density function,’’ The Annals of Mathematical Statistics 27, 832–837 (1956).
Article MathSciNet Google Scholar
A. Tsybakov, Introduction to Nonparametric Estimation (Springer, 2009).
Book Google Scholar
S. Varet, C. Lacour, P. Massart, and V. Rivoirard, Numerical Performance of Penalized Comparison to Overfitting for Multivariate Density Estimation. Preprint, 2020.
G. S. Watson, ‘‘Smooth regression analysis,’’ Sankhya A 26, 359–372 (1964).
MathSciNet MATH Google Scholar

Download references

Author information

Authors and Affiliations

LTCI, Télécom Paris, Palaiseau, France
H. Halconruy
ESME Sudria, Paris, France
H. Halconruy & N. Marie
Laboratoire Modal’X, Université Paris Nanterre, Nanterre, France
N. Marie

Authors

H. Halconruy
View author publications
You can also search for this author in PubMed Google Scholar
N. Marie
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding authors

Correspondence to H. Halconruy or N. Marie.

APPENDIX A.

A. DETAILS ON KERNELS SETS: PROOFS OF PROPOSITIONS 2.2, 2.3, 2.6, AND 3.6

A.1. Proof of Proposition 2.2

Consider $K,K^{\prime}\in\mathcal{K}_{k}(h_{\textrm{min}})$. Then, there exist $h,h^{\prime}\in\mathcal{H}(h_{\textrm{min}})^{d}$ such that

$$K(x^{\prime},x)=k_{h}(x^{\prime}-x)\quad\textrm{and}\quad K^{\prime}(x^{\prime},x)=k_{h^{\prime}}(x^{\prime}-x)$$

for every $x,x^{\prime}\in\mathbb{R}^{d}$, where

$$k_{h}(x):=\prod_{q=1}^{d}\frac{1}{h_{q}}k\left(\frac{x_{q}}{h_{q}}\right);\quad\forall x\in\mathbb{R}^{d}.$$

(1) For every $x^{\prime}\in\mathbb{R}^{d}$, since $nh_{\textrm{min}}^{d}\geqslant 1$,

$$||K(x^{\prime},.)||_{2}^{2}=\left(\prod_{q=1}^{d}\frac{1}{h_{q}^{2}}\right)\left[\int\limits_{\mathbb{R}^{d}}\prod_{q=1}^{d}k\left(\frac{x_{q}^{\prime}-x_{q}}{h_{q}}\right)^{2}\lambda_{d}(dx)\right]=||k||_{2}^{2d}\prod_{q=1}^{d}\frac{1}{h_{q}}$$

$${}\leqslant||k||_{2}^{2d}\frac{1}{h_{\textrm{min}}^{d}}\leqslant||k||_{2}^{2d}n.$$

(A.1)

(2) Since $s_{K,\ell}=K\ast s$ and by Young’s inequality, $||s_{K,\ell}||_{2}^{2}\leqslant||k||_{1}^{2d}||s||_{2}^{2}$.

(3) On the one hand, thanks to Eq. (A.1),

$$\overline{s}_{K^{\prime},\ell}=\mathbb{E}(||K^{\prime}(X_{1},.)\ell(Y_{1})||_{2}^{2})=||k||_{2}^{2d}\mathbb{E}(\ell(Y_{1})^{2})\prod_{q=1}^{d}\frac{1}{h_{q}^{\prime}}.$$

On the other hand, for every $x,x^{\prime}\in\mathbb{R}^{d}$,

$$\langle K(x,.),K^{\prime}(x^{\prime},.)\rangle_{2}=\int\limits_{\mathbb{R}^{d}}k_{h}(x-x^{\prime\prime})k_{h^{\prime}}(x^{\prime}-x^{\prime\prime})\lambda_{d}(dx^{\prime\prime})=(k_{h}\ast k_{h^{\prime}})(x-x^{\prime}).$$

Then,

$$\mathbb{E}(\langle K(X_{1},.),K^{\prime}(X_{2},.)\ell(Y_{2})\rangle_{2}^{2})=\mathbb{E}((k_{h}\ast k_{h^{\prime}})(X_{1}-X_{2})^{2}\ell(Y_{2})^{2})$$

$${}=\int\limits_{\mathbb{R}^{d+1}}\left[\ell(y)^{2}\int\limits_{\mathbb{R}^{d}}(k_{h}\ast k_{h^{\prime}})(x^{\prime}-x)^{2}f(x^{\prime})\lambda_{d}(dx^{\prime})\right]\mathbb{P}_{(X_{2},Y_{2})}(dx,dy)$$

$${}\leqslant||f||_{\infty}||k_{h}\ast k_{h^{\prime}}||_{2}^{2}\mathbb{E}(\ell(Y_{2})^{2})\leqslant||f||_{\infty}||k||_{1}^{2d}\overline{s}_{K^{\prime},\ell}.$$

(4) For every $\psi\in\mathbb{L}^{2}(\mathbb{R}^{d})$,

$$\mathbb{E}(\langle K(X_{1},.),\psi\rangle_{2}^{2})=\mathbb{E}((k_{h}\ast\psi)(X_{1})^{2})$$

$${}\leqslant||f||_{\infty}||k_{h}\ast\psi||_{2}^{2}\leqslant||f||_{\infty}||k||_{1}^{2d}||\psi||_{2}^{2}.$$

A.2. Proof of Proposition 2.3

Consider $K,K^{\prime}\in\mathcal{K}_{\mathcal{B}_{1},\dots,\mathcal{B}_{n}}(m_{\textrm{max}})$. Then, there exist $m,m^{\prime}\in\{1,\dots,m_{\textrm{max}}\}^{d}$ such that

$$K(x^{\prime},x)=\prod_{q=1}^{d}\sum_{j=1}^{m_{q}}\varphi_{j}^{m_{q}}(x_{q})\varphi_{j}^{m_{q}}(x_{q}^{\prime})\quad\textrm{and}\quad K^{\prime}(x^{\prime},x)=\prod_{q=1}^{d}\sum_{j=1}^{m_{q}^{\prime}}\varphi_{j}^{m_{q}^{\prime}}(x_{q})\varphi_{j}^{m_{q}^{\prime}}(x_{q}^{\prime})$$

for every $x,x^{\prime}\in\mathbb{R}^{d}$.

(1) For every $x^{\prime}\in\mathbb{R}^{d}$, since $m_{\textrm{max}}^{d}\leqslant n$,

$$||K(x^{\prime},.)||_{2}^{2}=\prod_{q=1}^{d}\sum_{j,j^{\prime}=1}^{m_{q}}\varphi_{j^{\prime}}^{m_{q}}(x_{q}^{\prime})\varphi_{j}^{m_{q}}(x_{q}^{\prime})\int\limits_{-\infty}^{\infty}\varphi_{j^{\prime}}^{m_{q}}(x)\varphi_{j}^{m_{q}}(x)dx=\prod_{q=1}^{d}\sum_{j=1}^{m_{q}}\varphi_{j}^{m_{q}}(x_{q}^{\prime})^{2}$$

$${}\leqslant\mathfrak{m}_{\mathcal{B}}^{d}\prod_{q=1}^{d}m_{q}\leqslant\mathfrak{m}_{\mathcal{B}}^{d}n.$$

(A.2)

(2) Since

$$s_{K,\ell}(.)=\sum_{j_{1}=1}^{m_{1}}\cdots\sum_{j_{d}=1}^{m_{d}}\langle s,\varphi_{j_{1}}^{m_{1}}\otimes\cdots\otimes\varphi_{j_{d}}^{m_{d}}\rangle_{2}(\varphi_{j_{1}}^{m_{1}}\otimes\cdots\otimes\varphi_{j_{d}}^{m_{d}})(.)$$

by Pythagoras theorem, $||s_{K,\ell}||_{2}^{2}\leqslant||s||_{2}^{2}$.

(3) First of all, thanks to Eq. (A.2),

$$\overline{s}_{K^{\prime},\ell}=\mathbb{E}\left[\ell(Y_{1})^{2}\prod_{q=1}^{d}\sum_{j=1}^{m_{q}^{\prime}}\varphi_{j}^{m_{q}^{\prime}}(X_{1,q})^{2}\right]\leqslant\mathfrak{m}_{\mathcal{B}}^{d}\mathbb{E}(\ell(Y_{1})^{2})\prod_{q=1}^{d}m_{q}^{\prime}.$$

On the one hand, under condition (5) on $\mathcal{B}_{1},\dots,\mathcal{B}_{n}$, for any $j\in\{1,\dots,m\}$, $\varphi_{j}^{m}$ doesn’t depend on $m$, so it can be denoted by $\varphi_{j}$, and then

$$\mathbb{E}(\langle K(X_{1},.),K^{\prime}(X_{2},.)\ell(Y_{2})\rangle_{2}^{2})=\int\limits_{\mathbb{R}^{d}}\mathbb{E}\left[\left(\prod_{q=1}^{d}\sum_{j=1}^{m_{q}\wedge m_{q}^{\prime}}\varphi_{j}(x_{q}^{\prime})\varphi_{j}(X_{2,q})\right)^{2}\ell(Y_{2})^{2}\right]f(x^{\prime})\lambda_{d}(dx^{\prime})$$

$${}\leqslant||f||_{\infty}\mathbb{E}\left[\ell(Y_{2})^{2}\prod_{q=1}^{d}\sum_{j,j^{\prime}=1}^{m_{q}\wedge m_{q}^{\prime}}\varphi_{j^{\prime}}(X_{2,q})\varphi_{j}(X_{2,q})\int\limits_{-\infty}^{\infty}\varphi_{j^{\prime}}(x^{\prime})\varphi_{j}(x^{\prime})dx^{\prime}\right]$$

$${}\leqslant||f||_{\infty}\overline{s}_{K^{\prime},\ell}.$$

On the other hand, under condition (6) on $\mathcal{B}_{1},\dots,\mathcal{B}_{n}$, since $X_{1}$ and $(X_{2},Y_{2})$ are independent, and since $K(x,x)\geqslant 0$ for every $x\in\mathbb{R}^{d}$,

$$\mathbb{E}(\langle K(X_{1},.),K^{\prime}(X_{2},.)\ell(Y_{2})\rangle_{2}^{2})\leqslant\mathbb{E}(||K(X_{1},.)||_{2}^{2}||K^{\prime}(X_{2},.)||_{2}^{2}\ell(Y_{2})^{2})$$

$${}=\mathbb{E}(K(X_{1},X_{1}))\mathbb{E}(||K^{\prime}(X_{2},.)||_{2}^{2}\ell(Y_{2})^{2})\leqslant\overline{\mathfrak{m}}_{\mathcal{B}}\overline{s}_{K^{\prime},\ell}.$$

(4) For every $\psi\in\mathbb{L}^{2}(\mathbb{R}^{d})$,

$$\mathbb{E}(\langle K(X_{1},.),\psi\rangle_{2}^{2})=\mathbb{E}\left[\left|\sum_{j_{1}=1}^{m_{1}}\cdots\sum_{j_{d}=1}^{m_{d}}\langle\psi,\varphi_{j_{1}}^{m_{1}}\otimes\cdots\otimes\varphi_{j_{d}}^{m_{d}}\rangle_{2}(\varphi_{j_{1}}^{m_{1}}\otimes\cdots\otimes\varphi_{j_{d}}^{m_{d}})(X_{1})\right|^{2}\right]$$

$${}\leqslant||f||_{\infty}\left|\left|\sum_{j_{1}=1}^{m_{1}}\cdots\sum_{j_{d}=1}^{m_{d}}\langle\psi,\varphi_{j_{1}}^{m_{1}}\otimes\cdots\otimes\varphi_{j_{d}}^{m_{d}}\rangle_{2}(\varphi_{j_{1}}^{m_{1}}\otimes\cdots\otimes\varphi_{j_{d}}^{m_{d}})(.)\right|\right|_{2}^{2}\leqslant||f||_{\infty}||\psi||_{2}^{2}.$$

A.3. Proof of Proposition 2.6

For the sake of readability, assume that $d=1$. Consider $m\in\{1,\dots,m_{\textrm{max}}\}$. Since each Legendre’s polynomial is uniformly bounded by $1$,

$$\left|\mathbb{E}\left[\sum_{j=1}^{m}\xi_{j}^{m}(X_{1})\xi_{j}^{m}(x^{\prime})\right]\right|\leqslant\sum_{j=1}^{m}\frac{2j+1}{2}\left|\int\limits_{-1}^{1}Q_{j}(x)f(x)dx\right|.$$

Moreover, since $Q_{j}$ is a solution to Legendre’s differential equation for any $j\in\{1,\dots,m\}$, thanks to the integration by parts formula,

$$\int\limits_{-1}^{1}Q_{j}(x)f(x)dx=-\frac{1}{j(j+1)}\int\limits_{-1}^{1}\frac{d}{dx}[(1-x^{2})Q_{j}^{\prime}(x)]f(x)dx$$

$${}=-\frac{1}{j(j+1)}[(1-x^{2})Q_{j}^{\prime}(x)f(x)]_{-1}^{1}+\frac{1}{j(j+1)}\int\limits_{-1}^{1}(1-x^{2})Q_{j}^{\prime}(x)f^{\prime}(x)dx$$

$${}=-\frac{1}{j(j+1)}\int\limits_{-1}^{1}Q_{j}(x)\frac{d}{dx}[(1-x^{2})f^{\prime}(x)]dx.$$

Then,

$$\left|\int\limits_{-1}^{1}Q_{j}(x)f(x)dx\right|\leqslant\frac{2\mathfrak{c}_{1}}{j(j+1)}||Q_{j}||_{2}=\frac{2\sqrt{2}\mathfrak{c}_{1}}{j(j+1)(2j+1)^{1/2}}$$

with $\mathfrak{c}_{1}=\max\{2||f^{\prime}||_{\infty},||f^{\prime\prime}||_{\infty}\}$. So,

$$\left|\mathbb{E}\left[\sum_{j=1}^{m}\xi_{j}^{m}(X_{1})\xi_{j}^{m}(x^{\prime})\right]\right|\leqslant 2\mathfrak{c}_{1}\sum_{j=1}^{m}\frac{1}{j^{3/2}}\leqslant 2\mathfrak{c}_{1}\zeta\left(\frac{3}{2}\right),$$

where $\zeta$ is Riemann’s zeta function. Thus, Legendre’s basis satisfies condition (6).

A.4. Proof of Proposition 3.6

The proof of Proposition 3.6 relies on the following technical lemma.

Lemma A.1. For every $x\in[0,2\pi]$ and $p,q\in\mathbb{N}^{*}$ such that $q>p$,

$$\left|\sum_{j=p+1}^{q}\frac{\sin(jx)}{j}\right|\leqslant\frac{2}{(1+p)\sin(x/2)}.$$

See Subsubsection A.4.1. for a proof.

For the sake of readability, assume that $d=1$. Consider $K,K^{\prime}\in\mathcal{K}_{\mathcal{B}_{1},\dots,\mathcal{B}_{n}}(m_{\textrm{max}})$. Then, there exist $m,m^{\prime}\in\{1,\dots,m_{\textrm{max}}\}$ such that

$$K(x^{\prime},x)=\sum_{j=1}^{m}\chi_{j}(x)\chi_{j}(x^{\prime})\quad\textrm{and}\quad K^{\prime}(x^{\prime},x)=\sum_{j=1}^{m^{\prime}}\chi_{j}(x)\chi_{j}(x^{\prime});\quad\forall x,x^{\prime}\in\mathbb{R}.$$

First, there exist $\mathfrak{m}_{1}(m,m^{\prime})\in\{0,\dots,n\}$ and $\mathfrak{c}_{1}>0$, not depending on $n$, $K$ and $K^{\prime}$, such that for any $x^{\prime}\in[0,1]$,

$$|\langle K(x^{\prime},.),s_{K^{\prime},\ell}\rangle_{2}|=\left|\sum_{j=1}^{m\wedge m^{\prime}}\mathbb{E}(\ell(Y_{1})\chi_{j}(X_{1}))\chi_{j}(x^{\prime})\right|$$

$${}\leqslant\mathfrak{c}_{1}+2\left|\sum_{j=1}^{\mathfrak{m}_{1}(m,m^{\prime})}\mathbb{E}(\ell(Y_{1})(\cos(2\pi jX_{1})\cos(2\pi jx^{\prime})+\sin(2\pi jX_{1})\sin(2\pi jx^{\prime}))\mathbf{1}_{[0,1]}(X_{1}))\right|$$

$${}=\mathfrak{c}_{1}+2\left|\sum_{j=1}^{\mathfrak{m}_{1}(m,m^{\prime})}\mathbb{E}(\ell(Y_{1})\cos(2\pi j(X_{1}-x^{\prime}))\mathbf{1}_{[0,1]}(X_{1}))\right|.$$

Moreover, for any $j\in\{2,\dots,\mathfrak{m}_{1}(m,m^{\prime})\}$,

$$\mathbb{E}(\ell(Y_{1})\cos(2\pi j(X_{1}-x^{\prime}))\mathbf{1}_{[0,1]}(X_{1}))=\int\limits_{0}^{1}\cos(2\pi j(x-x^{\prime}))s(x)dx$$

$${}=\frac{1}{j}\left[\frac{\sin(2\pi j(x-x^{\prime}))}{2\pi}s(x)\right]_{0}^{1}$$

$${}+\frac{1}{j^{2}}\left[\frac{\cos(2\pi j(x-x^{\prime}))}{4\pi^{2}}s^{\prime}(x)\right]_{0}^{1}-\frac{1}{j^{2}}\int\limits_{0}^{1}\frac{\cos(2\pi j(x-x^{\prime}))}{4\pi^{2}}s^{\prime\prime}(x)dx$$

$${}=\frac{s(0)-s(1)}{2\pi}\frac{\alpha_{j}(x^{\prime})}{j}+\frac{\beta_{j}(x^{\prime})}{j^{2}}$$

where $\alpha_{j}(x^{\prime}):=\sin(2\pi jx^{\prime})$ and

$$\beta_{j}(x^{\prime}):=\frac{1}{4\pi^{2}}\left((s^{\prime}(1)-s^{\prime}(0))\cos(2\pi jx^{\prime})-\int\limits_{0}^{1}\cos(2\pi j(x-x^{\prime}))s^{\prime\prime}(x)dx\right).$$

Then, there exists a deterministic constant $\mathfrak{c}_{2}>0$, not depending on $n$, $K$, $K^{\prime}$, and $x^{\prime}$, such that

$$\langle K(x^{\prime},.),s_{K^{\prime},\ell}\rangle_{2}^{2}\leqslant\mathfrak{c}_{2}\left[1+\left(\sum_{j=1}^{\mathfrak{m}_{1}(m,m^{\prime})}\frac{\alpha_{j}(x^{\prime})}{j}\right)^{2}+\left(\sum_{j=1}^{\mathfrak{m}_{1}(m,m^{\prime})}\frac{\beta_{j}(x^{\prime})}{j^{2}}\right)^{2}\right].$$

(A.3)

Let us show that each term of the right-hand side of inequality (A.3) is uniformly bounded in $x^{\prime}$, $m$, and $m^{\prime}$. On the one hand,

$$\left|\sum_{j=1}^{\mathfrak{m}_{1}(m,m^{\prime})}\frac{\beta_{j}(x^{\prime})}{j^{2}}\right|\leqslant\max_{j\in\{1,\dots,n\}}||\beta_{j}||_{\infty}\sum_{j=1}^{n}\frac{1}{j^{2}}\leqslant\frac{1}{24}(2||s^{\prime}||_{\infty}+||s^{\prime\prime}||_{\infty}).$$

On the other hand, for every $x\in]0,\pi[$ such that $[\pi/x]+1\leqslant\mathfrak{m}_{1}(m,m^{\prime})$ (without loss of generality), by Lemma A.1,

$$\left|\sum_{j=1}^{\mathfrak{m}_{1}(m,m^{\prime})}\frac{\sin(jx)}{j}\right|\leqslant\left|\sum_{j=1}^{[\pi/x]}\frac{\sin(jx)}{j}\right|+\left|\sum_{j=[\pi/x]+1}^{\mathfrak{m}_{1}(m,m^{\prime})}\frac{\sin(jx)}{j}\right|$$

$${}\leqslant x\left[\frac{\pi}{x}\right]+\frac{2}{(1+[\pi/x])\sin(x/2)}\leqslant\pi+2.$$

(A.4)

Since $x\mapsto\sin(x)$ is continuous, odd and $2\pi$-periodic, inequality (A.4) holds true for every $x\in\mathbb{R}$. So,

$$\left|\sum_{j=1}^{\mathfrak{m}_{1}(m,m^{\prime})}\frac{\alpha_{j}(x^{\prime})}{j}\right|\leqslant\pi+2.$$

Therefore,

$$\mathbb{E}\left[\sup_{K,K^{\prime}\in\mathcal{K}_{\mathcal{B}_{1},\dots,\mathcal{B}_{n}}(m_{\textrm{max}})}\langle K(X_{1},.),s_{K^{\prime},\ell}\rangle_{2}^{2}\right]\leqslant\mathfrak{c}_{2}\left(1+(\pi+2)^{2}+\frac{1}{24^{2}}(2||s^{\prime}||_{\infty}+||s^{\prime\prime}||_{\infty})^{2}\right).$$

A.4.1. Proof of Lemma A.1. For any $x\in[0,2\pi]$ and $q\in\mathbb{N}^{*}$, consider

$$f_{q}(x):=\sum_{j=1}^{q}\frac{\sin(jx)}{j}\textrm{, }g_{q}(x):=\sum_{j=1}^{q}\left(\frac{1}{j}-\frac{1}{j+1}\right)h_{j}(x)\quad\textrm{and}\quad h_{q}(x):=\sum_{j=1}^{q}\sin(jx).$$

On the one hand,

$$g_{q}(x)=h_{1}(x)-\frac{1}{q+1}h_{q}(x)+\sum_{j=2}^{q}\frac{1}{j}(h_{j}(x)-h_{j-1}(x)).$$

Then,

$$f_{q}(x)=g_{q}(x)+\frac{1}{q+1}h_{q}(x).$$

On the other hand,

$$h_{q}(x)=\textrm{Im}\left(\sum_{j=1}^{q}e^{\mathbf{i}jx}\right)=\textrm{Im}\left[e^{\mathbf{i}(q+1)x/2}\frac{\sin(qx/2)}{\sin(x/2)}\right]$$

$${}=\frac{\sin((q+1)x/2)\sin(qx/2)}{\sin(x/2)}=\frac{\cos(x/2)-\cos((q+1/2)x)}{2\sin(x/2)}.$$

Then,

$$\sin\left(\frac{x}{2}\right)|h_{q}(x)|\leqslant 1$$

and, for any $p\in\mathbb{N}^{*}$ such that $q>p$,

$$\sin\left(\frac{x}{2}\right)|g_{q}(x)-g_{p}(x)|\leqslant\frac{1}{p+1}-\frac{1}{q+1}.$$

Therefore,

$$\sin\left(\frac{x}{2}\right)|f_{q}(x)-f_{p}(x)|\leqslant\sin\left(\frac{x}{2}\right)|g_{q}(x)-g_{p}(x)|+\sin\left(\frac{x}{2}\right)\frac{|h_{q}(x)|}{q+1}+\sin\left(\frac{x}{2}\right)\frac{|h_{p}(x)|}{p+1}$$

$${}\leqslant\frac{2}{p+1}.$$

In conclusion,

$$\left|\sum_{j=p+1}^{q}\frac{\sin(jx)}{k}\right|\leqslant\frac{2}{(1+p)\sin(x/2)}.$$

B. PROOFS OF RISK BOUNDS

B.1. Preliminary Results

This subsection provides three lemmas used several times in the sequel.

Lemma B.1. Consider

$$U_{K,K^{\prime},\ell}(n):=\sum_{i\not=j}\langle K(X_{i},.)\ell(Y_{i})-s_{K,\ell},K^{\prime}(X_{j},.)\ell(Y_{j})-s_{K^{\prime},\ell}\rangle_{2};\quad\forall K,K^{\prime}\in\mathcal{K}_{n}.$$

(B.1)

Under Assumptions 2.1.(1)–2.1.(3), if $s\in\mathbb{L}^{2}(\mathbb{R}^{d})$ and if there exists $\alpha>0$ such that $\mathbb{E}(\exp(\alpha|\ell(Y_{1})|))<\infty$, then there exists a deterministic constant $\mathfrak{c}_{B.1}>0$, not depending on $n$, such that for every $\theta\in]0,1[$,

$$\mathbb{E}\left(\sup_{K,K^{\prime}\in\mathcal{K}_{n}}\left\{\frac{|U_{K,K^{\prime},\ell}(n)|}{n^{2}}-\frac{\theta}{n}\overline{s}_{K^{\prime},\ell}\right\}\right)\leqslant\mathfrak{c}_{B.1}\frac{\log(n)^{5}}{\theta n}.$$

Lemma B.2. Consider

$$V_{K,\ell}(n):=\frac{1}{n}\sum_{i=1}^{n}||K(X_{i},.)\ell(Y_{i})-s_{K,\ell}||_{2}^{2};\quad\forall K\in\mathcal{K}_{n}.$$

Under Assumptions 2.1.(1), 2.1.(2), if $s\in\mathbb{L}^{2}(\mathbb{R}^{d})$ and if there exists $\alpha>0$ such that $\mathbb{E}(\exp(\alpha|\ell(Y_{1})|))<\infty$, then there exists a deterministic constant $\mathfrak{c}_{B.2}>0$, not depending on $n$, such that for every $\theta\in]0,1[$,

$$\mathbb{E}\left(\sup_{K\in\mathcal{K}_{n}}\left\{\frac{1}{n}|V_{K,\ell}(n)-\overline{s}_{K,\ell}|-\frac{\theta}{n}\overline{s}_{K,\ell}\right\}\right)\leqslant\mathfrak{c}_{B.2}\frac{\log(n)^{3}}{\theta n}.$$

Lemma B.3. Consider

$$W_{K,K^{\prime},\ell}(n):=\langle\widehat{s}_{K,\ell}(n;.)-s_{K,\ell},s_{K^{\prime},\ell}-s\rangle_{2};\quad\forall K,K^{\prime}\in\mathcal{K}_{n}.$$

(B.2)

Under Assumptions 2.1.(1), 2.1.(2), 2.1.(4), if $s\in\mathbb{L}^{2}(\mathbb{R}^{d})$ and if there exists $\alpha>0$ such that $\mathbb{E}(\exp(\alpha|\ell(Y_{1})|))<\infty$, then there exists a deterministic constant $\mathfrak{c}_{B.3}>0$, not depending on $n$, such that for every $\theta\in]0,1[$,

$$\mathbb{E}\left(\sup_{K,K^{\prime}\in\mathcal{K}_{n}}\{|W_{K,K^{\prime},\ell}(n)|-\theta||s_{K^{\prime},\ell}-s||_{2}^{2}\}\right)\leqslant\mathfrak{c}_{B.3}\frac{\log(n)^{4}}{\theta n}.$$

B.1.1. Proof of Lemma B.1. The proof of Lemma B.1 relies on the following concentration inequality for U-statistics, proved in dimension $1$ in Houdré and Reynaud-Bouret [11] first, and then extended to the infinite-dimensional framework by Giné and Nickl [9].

Lemma B.4. Let $\xi_{1},\dots,\xi_{n}$ be i.i.d. random variables on a Polish space $\Xi$ equipped with its Borel $\sigma$-algebra. Let $f_{i,j}$, $1\leqslant i\not=j\leqslant n$, be some bounded and symmetric measurable maps from $\Xi^{2}$ into $\mathbb{R}$ such that, for every $i\not=j$,

$$f_{i,j}=f_{j,i}\quad{and}\quad\mathbb{E}(f_{i,j}(z,\xi_{1}))=0 \, dz\textrm{-a.e.}$$

Consider the totally degenerate second order U-statistic

$$U_{n}:=\sum_{i\not=j}f_{i,j}(\xi_{i},\xi_{j}).$$

There exists a universal constant $\mathfrak{m}>0$ such that for every $\lambda>0$,

$$\mathbb{P}(U_{n}\leqslant\mathfrak{m}(\mathfrak{c}_{n}\lambda^{1/2}+\mathfrak{d}_{n}\lambda+\mathfrak{b}_{n}\lambda^{3/2}+\mathfrak{a}_{n}\lambda^{2}))\geqslant 1-2.7e^{-\lambda},$$

where

$$\mathfrak{a}_{n}=\sup_{i,j=1,\dots,n}\left\{\sup_{z,z^{\prime}\in\Xi}|f_{i,j}(z,z^{\prime})|\right\},$$

$$\mathfrak{b}_{n}^{2}=\max\left\{\sup_{i,z}\sum_{j=1}^{i-1}\mathbb{E}(f_{i,j}(z,\xi_{j})^{2});\ \sup_{j,z^{\prime}}\sum_{i=j+1}^{n}\mathbb{E}(f_{i,j}(\xi_{i},z^{\prime})^{2})\right\},$$

$$\mathfrak{c}_{n}^{2}=\sum_{i\not=j}\mathbb{E}(f_{i,j}(\xi_{i},\xi_{j})^{2}){ and}$$

$$\mathfrak{d}_{n}=\sup_{(a,b)\in\mathcal{A}}\mathbb{E}\left[\sum_{i<j}f_{i,j}(\xi_{i},\xi_{j})a_{i}(\xi_{i})b_{j}(\xi_{j})\right]$$

with

$$\mathcal{A}=\left\{(a,b):\mathbb{E}\left(\sum_{i=1}^{n-1}a_{i}(\xi_{i})^{2}\right)\leqslant 1\quad{and}\quad\mathbb{E}\left(\sum_{j=2}^{n}b_{j}(\xi_{j})^{2}\right)\leqslant 1\right\}.$$

See Giné and Nickl [9], Theorem 3.4.8 for a proof.

Consider $\mathfrak{m}(n):=8\log(n)/\alpha$. For any $K,K^{\prime}\in\mathcal{K}_{n}$,

$$U_{K,K^{\prime},\ell}(n)=U_{K,K^{\prime},\ell}^{1}(n)+U_{K,K^{\prime},\ell}^{2}(n)+U_{K,K^{\prime},\ell}^{3}(n)+U_{K,K^{\prime},\ell}^{4}(n),$$

where

$$U_{K,K^{\prime},\ell}^{l}(n):=\sum_{i\not=j}g_{K,K^{\prime},\ell}^{l}(n;X_{i},Y_{i},X_{j},Y_{j}),\quad l=1,2,3,4,$$

with, for every $(x^{\prime},y),(x^{\prime\prime},y^{\prime})\in E=\mathbb{R}^{d}\times\mathbb{R}$,

$$g_{K,K^{\prime},\ell}^{1}(n;x^{\prime},y,x^{\prime\prime},y^{\prime})$$

$${}:=\langle K(x^{\prime},.)\ell(y)\mathbf{1}_{|\ell(y)|\leqslant\mathfrak{m}(n)}-s_{K,\ell}^{+}(n;.),K^{\prime}(x^{\prime\prime},.)\ell(y^{\prime})\mathbf{1}_{|\ell(y)|\leqslant\mathfrak{m}(n)}-s_{K^{\prime},\ell}^{+}(n;.)\rangle_{2},$$

$$g_{K,K^{\prime},\ell}^{2}(n;x^{\prime},y,x^{\prime\prime},y^{\prime})$$

$${}:=\langle K(x^{\prime},.)\ell(y)\mathbf{1}_{|\ell(y)|>\mathfrak{m}(n)}-s_{K,\ell}^{-}(n;.),K^{\prime}(x^{\prime\prime},.)\ell(y^{\prime})\mathbf{1}_{|\ell(y)|\leqslant\mathfrak{m}(n)}-s_{K^{\prime},\ell}^{+}(n;.)\rangle_{2},$$

$$g_{K,K^{\prime},\ell}^{3}(n;x^{\prime},y,x^{\prime\prime},y^{\prime})$$

$${}:=\langle K(x^{\prime},.)\ell(y)\mathbf{1}_{|\ell(y)|\leqslant\mathfrak{m}(n)}-s_{K,\ell}^{+}(n;.),K^{\prime}(x^{\prime\prime},.)\ell(y^{\prime})\mathbf{1}_{|\ell(y)|>\mathfrak{m}(n)}-s_{K^{\prime},\ell}^{-}(n;.)\rangle_{2},$$

$$g_{K,K^{\prime},\ell}^{4}(n;x^{\prime},y,x^{\prime\prime},y^{\prime})$$

$${}:=\langle K(x^{\prime},.)\ell(y)\mathbf{1}_{|\ell(y)|>\mathfrak{m}(n)}-s_{K,\ell}^{-}(n;.),K^{\prime}(x^{\prime\prime},.)\ell(y^{\prime})\mathbf{1}_{|\ell(y)|>\mathfrak{m}(n)}-s_{K^{\prime},\ell}^{-}(n;.)\rangle_{2}$$

and, for every $k\in\mathcal{K}_{n}$,

$$s_{k,\ell}^{+}(n;.):=\mathbb{E}(k(X_{1},.)\ell(Y_{1})\mathbf{1}_{|\ell(Y_{1})|\leqslant\mathfrak{m}(n)})\quad\textrm{and}\quad s_{k,\ell}^{-}(n;.):=\mathbb{E}(k(X_{1},.)\ell(Y_{1})\mathbf{1}_{|\ell(Y_{1})|>\mathfrak{m}(n)}).$$

On the one hand, since $\mathbb{E}(g_{K,K^{\prime},\ell}^{1}(n;x^{\prime},y,X_{1},Y_{1}))=0$ for every $(x^{\prime},y)\in E$, by Lemma B.4, there exists a universal constant $\mathfrak{m}\geqslant 1$ such that for any $\lambda>0$, with probability larger than $1-5.4e^{-\lambda}$,

$$\frac{|U_{K,K^{\prime},\ell}^{1}(n)|}{n^{2}}\leqslant\frac{\mathfrak{m}}{n^{2}}(\mathfrak{c}_{K,K^{\prime},\ell}(n)\lambda^{1/2}+\mathfrak{d}_{K,K^{\prime},\ell}(n)\lambda+\mathfrak{b}_{K,K^{\prime},\ell}(n)\lambda^{3/2}+\mathfrak{a}_{K,K^{\prime},\ell}(n)\lambda^{2}),$$

where the constants $\mathfrak{a}_{K,K^{\prime},\ell}(n)$, $\mathfrak{b}_{K,K^{\prime},\ell}(n)$, $\mathfrak{c}_{K,K^{\prime},\ell}(n)$, and $\mathfrak{d}_{K,K^{\prime},\ell}(n)$ are defined and controlled later. First, note that

$$U_{K,K^{\prime},\ell}^{1}(n)=\sum_{i\not=j}(\varphi_{K,K^{\prime},\ell}(n;X_{i},Y_{i},X_{j},Y_{j})$$

$${}-\psi_{K,K^{\prime},\ell}(n;X_{i},Y_{i})-\psi_{K^{\prime},K,\ell}(n;X_{j},Y_{j})+\mathbb{E}(\varphi_{K,K^{\prime},\ell}(n;X_{i},Y_{i},X_{j},Y_{j}))),$$

(B.3)

where

$$\varphi_{K,K^{\prime},\ell}(n;x^{\prime},y,x^{\prime\prime},y^{\prime\prime}):=\langle K(x^{\prime},.)\ell(y)\mathbf{1}_{|\ell(y)|\leqslant\mathfrak{m}(n)},K^{\prime}(x^{\prime\prime},.)\ell(y^{\prime})\mathbf{1}_{|\ell(y^{\prime})|\leqslant\mathfrak{m}(n)}\rangle_{2}$$

and

$$\psi_{k,k^{\prime},\ell}(n;x^{\prime},y):=\langle k(x^{\prime},.)\ell(y)\mathbf{1}_{|\ell(y)|\leqslant\mathfrak{m}(n)},s_{k^{\prime},\ell}^{+}(n;.)\rangle_{2}=\mathbb{E}(\varphi_{k,k^{\prime},\ell}(n;x^{\prime},y,X_{1},Y_{1}))$$

for every $k,k^{\prime}\in\mathcal{K}_{n}$ and $(x^{\prime},y),(x^{\prime\prime},y^{\prime})\in E$. Let us now control $\mathfrak{a}_{K,K^{\prime},\ell}(n)$, $\mathfrak{b}_{K,K^{\prime},\ell}(n)$, $\mathfrak{c}_{K,K^{\prime},\ell}(n)$, and $\mathfrak{d}_{K,K^{\prime},\ell}(n)$.

The constant $\mathfrak{a}_{K,K^{\prime},\ell}(n)$. Consider

$$\mathfrak{a}_{K,K^{\prime},\ell}(n):=\sup_{(x^{\prime},y),(x^{\prime\prime},y^{\prime})\in E}|g_{K,K^{\prime},\ell}^{1}(n;x^{\prime},y,x^{\prime\prime},y^{\prime})|.$$

By (B.3), Cauchy–Schwarz’s inequality and Assumption 2.1.(1),

$$\mathfrak{a}_{K,K^{\prime},\ell}(n)\leqslant 4\sup_{(x^{\prime},y),(x^{\prime\prime},y^{\prime})\in E}|\langle K(x^{\prime},.)\ell(y)\mathbf{1}_{|\ell(y)|\leqslant\mathfrak{m}(n)},K^{\prime}(x^{\prime\prime},.)\ell(y^{\prime})\mathbf{1}_{|\ell(y^{\prime})|\leqslant\mathfrak{m}(n)}\rangle_{2}|$$

$${}\leqslant 4\mathfrak{m}(n)^{2}\left(\sup_{x^{\prime}\in\mathbb{R}^{d}}||K(x^{\prime},.)||_{2}\right)\left(\sup_{x^{\prime\prime}\in\mathbb{R}^{d}}||K^{\prime}(x^{\prime\prime},.)||_{2}\right)\leqslant 4\mathfrak{m}_{\mathcal{K},\ell}\mathfrak{m}(n)^{2}n.$$

So,

$$\frac{1}{n^{2}}\mathfrak{a}_{K,K^{\prime},\ell}(n)\lambda^{2}\leqslant\frac{4}{n}\mathfrak{m}_{\mathcal{K},\ell}\mathfrak{m}(n)^{2}\lambda^{2}.$$
The constant $\mathfrak{b}_{K,K^{\prime},\ell}(n)$. Consider

$$\mathfrak{b}_{K,K^{\prime},\ell}(n)^{2}:=n\sup_{(x^{\prime},y)\in E}\mathbb{E}(g_{K,K^{\prime},\ell}^{1}(n;x^{\prime},y,X_{1},Y_{1})^{2}).$$

By (B.3), Jensen’s inequality, Cauchy–Schwarz’s inequality and Assumption 2.1.(1),

$$\mathfrak{b}_{K,K^{\prime},\ell}(n)^{2}\leqslant 16n\sup_{(x^{\prime},y)\in E}\mathbb{E}(\langle K(x^{\prime},.)\ell(y)\mathbf{1}_{|\ell(y)|\leqslant\mathfrak{m}(n)},K^{\prime}(X_{1},.)\ell(Y_{1})\mathbf{1}_{|\ell(Y_{1})|\leqslant\mathfrak{m}(n)}\rangle_{2}^{2})$$

$${}\leqslant 16n\mathfrak{m}(n)^{2}\sup_{x^{\prime}\in\mathbb{R}^{d}}||K(x^{\prime},.)||_{2}^{2}\mathbb{E}(||K^{\prime}(X_{1},.)\ell(Y_{1})\mathbf{1}_{|\ell(Y_{1})|\leqslant\mathfrak{m}(n)}||_{2}^{2})\leqslant 16\mathfrak{m}_{\mathcal{K},\ell}n^{2}\mathfrak{m}(n)^{2}\overline{s}_{K^{\prime},\ell}.$$

So, for any $\theta\in]0,1[$,

$$\frac{1}{n^{2}}\mathfrak{b}_{K,K^{\prime},\ell}(n)\lambda^{3/2}\leqslant 2\left(\frac{3\mathfrak{m}}{\theta}\right)^{1/2}\frac{2}{n^{1/2}}\mathfrak{m}_{\mathcal{K},\ell}^{1/2}\mathfrak{m}(n)\lambda^{3/2}\times\left(\frac{\theta}{3\mathfrak{m}}\right)^{1/2}\frac{1}{n^{1/2}}\overline{s}_{K^{\prime},\ell}^{1/2}$$

$${}\leqslant\frac{\theta}{3\mathfrak{m}n}\overline{s}_{K^{\prime},\ell}+\frac{12\mathfrak{m}\lambda^{3}}{\theta n}\mathfrak{m}_{\mathcal{K},\ell}\mathfrak{m}(n)^{2}.$$
The constant $\mathfrak{c}_{K,K^{\prime},\ell}(n)$. Consider

$$\mathfrak{c}_{K,K^{\prime},\ell}(n)^{2}:=n^{2}\mathbb{E}(g_{K,K^{\prime},\ell}^{1}(n;X_{1},Y_{1},X_{2},Y_{2})^{2}).$$

By (B.3), Jensen’s inequality and Assumption 2.1.(3),

$$\mathfrak{c}_{K,K^{\prime},\ell}(n)^{2}\leqslant 16n^{2}\mathbb{E}(\langle K(X_{1},.)\ell(Y_{1})\mathbf{1}_{|\ell(Y_{1})|\leqslant\mathfrak{m}(n)},K^{\prime}(X_{2},.)\ell(Y_{2})\mathbf{1}_{|\ell(Y_{2})|\leqslant\mathfrak{m}(n)}\rangle_{2}^{2})$$

$${}\leqslant 16n^{2}\mathfrak{m}(n)^{2}\mathbb{E}(\langle K(X_{1},.),K^{\prime}(X_{2},.)\ell(Y_{2})\rangle_{2}^{2})\leqslant 16\mathfrak{m}_{\mathcal{K},\ell}n^{2}\mathfrak{m}(n)^{2}\overline{s}_{K^{\prime},\ell}.$$

So,

$$\frac{1}{n^{2}}\mathfrak{c}_{K,K^{\prime},\ell}(n)\lambda^{1/2}\leqslant\frac{\theta}{3\mathfrak{m}n}\overline{s}_{K^{\prime},\ell}+\frac{12\mathfrak{m}\lambda}{\theta n}\mathfrak{m}_{\mathcal{K},\ell}\mathfrak{m}(n)^{2}.$$
The constant $\mathfrak{d}_{K,K^{\prime},\ell}(n)$. Consider

$$\mathfrak{d}_{K,K^{\prime},\ell}(n):=\sup_{(a,b)\in\mathcal{A}}\mathbb{E}\left[\sum_{i<j}a_{i}(X_{i},Y_{i})b_{j}(X_{j},Y_{j})g_{K,K^{\prime},\ell}^{1}(n;X_{i},Y_{i},X_{j},Y_{j})\right],$$

where

$$\mathcal{A}:=\left\{(a,b):\sum_{i=1}^{n-1}\mathbb{E}(a_{i}(X_{i},Y_{i})^{2})\leqslant 1\quad\textrm{and}\quad\sum_{j=2}^{n}\mathbb{E}(b_{j}(X_{j},Y_{j})^{2})\leqslant 1\right\}.$$

By (B.3), Jensen’s inequality, Cauchy-Schwarz’s inequality and Assumption 2.1.(3),

$$\mathfrak{d}_{K,K^{\prime},\ell}(n)\leqslant 4\sup_{(a,b)\in\mathcal{A}}\mathbb{E}\left[\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}|a_{i}(X_{i},Y_{i})b_{j}(X_{j},Y_{j})\varphi_{K,K^{\prime},\ell}(n;X_{i},Y_{i},X_{j},Y_{j})|\right]$$

$${}\leqslant 4n\mathfrak{m}(n)\mathbb{E}(\langle K(X_{1},.),K^{\prime}(X_{2},.)\ell(Y_{2})\rangle_{2}^{2})^{1/2}\leqslant 4\mathfrak{m}_{\mathcal{K},\ell}^{1/2}n\mathfrak{m}(n)\overline{s}_{K^{\prime},\ell}^{1/2}.$$

So,

$$\frac{1}{n^{2}}\mathfrak{d}_{K,K^{\prime},\ell}(n)\lambda\leqslant\frac{\theta}{3\mathfrak{m}n}\overline{s}_{K^{\prime},\ell}+\frac{12\mathfrak{m}\lambda^{2}}{\theta n}\mathfrak{m}_{\mathcal{K},\ell}\mathfrak{m}(n)^{2}.$$

Then, since $\mathfrak{m}\geqslant 1$ and $\lambda>0$, with probability larger than $1-5.4e^{-\lambda}$,

$$\frac{|U_{K,K^{\prime},\ell}^{1}(n)|}{n^{2}}\leqslant\frac{\theta}{n}\overline{s}_{K^{\prime},\ell}+\frac{40\mathfrak{m}^{2}}{\theta n}\mathfrak{m}_{\mathcal{K},\ell}\mathfrak{m}(n)^{2}(1+\lambda)^{3}.$$

So, with probability larger than $1-5.4|\mathcal{K}_{n}|^{2}e^{-\lambda}$,

$$S_{\mathcal{K},\ell}(n,\theta):=\sup_{K,K^{\prime}\in\mathcal{K}_{n}}\left\{\frac{|U_{K,K^{\prime},\ell}^{1}(n)|}{n^{2}}-\frac{\theta}{n}\overline{s}_{K^{\prime},\ell}\right\}\leqslant\frac{40\mathfrak{m}^{2}}{\theta n}\mathfrak{m}_{\mathcal{K},\ell}\mathfrak{m}(n)^{2}(1+\lambda)^{3}.$$

For every $t\in\mathbb{R}_{+}$, consider

$$\lambda_{\mathcal{K},\ell}(n,\theta,t):=-1+\left(\frac{t}{\mathfrak{m}_{\mathcal{K},\ell}(n,\theta)}\right)^{1/3}\textrm{\ with\ }\mathfrak{m}_{\mathcal{K},\ell}(n,\theta)=\frac{40\mathfrak{m}^{2}}{\theta n}\mathfrak{m}_{\mathcal{K},\ell}\mathfrak{m}(n)^{2}.$$

Then, for any $T>0$,

$$\mathbb{E}(S_{\mathcal{K},\ell}(n,\theta))\leqslant T+\int\limits_{T}^{\infty}\mathbb{P}(S_{\mathcal{K},\ell}(n,\theta)\geqslant(1+\lambda_{\mathcal{K},\ell}(n,\theta,t))^{3}\mathfrak{m}_{\mathcal{K},\ell}(n,\theta))dt$$

$${}\leqslant T+5.4|\mathcal{K}_{n}|^{2}\int\limits_{T}^{\infty}\exp(-\lambda_{\mathcal{K},\ell}(n,\theta,t))dt$$

$${}=T+5.4|\mathcal{K}_{n}|^{2}\int\limits_{T}^{\infty}\exp\left(-\frac{t^{1/3}}{2\mathfrak{m}_{\mathcal{K},\ell}(n,\theta)^{1/3}}\right)\exp\left(1-\frac{t^{1/3}}{2\mathfrak{m}_{\mathcal{K},\ell}(n,\theta)^{1/3}}\right)dt$$

$${}\leqslant T+5.4\mathfrak{c}_{1}|\mathcal{K}_{n}|^{2}\mathfrak{m}_{\mathcal{K},\ell}(n,\theta)\exp\left(-\frac{T^{1/3}}{2\mathfrak{m}_{\mathcal{K},\ell}(n,\theta)^{1/3}}\right)\textrm{\ with\ }\mathfrak{c}_{1}=\int\limits_{0}^{\infty}e^{1-r^{1/3}/2}dr.$$

Moreover,

$$\mathfrak{m}_{\mathcal{K},\ell}(n,\theta)\leqslant\mathfrak{c}_{2}\frac{\log(n)^{2}}{\theta n}\textrm{\ with\ }\mathfrak{c}_{2}=\frac{40\times 8^{2}\mathfrak{m}^{2}}{\alpha^{2}}\mathfrak{m}_{\mathcal{K},\ell}.$$

So, by taking

$$T=2^{4}\mathfrak{c}_{2}\frac{\log(n)^{5}}{\theta n},$$

and since $|\mathcal{K}_{n}|\leqslant n$,

$$\mathbb{E}(S_{\mathcal{K},\ell}(n,\theta))\leqslant 2^{4}\mathfrak{c}_{2}\frac{\log(n)^{5}}{\theta n}+5.4\mathfrak{c}_{1}\mathfrak{m}_{\mathcal{K},\ell}(n,\theta)\frac{|\mathcal{K}_{n}|^{2}}{n^{2}}\leqslant(2^{4}+5.4\mathfrak{c}_{1})\mathfrak{c}_{2}\frac{\log(n)^{5}}{\theta n}.$$

On the other hand, by Assumption 2.1.(1), Cauchy–Schwarz’s inequality and Markov’s inequality,

$$\mathbb{E}\left(\sup_{K,K^{\prime}\in\mathcal{K}_{n}}|g_{K,K^{\prime},\ell}^{2}(n;X_{1},Y_{1},X_{2},Y_{2})|\right)$$

$${}\leqslant 4\mathfrak{m}(n)\sum_{K,K^{\prime}\in\mathcal{K}_{n}}\mathbb{E}(|\ell(Y_{1})|\mathbf{1}_{|\ell(Y_{1})|>\mathfrak{m}(n)}|\langle K(X_{1},.),K^{\prime}(X_{2},.)\rangle_{2}|)$$

$${}\leqslant 4\mathfrak{m}(n)\mathfrak{m}_{\mathcal{K},\ell}n|\mathcal{K}_{n}|^{2}\mathbb{E}(\ell(Y_{1})^{2})^{1/2}\mathbb{P}(|\ell(Y_{1})|>\mathfrak{m}(n))^{1/2}\leqslant\mathfrak{c}_{3}\frac{\log(n)}{n}$$

with

$$\mathfrak{c}_{3}=\frac{32}{\alpha}\mathfrak{m}_{\mathcal{K},\ell}\mathbb{E}(\ell(Y_{1})^{2})^{1/2}\mathbb{E}(\exp(\alpha|\ell(Y_{1})|))^{1/2}.$$

So,

$$\mathbb{E}\left(\sup_{K,K^{\prime}\in\mathcal{K}_{n}}\frac{|U_{K,K^{\prime},\ell}^{2}(n)|}{n^{2}}\right)\leqslant\mathfrak{c}_{3}\frac{\log(n)}{n}$$

and, symmetrically,

$$\mathbb{E}\left(\sup_{K,K^{\prime}\in\mathcal{K}_{n}}\frac{|U_{K,K^{\prime},\ell}^{3}(n)|}{n^{2}}\right)\leqslant\mathfrak{c}_{3}\frac{\log(n)}{n}.$$

By Assumption 2.1.(1), Cauchy–Schwarz’s inequality and Markov’s inequality,

$$\mathbb{E}\left(\sup_{K,K^{\prime}\in\mathcal{K}_{n}}|g_{K,K^{\prime},\ell}^{4}(n;X_{1},Y_{1},X_{2},Y_{2})|\right)$$

$${}\leqslant 4\sum_{K,K^{\prime}\in\mathcal{K}_{n}}\mathbb{E}(|\ell(Y_{1})\ell(Y_{2})|\mathbf{1}_{|\ell(Y_{1})|,|\ell(Y_{2})|>\mathfrak{m}(n)}|\langle K(X_{1},.),K^{\prime}(X_{2},.)\rangle_{2}|)$$

$${}\leqslant 4\mathfrak{m}_{\mathcal{K},\ell}n|\mathcal{K}_{n}|^{2}\mathbb{E}(\ell(Y_{1})^{2})\mathbb{P}(|\ell(Y_{1})|>\mathfrak{m}(n))\leqslant\frac{\mathfrak{c}_{4}}{n^{5}}$$

with

$$\mathfrak{c}_{4}=4\mathfrak{m}_{\mathcal{K},\ell}\mathbb{E}(\ell(Y_{1})^{2})\mathbb{E}(\exp(\alpha|\ell(Y_{1})|)).$$

So,

$$\mathbb{E}\left(\sup_{K,K^{\prime}\in\mathcal{K}_{n}}\frac{|U_{K,K^{\prime},\ell}^{4}(n)|}{n^{2}}\right)\leqslant\frac{\mathfrak{c}_{4}}{n^{5}}.$$

Therefore,

$$\mathbb{E}\left(\sup_{K,K^{\prime}\in\mathcal{K}_{n}}\left\{\frac{|U_{K,K^{\prime},\ell}(n)|}{n^{2}}-\frac{\theta}{n}\overline{s}_{K^{\prime},\ell}\right\}\right)\leqslant(2^{4}+5.4\mathfrak{c}_{1})\mathfrak{c}_{2}\frac{\log(n)^{5}}{\theta n}+2\mathfrak{c}_{3}\frac{\log(n)}{n}+\frac{\mathfrak{c}_{4}}{n^{5}}.$$

B.1.2. Proof of Lemma B.2. First, the two following results are used several times in the sequel:

$$||s_{K,\ell}||_{2}^{2}\leqslant\mathbb{E}(\ell(Y_{1})^{2})\int\limits_{\mathbb{R}^{d}}f(x^{\prime})\int\limits_{\mathbb{R}^{d}}K(x^{\prime},x)^{2}\lambda_{d}(dx)\lambda_{d}(dx^{\prime})\leqslant\mathbb{E}(\ell(Y_{1})^{2})\mathfrak{m}_{\mathcal{K},\ell}n$$

(B.4)

and

$$\mathbb{E}(V_{K,\ell}(n))=\mathbb{E}(||K(X_{1},.)\ell(Y_{1})-s_{K,\ell}||_{2}^{2})$$

$${}=\mathbb{E}(||K(X_{1},.)\ell(Y_{1})||_{2}^{2})+||s_{K,\ell}||_{2}^{2}-2\int\limits_{\mathbb{R}^{d}}s_{K,\ell}(x)\mathbb{E}(K(X_{1},x)\ell(Y_{1}))\lambda_{d}(dx)=\overline{s}_{K,\ell}-||s_{K,\ell}||_{2}^{2}.$$

(B.5)

Consider $\mathfrak{m}(n):=2\log(n)/\alpha$ and

$$v_{K,\ell}(n):=V_{K,\ell}(n)-\mathbb{E}(V_{K,\ell}(n))=v_{K,\ell}^{1}(n)+v_{K,\ell}^{2}(n),$$

where

$$v_{K,\ell}^{j}(n)=\frac{1}{n}\sum_{i=1}^{n}(g_{K,\ell}^{j}(n;X_{i},Y_{i})-\mathbb{E}(g_{K,\ell}^{j}(n;X_{i},Y_{i})));\quad j=1,2,$$

with, for every $(x^{\prime},y)\in E$,

$$g_{K,\ell}^{1}(n;x^{\prime},y):=||K(x^{\prime},.)\ell(y)-s_{K,\ell}||_{2}^{2}\mathbf{1}_{|\ell(y)|\leqslant\mathfrak{m}(n)}$$

and

$$g_{K,\ell}^{2}(n;x^{\prime},y):=||K(x^{\prime},.)\ell(y)-s_{K,\ell}||_{2}^{2}\mathbf{1}_{|\ell(y)|>\mathfrak{m}(n)}.$$

On the one hand, by Bernstein’s inequality, for any $\lambda>0$, with probability larger than $1-2e^{-\lambda}$,

$$|v_{K,\ell}^{1}(n)|\leqslant\sqrt{\frac{2\lambda}{n}\mathfrak{v}_{K,\ell}(n)}+\frac{\lambda}{n}\mathfrak{c}_{K,\ell}(n),$$

where

$$\mathfrak{c}_{K,\ell}(n)=\frac{||g_{K,\ell}^{1}(n;.)||_{\infty}}{3}\quad\textrm{and}\quad\mathfrak{v}_{K,\ell}(n)=\mathbb{E}(g_{K,\ell}^{1}(n;X_{1},Y_{1})^{2}).$$

Moreover,

$$\mathfrak{c}_{K,\ell}(n)=\frac{1}{3}\sup_{(x^{\prime},y)\in E}||K(x^{\prime},.)\ell(y)-s_{K,\ell}||_{2}^{2}\mathbf{1}_{|\ell(y)|\leqslant\mathfrak{m}(n)}$$

$${}\leqslant\frac{2}{3}\left(\mathfrak{m}(n)^{2}\sup_{x^{\prime}\in\mathbb{R}^{d}}||K(x^{\prime},.)||_{2}^{2}+||s_{K,\ell}||_{2}^{2}\right)\leqslant\frac{2}{3}(\mathfrak{m}(n)^{2}+\mathbb{E}(\ell(Y_{1})^{2}))\mathfrak{m}_{\mathcal{K},\ell}n$$

by inequality (B.4), and

$$\mathfrak{v}_{K,\ell}(n)\leqslant||g_{K,\ell}^{1}(n;.)||_{\infty}\mathbb{E}(V_{K,\ell}(n))$$

$${}\leqslant 2(\mathfrak{m}(n)^{2}+\mathbb{E}(\ell(Y_{1})^{2}))\mathfrak{m}_{\mathcal{K},\ell}n(\overline{s}_{K,\ell}-||s_{K,\ell}||_{2}^{2})$$

by inequality (B.4) and equality (B.5). Then, for any $\theta\in]0,1[$,

$$|v_{K,\ell}^{1}(n)|\leqslant 2\sqrt{\lambda(\mathfrak{m}(n)^{2}+\mathbb{E}(\ell(Y_{1})^{2}))\mathfrak{m}_{\mathcal{K},\ell}(\overline{s}_{K,\ell}-||s_{K,\ell}||_{2}^{2})}+\frac{2\lambda}{3}(\mathfrak{m}(n)^{2}+\mathbb{E}(\ell(Y_{1})^{2}))\mathfrak{m}_{\mathcal{K},\ell}$$

$${}\leqslant\theta\overline{s}_{K,\ell}+\frac{5\lambda}{3\theta}(1+\mathbb{E}(\ell(Y_{1})^{2}))\mathfrak{m}_{\mathcal{K},\ell}\mathfrak{m}(n)^{2}$$

with probability larger than $1-2e^{-\lambda}$. So, with probability larger than $1-2|\mathcal{K}_{n}|e^{-\lambda}$,

$$S_{\mathcal{K},\ell}(n,\theta):=\sup_{K\in\mathcal{K}_{n}}\left\{\frac{|v_{K,\ell}^{1}(n)|}{n}-\frac{\theta}{n}\overline{s}_{K,\ell}\right\}\leqslant\frac{5\lambda}{3\theta n}(1+\mathbb{E}(\ell(Y_{1})^{2}))\mathfrak{m}_{\mathcal{K},\ell}\mathfrak{m}(n)^{2}.$$

For every $t\in\mathbb{R}_{+}$, consider

$$\lambda_{\mathcal{K},\ell}(n,\theta,t):=\frac{t}{\mathfrak{m}_{\mathcal{K},\ell}(n,\theta)}\textrm{\ with\ }\mathfrak{m}_{\mathcal{K},\ell}(n,\theta)=\frac{5}{3\theta n}(1+\mathbb{E}(\ell(Y_{1})^{2}))\mathfrak{m}_{\mathcal{K},\ell}\mathfrak{m}(n)^{2}.$$

Then, for any $T>0$,

$$\mathbb{E}(S_{\mathcal{K},\ell}(n,\theta))\leqslant T+\int\limits_{T}^{\infty}\mathbb{P}(S_{\mathcal{K},\ell}(n,\theta)\geqslant\lambda_{\mathcal{K},\ell}(n,\theta,t)\mathfrak{m}_{\mathcal{K},\ell}(n,\theta))dt$$

$${}\leqslant T+2|\mathcal{K}_{n}|\int\limits_{T}^{\infty}\exp(-\lambda_{\mathcal{K},\ell}(n,\theta,t))dt$$

$${}=T+2|\mathcal{K}_{n}|\int\limits_{T}^{\infty}\exp\left(-\frac{t}{2\mathfrak{m}_{\mathcal{K},\ell}(n,\theta)}\right)\exp\left(-\frac{t}{2\mathfrak{m}_{\mathcal{K},\ell}(n,\theta)}\right)dt$$

$${}\leqslant T+2\mathfrak{c}_{1}|\mathcal{K}_{n}|\mathfrak{m}_{\mathcal{K},\ell}(n,\theta)\exp\left(-\frac{T}{2\mathfrak{m}_{\mathcal{K},\ell}(n,\theta)}\right)\textrm{\ with\ }\mathfrak{c}_{1}=\int\limits_{0}^{\infty}e^{-r/2}dr=2.$$

(B.6)

Moreover,

$$\mathfrak{m}_{\mathcal{K},\ell}(n,\theta)\leqslant\mathfrak{c}_{2}\frac{\log(n)^{2}}{\theta n}\textrm{\ with\ }\mathfrak{c}_{2}=\frac{20}{3\alpha^{2}}(1+\mathbb{E}(\ell(Y_{1})^{2}))\mathfrak{m}_{\mathcal{K},\ell}.$$

So, by taking

$$T=2\mathfrak{c}_{2}\frac{\log(n)^{3}}{\theta n},$$

and since $|\mathcal{K}_{n}|\leqslant n$,

$$\mathbb{E}(S_{\mathcal{K},\ell}(n,\theta))\leqslant 2\mathfrak{c}_{2}\frac{\log(n)^{3}}{\theta n}+4\mathfrak{m}_{\mathcal{K},\ell}(n,\theta)\frac{|\mathcal{K}_{n}|}{n}\leqslant 6\mathfrak{c}_{2}\frac{\log(n)^{3}}{\theta n}.$$

On the other hand, by inequality (B.4) and Markov’s inequality,

$$\mathbb{E}\left[\sup_{K\in\mathcal{K}_{n}}\frac{|v_{K,\ell}^{2}(n)|}{n}\right]\leqslant\frac{2}{n}\mathbb{E}\left(\sup_{K\in\mathcal{K}_{n}}||K(X_{1},.)\ell(Y_{1})-s_{K,\ell}||_{2}^{2}\mathbf{1}_{|\ell(Y_{1})|>\mathfrak{m}(n)}\right)$$

$${}\leqslant\frac{4}{n}\mathbb{E}\left[\left|\ell(Y_{1})^{2}\sup_{K\in\mathcal{K}_{n}}||K(X_{1},.)||_{2}^{2}+\sup_{K\in\mathcal{K}_{n}}||s_{K,\ell}||_{2}^{2}\right|^{2}\right]^{1/2}\mathbb{P}(|\ell(Y_{1})|>\mathfrak{m}(n))^{1/2}\leqslant\frac{\mathfrak{c}_{3}}{n}$$

with

$$\mathfrak{c}_{3}=8\mathfrak{m}_{\mathcal{K},\ell}\mathbb{E}(\ell(Y_{1})^{4})^{1/2}\mathbb{E}(\exp(\alpha|\ell(Y_{1})|))^{1/2}.$$

Therefore,

$$\mathbb{E}\left(\sup_{K\in\mathcal{K}_{n}}\left\{\frac{|v_{K,\ell}(n)|}{n}-\frac{\theta}{n}\overline{s}_{K,\ell})\right\}\right)\leqslant 6\mathfrak{c}_{2}\frac{\log(n)^{3}}{\theta n}+\frac{\mathfrak{c}_{3}}{n}$$

and, by equality (B.5), the definition of $v_{K,\ell}(n)$ and Assumption 2.1.(2),

$$\mathbb{E}\left(\sup_{K\in\mathcal{K}_{n}}\left\{\frac{1}{n}|V_{K,\ell}(n)-\overline{s}_{K,\ell}|-\frac{\theta}{n}\overline{s}_{K,\ell}\right\}\right)\leqslant 6\mathfrak{c}_{2}\frac{\log(n)^{3}}{\theta n}+\frac{\mathfrak{c}_{3}+\mathfrak{m}_{\mathcal{K},\ell}}{n}.$$

Remark B.5. As mentioned in Remark 2.10, replacing the exponential moment condition by the weaker $q$-th moment condition with $q=(12-4\varepsilon)/\beta$, $\varepsilon\in]0,1[$ and $0<\beta<\varepsilon/2$, allows to get a rate of convergence of order $1/n^{1-\varepsilon}$. Indeed, by inequality (B.6), with $\mathfrak{m}(n)=n^{\beta}$ and

$$T=\frac{2\mathfrak{c}_{1}}{\theta n^{1-\varepsilon}}{\, with\, }\mathfrak{c}_{1}=\frac{5}{3}(1+\mathbb{E}(\ell(Y_{1})^{2}))\mathfrak{m}_{\mathcal{K},\ell},$$

and by letting $\alpha=1+2\beta-\varepsilon$, there exist $n_{\varepsilon,\alpha}\in\mathbb{N}^{*}$ and $\mathfrak{c}_{\varepsilon,\alpha}>0$ not depending on $n$, such that for any $n\geqslant n_{\varepsilon,\alpha}$,

$$\mathbb{E}(S_{\mathcal{K},\ell}(n,\theta))\leqslant\frac{2\mathfrak{c}_{1}}{\theta n^{1-\varepsilon}}+4\mathfrak{c}_{1}|\mathcal{K}_{n}|\frac{n^{2\beta-1}}{\theta}\exp(-n^{\varepsilon-2\beta})$$

$${}\leqslant\frac{2\mathfrak{c}_{1}}{\theta n^{1-\varepsilon}}+4\mathfrak{c}_{1}\mathfrak{c}_{\varepsilon,\alpha}\frac{n^{2\beta}}{\theta n^{\alpha}}=\frac{2\mathfrak{c}_{1}(1+2\mathfrak{c}_{\varepsilon,\alpha})}{\theta n^{1-\varepsilon}}.$$

Furthermore, by Markov’s inequality,

$$\mathbb{P}(|\ell(Y_{1})|>n^{\beta})\leqslant\frac{\mathbb{E}(|\ell(Y_{1})|^{(12-4\varepsilon)/\beta})}{n^{12-4\varepsilon}}.$$

So, as previously, there exists a deterministic constant $\mathfrak{c}_{2}>0$ such that

$$\mathbb{E}\left(\sup_{K,K^{\prime}\in\mathcal{K}_{n}}|W_{K,K^{\prime},\ell}^{2}(n)|\right)\leqslant\mathfrak{c}_{2}|\mathcal{K}_{n}|^{2}\mathbb{P}(|\ell(Y_{1})|>\mathfrak{m}(n))^{1/4}\leqslant\frac{\mathfrak{c}_{3}\mathbb{E}(|\ell(Y_{1})|^{(12-4\varepsilon)/\beta})^{1/4}}{n^{1-\varepsilon}},$$

and then

$$\mathbb{E}\left(\sup_{K,K^{\prime}\in\mathcal{K}_{n}}\{|W_{K,K^{\prime},\ell}(n)|-\theta||s_{K^{\prime},\ell}-s||_{2}^{2}\}\right)$$

$${}\leqslant\frac{\mathfrak{c}_{3}}{\theta n^{1-\varepsilon}}{\, with\, }\mathfrak{c}_{3}=2\mathfrak{c}_{1}(1+2\mathfrak{c}_{\varepsilon,\alpha})+\mathfrak{c}_{2}\mathbb{E}(|\ell(Y_{1})|^{(12-4\varepsilon)/\beta})^{1/4}.$$

B.1.3. Proof of Lemma B.3. Consider $\mathfrak{m}(n)=12\log(n)/\alpha$. For any $K,K^{\prime}\in\mathcal{K}_{n}$,

$$W_{K,K^{\prime},\ell}(n)=W_{K,K^{\prime},\ell}^{1}(n)+W_{K,K^{\prime},\ell}^{2}(n),$$

where

$$W_{K,K^{\prime},\ell}^{j}(n):=\frac{1}{n}\sum_{i=1}^{n}(g_{K,K^{\prime},\ell}^{j}(n;X_{i},Y_{i})-\mathbb{E}(g_{K,K^{\prime},\ell}^{j}(n;X_{i},Y_{i})));\quad j=1,2,$$

with, for every $(x^{\prime},y)\in E$,

$$g_{K,K^{\prime},\ell}^{1}(n;x^{\prime},y):=\langle K(x^{\prime},.)\ell(y),s_{K^{\prime},\ell}-s\rangle_{2}\mathbf{1}_{|\ell(y)|\leqslant\mathfrak{m}(n)}$$

and

$$g_{K,K^{\prime},\ell}^{2}(n;x^{\prime},y):=\langle K(x^{\prime},.)\ell(y),s_{K^{\prime},\ell}-s\rangle_{2}\mathbf{1}_{|\ell(y)|>\mathfrak{m}(n)}.$$

On the one hand, by Bernstein’s inequality, for any $\lambda>0$, with probability larger than $1-2e^{-\lambda}$,

$$|W_{K,K^{\prime},\ell}^{1}(n)|\leqslant\sqrt{\frac{2\lambda}{n}\mathfrak{v}_{K,K^{\prime},\ell}(n)}+\frac{\lambda}{n}\mathfrak{c}_{K,K^{\prime},\ell}(n),$$

where

$$\mathfrak{c}_{K,K^{\prime},\ell}(n)=\frac{||g_{K,K^{\prime},\ell}^{1}(n;.)||_{\infty}}{3}\quad\textrm{and}\quad\mathfrak{v}_{K,K^{\prime},\ell}(n)=\mathbb{E}(g_{K,K^{\prime},\ell}^{1}(n;X_{1},Y_{1})^{2}).$$

Moreover,

$$\mathfrak{c}_{K,K^{\prime},\ell}(n)=\frac{1}{3}\sup_{(x^{\prime},y)\in E}|\langle K(x^{\prime},.)\ell(y),s_{K^{\prime},\ell}-s\rangle_{2}|\mathbf{1}_{|\ell(y)|\leqslant\mathfrak{m}(n)}$$

$${}\leqslant\frac{1}{3}\mathfrak{m}(n)||s_{K^{\prime},\ell}-s||_{2}\sup_{x^{\prime}\in\mathbb{R}^{d}}||K(x^{\prime},.)||_{2}\leqslant\frac{1}{3}\mathfrak{m}_{\mathcal{K},\ell}^{1/2}n^{1/2}\mathfrak{m}(n)||s_{K^{\prime},\ell}-s||_{2}$$

by Assumption 2.1.(1) and

$$\mathfrak{v}_{K,\ell}(n)\leqslant\mathbb{E}(\langle K(X_{1},.)\ell(Y_{1}),s_{K^{\prime},\ell}-s\rangle_{2}^{2}\mathbf{1}_{|\ell(Y_{1})|\leqslant\mathfrak{m}(n)})$$

$${}\leqslant\mathfrak{m}(n)^{2}\mathfrak{m}_{\mathcal{K},\ell}||s_{K^{\prime},\ell}-s||_{2}^{2}$$

by Assumption 2.1.(4). Then, since $\lambda>0$, for any $\theta\in]0,1[$,

$$|W_{K,K^{\prime},\ell}^{1}(n)|\leqslant\sqrt{\frac{2\lambda}{n}\mathfrak{m}(n)^{2}\mathfrak{m}_{\mathcal{K},\ell}||s_{K^{\prime},\ell}-s||_{2}^{2}}+\frac{\lambda}{3n^{1/2}}\mathfrak{m}_{\mathcal{K},\ell}^{1/2}\mathfrak{m}(n)||s_{K^{\prime},\ell}-s||_{2}$$

$${}\leqslant\theta||s_{K^{\prime},\ell}-s||_{2}^{2}+\frac{\mathfrak{m}_{\mathcal{K},\ell}}{2\theta n}\mathfrak{m}(n)^{2}(1+\lambda)^{2}$$

with probability larger than $1-2e^{-\lambda}$. So, with probability larger than $1-2|\mathcal{K}_{n}|^{2}e^{-\lambda}$,

$$S_{\mathcal{K},\ell}(n,\theta):=\sup_{K,K^{\prime}\in\mathcal{K}_{n}}\{|W_{K,K^{\prime},\ell}^{1}(n)|-\theta||s_{K^{\prime},\ell}-s||_{2}^{2}\}\leqslant\frac{\mathfrak{m}_{\mathcal{K},\ell}}{2\theta n}\mathfrak{m}(n)^{2}(1+\lambda)^{2}.$$

For every $t\in\mathbb{R}_{+}$, consider

$$\lambda_{\mathcal{K},\ell}(n,\theta,t):=-1+\left(\frac{t}{\mathfrak{m}_{\mathcal{K},\ell}(n,\theta)}\right)^{1/2}\textrm{\ with\ }\mathfrak{m}_{\mathcal{K},\ell}(n,\theta)=\frac{\mathfrak{m}_{\mathcal{K},\ell}}{2\theta n}\mathfrak{m}(n)^{2}.$$

Then, for any $T>0$,

$$\mathbb{E}(S_{\mathcal{K},\ell}(n,\theta))\leqslant T+\int\limits_{T}^{\infty}\mathbb{P}(S_{\mathcal{K},\ell}(n,\theta)\geqslant(1+\lambda_{\mathcal{K},\ell}(n,\theta,t))^{2}\mathfrak{m}_{\mathcal{K},\ell}(n,\theta))dt$$

$${}\leqslant T+2|\mathcal{K}_{n}|^{2}\int\limits_{T}^{\infty}\exp(-\lambda_{\mathcal{K},\ell}(n,\theta,t))dt$$

$${}=T+2|\mathcal{K}_{n}|^{2}\int\limits_{T}^{\infty}\exp\left(-\frac{t^{1/2}}{2\mathfrak{m}_{\mathcal{K},\ell}(n,\theta)^{1/2}}\right)\exp\left(1-\frac{t^{1/2}}{2\mathfrak{m}_{\mathcal{K},\ell}(n,\theta)^{1/2}}\right)dt$$

$${}\leqslant T+2\mathfrak{c}_{1}|\mathcal{K}_{n}|^{2}\mathfrak{m}_{\mathcal{K},\ell}(n,\theta)\exp\left(-\frac{T^{1/2}}{2\mathfrak{m}_{\mathcal{K},\ell}(n,\theta)^{1/2}}\right)\textrm{\ with\ }\mathfrak{c}_{1}=\int\limits_{0}^{\infty}e^{1-r^{1/2}/2}dr.$$

Moreover,

$$\mathfrak{m}_{\mathcal{K},\ell}(n,\theta)\leqslant\mathfrak{c}_{2}\frac{\log(n)^{2}}{\theta n}\textrm{\ with\ }\mathfrak{c}_{2}=\frac{12^{2}}{2\alpha^{2}}\mathfrak{m}_{\mathcal{K},\ell}.$$

So, by taking

$$T=2^{3}\mathfrak{c}_{2}\frac{\log(n)^{4}}{\theta n},$$

and since $|\mathcal{K}_{n}|\leqslant n$,

$$\mathbb{E}(S_{\mathcal{K},\ell}(n,\theta))\leqslant 2^{3}\mathfrak{c}_{2}\frac{\log(n)^{4}}{\theta n}+2\mathfrak{c}_{1}\mathfrak{m}_{\mathcal{K},\ell}(n,\theta)\frac{|\mathcal{K}_{n}|^{2}}{n^{2}}\leqslant(2^{3}+2\mathfrak{c}_{1})\mathfrak{c}_{2}\frac{\log(n)^{4}}{\theta n}.$$

On the other hand, by Assumption 2.1.(2), 2.1.(4), Cauchy–Schwarz’s inequality and Markov’s inequality,

$$\mathbb{E}\left(\sup_{K,K^{\prime}\in\mathcal{K}_{n}}|W_{K,K^{\prime},\ell}^{2}(n)|\right)\leqslant 2\mathbb{E}(\ell(Y_{1})^{2}\mathbf{1}_{|\ell(Y_{1})|>\mathfrak{m}(n)})^{1/2}\sum_{K,K^{\prime}\in\mathcal{K}_{n}}\mathbb{E}(\langle K(X_{1},.),s_{K^{\prime},\ell}-s\rangle_{2}^{2})^{1/2}$$

$${}\leqslant 2\mathfrak{m}_{\mathcal{K},\ell}^{1/2}||s_{K^{\prime},\ell}-s||_{2}\mathbb{E}(\ell(Y_{1})^{4})^{1/4}|\mathcal{K}_{n}|^{2}\mathbb{P}(|\ell(Y_{1})|>\mathfrak{m}(n))^{1/4}\leqslant\frac{\mathfrak{c}_{3}}{n}$$

with

$$\mathfrak{c}_{3}=2\mathfrak{m}_{\mathcal{K},\ell}^{1/2}(\mathfrak{m}_{\mathcal{K},\ell}^{1/2}+||s||_{2})\mathbb{E}(\ell(Y_{1})^{4})^{1/4}\mathbb{E}(\exp(\alpha|\ell(Y_{1})|))^{1/4}.$$

Therefore,

$$\mathbb{E}\left(\sup_{K,K^{\prime}\in\mathcal{K}_{n}}\{|W_{K,K^{\prime},\ell}(n)|-\theta||s_{K^{\prime},\ell}-s||_{2}^{2}\}\right)\leqslant(2^{3}+2\mathfrak{c}_{1})\mathfrak{c}_{2}\frac{\log(n)^{4}}{\theta n}+\frac{\mathfrak{c}_{3}}{n}\leqslant\mathfrak{c}_{4}\frac{\log(n)^{4}}{\theta n}$$

with $\mathfrak{c}_{4}=(2^{3}+2\mathfrak{c}_{1})\mathfrak{c}_{2}+\mathfrak{c}_{3}$.

B.2. Proof of Proposition 2.7

For any $K\in\mathcal{K}_{n}$,

$$||\widehat{s}_{K,\ell}(n;.)-s_{K,\ell}||_{2}^{2}=\frac{U_{K,\ell}(n)}{n^{2}}+\frac{V_{K,\ell}(n)}{n}$$

(B.7)

with $U_{K,\ell}(n)=U_{K,K,\ell}(n)$ and $V_{K,\ell}(n)=V_{K,K,\ell}(n)$. Then, by Lemmas B.1 and B.2,

$$\mathbb{E}\left(\sup_{K\in\mathcal{K}_{n}}\left\{\left|||\widehat{s}_{K,\ell}(n;.)-s_{K,\ell}||_{2}^{2}-\frac{\overline{s}_{K,\ell}}{n}\right|-\frac{\theta}{n}\overline{s}_{K,\ell}\right\}\right)\leqslant\mathfrak{c}_{2.7}\frac{\log(n)^{5}}{\theta n}$$

with $\mathfrak{c}_{2.7}=\mathfrak{c}_{B.1}+\mathfrak{c}_{B.2}$.

B.3. Proof of Theorem 2.8

On the one hand, for every $K\in\mathcal{K}_{n}$,

$$||\widehat{s}_{K,\ell}(n;.)-s||_{2}^{2}-(1+\theta)\left(||s_{K,\ell}-s||_{2}^{2}+\frac{\overline{s}_{K,\ell}}{n}\right)$$

can be written

$$||\widehat{s}_{K,\ell}(n;.)-s_{K,\ell}||_{2}^{2}-(1+\theta)\frac{\overline{s}_{K,\ell}}{n}+2W_{K,\ell}(n)-\theta||s_{K,\ell}-s||_{2}^{2},$$

where $W_{K,\ell}(n):=W_{K,K,\ell}(n)$ (see (14)). Then, by Proposition 2.7 and Lemma B.3,

$$\mathbb{E}\left(\sup_{K\in\mathcal{K}_{n}}\left\{||\widehat{s}_{K,\ell}(n;.)-s||_{2}^{2}-(1+\theta)\left(||s_{K,\ell}-s||_{2}^{2}+\frac{\overline{s}_{K,\ell}}{n}\right)\right\}\right)\leqslant\mathfrak{c}_{2.8}\frac{\log(n)^{5}}{\theta n}$$

with $\mathfrak{c}_{2.8}=\mathfrak{c}_{2.7}+\mathfrak{c}_{B.3}$. On the other hand, for any $K\in\mathcal{K}_{n}$,

$$||s_{K,\ell}-s||_{2}^{2}=||\widehat{s}_{K,\ell}(n;.)-s||_{2}^{2}-||\widehat{s}_{K,\ell}(n;.)-s_{K,\ell}||_{2}^{2}-W_{K,\ell}(n).$$

Then,

$$(1-\theta)\left(||s_{K,\ell}-s||_{2}^{2}+\frac{\overline{s}_{K,\ell}}{n}\right)-||\widehat{s}_{K,\ell}(n;.)-s||_{2}^{2}\leqslant|W_{K,\ell}(n)|-\theta||s_{K,\ell}-s||_{2}^{2}+\Lambda_{K,\ell}(n)-\theta\frac{\overline{s}_{K,\ell}}{n},$$

where

$$\Lambda_{K,\ell}(n):=\left|||\widehat{s}_{K,\ell}-s_{K,\ell}||_{2}^{2}-\frac{\overline{s}_{K,\ell}}{n}\right|.$$

By equalities (B.5) and (B.7),

$$\Lambda_{K,\ell}(n)=\left|\frac{U_{K,\ell}(n)}{n^{2}}+\frac{v_{K,\ell}(n)}{n}-\frac{||s_{K,\ell}||_{2}^{2}}{n}\right|$$

with $U_{K,\ell}(n)=U_{K,K,\ell}(n)$ (see (B.1)). By Lemmas B.1 and B.2, there exists a deterministic constant $\mathfrak{c}_{1}>0$, not depending $n$ and $\theta$, such that

$$\mathbb{E}\left(\sup_{K\in\mathcal{K}_{n}}\left\{\Lambda_{K,\ell}(n)-\theta\frac{\overline{s}_{K,\ell}}{n}\right\}\right)\leqslant\mathfrak{c}_{1}\frac{\log(n)^{5}}{\theta n}.$$

By Lemma B.3,

$$\mathbb{E}\left(\sup_{K\in\mathcal{K}_{n}}\{|W_{K,\ell}(n)|-\theta||s_{K,\ell}-s||_{2}^{2}\}\right)\leqslant\mathfrak{c}_{B.3}\frac{\log(n)^{4}}{\theta n}.$$

Therefore,

$$\mathbb{E}\left(\sup_{K\in\mathcal{K}_{n}}\left\{||s_{K,\ell}-s||_{2}^{2}+\frac{\overline{s}_{K,\ell}}{n}-\frac{1}{1-\theta}||\widehat{s}_{K,\ell}(n;.)-s||_{2}^{2}\right\}\right)\leqslant\overline{\mathfrak{c}}_{2.8}\frac{\log(n)^{5}}{\theta(1-\theta)n}$$

with $\overline{\mathfrak{c}}_{2.8}=\mathfrak{c}_{B.3}+\mathfrak{c}_{1}$.

B.4. Proof of Theorem 3.2

The proof of Theorem 3.2 is dissected in three steps.

Step 1. This first step is devoted to provide a suitable decomposition of

$$||\widehat{s}_{\widehat{K},\ell}(n;\cdot)-s||_{2}^{2}.$$

First,

$$||\widehat{s}_{\widehat{K},\ell}(n;\cdot)-s||_{2}^{2}=||\widehat{s}_{\widehat{K},\ell}(n;\cdot)-\widehat{s}_{K_{0},\ell}(n;\cdot)||_{2}^{2}$$

$${}+||\widehat{s}_{K_{0},\ell}(n;\cdot)-s||_{2}^{2}-2\langle\widehat{s}_{K_{0},\ell}(n;\cdot)-\widehat{s}_{\widehat{K},\ell}(n;\cdot),\widehat{s}_{K_{0},\ell}(n;\cdot)-s\rangle_{2}.$$

From (8), it follows that for any $K\in\mathcal{K}_{n}$,

$$||\widehat{s}_{\widehat{K},\ell}(n;\cdot)-s||_{2}^{2}\leqslant||\widehat{s}_{K,\ell}(n;\cdot)-s||_{2}^{2}+\textrm{pen}_{\ell}(K)-\textrm{pen}_{\ell}(\widehat{K})+||\widehat{s}_{K_{0},\ell}(n;\cdot)-s||_{2}^{2}$$

$${}-2\langle\widehat{s}_{K,\ell}(n;\cdot)-\widehat{s}_{\widehat{K},\ell}(n\cdot),\widehat{s}_{K_{0},\ell}(n;\cdot)-s\rangle_{2}=||\widehat{s}_{K,\ell}(n;\cdot)-s||_{2}^{2}+\psi_{n}(K)-\psi_{n}(\widehat{K}),$$

(B.8)

where

$$\psi_{n}(K):=2\langle\widehat{s}_{K,\ell}(n;\cdot)-s,\widehat{s}_{K_{0},\ell}(n;\cdot)-s\rangle_{2}-\textrm{pen}_{\ell}(K).$$

Let’s complete the decomposition of $||\widehat{s}_{\widehat{K},\ell}(n;\cdot)-s||_{2}^{2}$ by writing

$$\psi_{n}(K)=2(\psi_{1,n}(K)+\psi_{2,n}(K)+\psi_{3,n}(K)),$$

where

$$\psi_{1,n}(K):=\dfrac{U_{K,K_{0},\ell}(n)}{n^{2}},$$

$$\psi_{2,n}(K):=-\dfrac{1}{n^{2}}\left(\displaystyle\sum_{i=1}^{n}\ell(Y_{i})\langle K_{0}(X_{i},.),s_{K,\ell}\rangle_{2}+\sum_{i=1}^{n}\ell(Y_{i})\langle K(X_{i},.),s_{K_{0},\ell}\rangle_{2}\right)+\dfrac{1}{n}\langle s_{K_{0},\ell},s_{K,\ell}\rangle_{2},\textrm{ and}$$

$$\psi_{3,n}(K):=W_{K,K_{0},\ell}(n)+W_{K_{0},K,\ell}(n)+\langle s_{K,\ell}-s,s_{K_{0},\ell}-s\rangle_{2}.$$

Step 2. In this step, we give controls of the quantities

$$\mathbb{E}(\psi_{i,n}(K))\quad\text{and}\quad\mathbb{E}(\psi_{i,n}(\widehat{K}));\quad i=1,2,3.$$

By Lemma B.1, for any $\theta\in]0,1[$,

$$\mathbb{E}(|\psi_{1,n}(K)|)\leqslant\frac{\theta}{n}\overline{s}_{K,\ell}+\mathfrak{c}_{B.1}\frac{\log(n)^{5}}{\theta n}$$

and

$$\mathbb{E}(|\psi_{1,n}(\widehat{K})|)\leqslant\frac{\theta}{n}\mathbb{E}(\overline{s}_{\widehat{K},\ell})+\mathfrak{c}_{B.1}\frac{\log(n)^{5}}{\theta n}.$$
On the one hand, for any $K,K^{\prime}\in\mathcal{K}_{n}$, consider

$$\Psi_{2,n}(K,K^{\prime}):=\dfrac{1}{n}\displaystyle\sum_{i=1}^{n}\ell(Y_{i})\langle K(X_{i},.),s_{K^{\prime},\ell}\rangle_{2}.$$

Then, by Assumption 3.1,

$$\mathbb{E}\left(\sup_{K,K^{\prime}\in\mathcal{K}_{n}}|\Psi_{2,n}(K,K^{\prime})|\right)\leqslant\mathbb{E}(\ell(Y_{1})^{2})^{1/2}\mathbb{E}\left(\sup_{K,K^{\prime}\in\mathcal{K}_{n}}\langle K(X_{1},.),s_{K^{\prime},\ell}\rangle_{2}^{2}\right)^{1/2}$$

$${}\leqslant\overline{\mathfrak{m}}_{\mathcal{K},\ell}^{1/2}\mathbb{E}(\ell(Y_{1})^{2})^{1/2}.$$

On the other hand, by Assumption 2.1.(2),

$$|\langle s_{K,\ell},s_{K_{0},\ell}\rangle_{2}|\leqslant\mathfrak{m}_{\mathcal{K},\ell}.$$

Then, there exists a deterministic constant $\mathfrak{c}_{1}>0$, not depending on $n$ and $K$, such that

$$\mathbb{E}(|\psi_{2,n}(K)|)\leqslant\frac{\mathfrak{c}_{1}}{n}\quad\textrm{and}\quad\mathbb{E}(|\psi_{2,n}(\widehat{K})|)\leqslant\frac{\mathfrak{c}_{1}}{n}.$$
By Lemma B.3,

$$\mathbb{E}(|\psi_{3,n}(K)|)\leqslant\dfrac{\theta}{4}(||s_{K,\ell}-s||_{2}^{2}+||s_{K_{0},\ell}-s||_{2}^{2})+8\mathfrak{c}_{B.3}\frac{\log(n)^{4}}{\theta n}$$

$${}+\left(\dfrac{\theta}{2}\right)^{1/2}||s_{K,\ell}-s||_{2}\times\left(\dfrac{2}{\theta}\right)^{1/2}||s_{K_{0},\ell}-s||_{2}$$

$${}\leqslant\dfrac{\theta}{2}||s_{K,\ell}-s||_{2}^{2}+\left(\dfrac{\theta}{4}+\dfrac{1}{\theta}\right)||s_{K_{0},\ell}-s||_{2}^{2}+8\mathfrak{c}_{B.3}\frac{\log(n)^{4}}{\theta n}$$

and

$$\mathbb{E}(|\psi_{3,n}(\widehat{K})|)\leqslant\frac{\theta}{2}\mathbb{E}(||s_{\widehat{K},\ell}-s||_{2}^{2})+\left(\dfrac{\theta}{4}+\dfrac{1}{\theta}\right)||s_{K_{0},\ell}-s||_{2}^{2}+8\mathfrak{c}_{B.3}\frac{\log(n)^{4}}{\theta n}.$$

Step 3. By the previous step, there exists a deterministic constant $\mathfrak{c}_{2}>0$, not depending on $n$, $\theta$, $K$, and $K_{0}$, such that

$$\mathbb{E}(|\psi_{n}(K)|)\leqslant\theta\left(||s_{K,\ell}-s||_{2}^{2}+\dfrac{\overline{s}_{K,\ell}}{n}\right)+\left(\dfrac{\theta}{2}+\dfrac{2}{\theta}\right)||s_{K_{0},\ell}-s||_{2}^{2}+\mathfrak{c}_{2}\dfrac{\log(n)^{5}}{\theta n}$$

and

$$\mathbb{E}(|\psi_{n}(\widehat{K})|)\leqslant\theta\mathbb{E}\left(||s_{\widehat{K},\ell}-s||_{2}^{2}+\dfrac{\overline{s}_{\widehat{K},\ell}}{n}\right)+\left(\dfrac{\theta}{2}+\dfrac{2}{\theta}\right)||s_{K_{0},\ell}-s||_{2}^{2}+\mathfrak{c}_{2}\dfrac{\log(n)^{5}}{\theta n}.$$

Then, by Theorem 2.8,

$$\mathbb{E}(|\psi_{n}(K)|)\leqslant\dfrac{\theta}{1-\theta}\mathbb{E}(||\widehat{s}_{K,\ell}(n;.)-s||_{2}^{2})+\left(\dfrac{\theta}{2}+\dfrac{2}{\theta}\right)||s_{K_{0},\ell}-s||_{2}^{2}+\left(\dfrac{\mathfrak{c}_{2}}{\theta}+\dfrac{\mathfrak{c}_{2.8}}{1-\theta}\right)\dfrac{\log(n)^{5}}{n}$$

and

$$\mathbb{E}(|\psi_{n}(\widehat{K})|)\leqslant\dfrac{\theta}{1-\theta}\mathbb{E}(||\widehat{s}_{\widehat{K},\ell}(n;.)-s||_{2}^{2})+\left(\dfrac{\theta}{2}+\dfrac{2}{\theta}\right)||s_{K_{0},\ell}-s||_{2}^{2}+\left(\dfrac{\mathfrak{c}_{2}}{\theta}+\dfrac{\mathfrak{c}_{2.8}}{1-\theta}\right)\dfrac{\log(n)^{5}}{n}.$$

By decomposition (B.8), there exist two deterministic constants $\mathfrak{c}_{3},\mathfrak{c}_{4}>0$, not depending on $n$, $\theta$, $K$, and $K_{0}$, such that

$$\mathbb{E}(||\widehat{s}_{\widehat{K},\ell}(n;\cdot)-s||_{2}^{2})\leqslant\mathbb{E}(||\widehat{s}_{K,\ell}(n;\cdot)-s||_{2}^{2})+\mathbb{E}(|\psi_{n}(K)|)+\mathbb{E}(|\psi_{n}(\widehat{K})|)$$

$${}\leqslant\left(1+\dfrac{\theta}{1-\theta}\right)\mathbb{E}(||\widehat{s}_{K,\ell}(n;\cdot)-s||_{2}^{2})+\dfrac{\theta}{1-\theta}\mathbb{E}(||\widehat{s}_{\widehat{K},\ell}(n;.)-s||_{2}^{2})$$

$${}+\dfrac{\mathfrak{c}_{3}}{\theta}||s_{K_{0},\ell}-s||_{2}^{2}+\dfrac{\mathfrak{c}_{4}}{\theta(1-\theta)}\dfrac{\log(n)^{5}}{n}.$$

This concludes the proof.

ACKNOWLEDGMENTS

The authors want also to thank Fabienne Comte for her careful reading and advices.

FUNDING

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement no. 811017.

About this article

Cite this article

Halconruy, H., Marie, N. Kernel Selection in Nonparametric Regression. Math. Meth. Stat. 29, 32–56 (2020). https://doi.org/10.3103/S1066530720010044

Download citation

Received: 30 October 2020
Revised: 22 November 2020
Accepted: 01 January 2021
Published: 31 August 2021
Issue Date: January 2020
DOI: https://doi.org/10.3103/S1066530720010044

Keywords:

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Kernel Selection in Nonparametric Regression

Abstract

Similar content being viewed by others

Optimal Kernel Selection for Density Estimation

Nonparametric relative regression for associated random variables

Kernel regression estimation for incomplete data with applications

1 INTRODUCTION

2 RISK BOUND

3 KERNEL SELECTION

REFERENCES

Author information

Authors and Affiliations

Corresponding authors

APPENDIX A.

A. DETAILS ON KERNELS SETS: PROOFS OF PROPOSITIONS 2.2, 2.3, 2.6, AND 3.6

A.1. Proof of Proposition 2.2

A.2. Proof of Proposition 2.3

A.3. Proof of Proposition 2.6

A.4. Proof of Proposition 3.6

B. PROOFS OF RISK BOUNDS

B.1. Preliminary Results

B.2. Proof of Proposition 2.7

B.3. Proof of Theorem 2.8

B.4. Proof of Theorem 3.2

ACKNOWLEDGMENTS

FUNDING

About this article

Cite this article

Keywords:

Navigation

Kernel Selection in Nonparametric Regression

Abstract

Similar content being viewed by others

Optimal Kernel Selection for Density Estimation

Nonparametric relative regression for associated random variables

Kernel regression estimation for incomplete data with applications

1 INTRODUCTION

2 RISK BOUND

3 KERNEL SELECTION

REFERENCES

Author information

Authors and Affiliations

Corresponding authors

APPENDIX A.

A. DETAILS ON KERNELS SETS: PROOFS OF PROPOSITIONS 2.2, 2.3, 2.6, AND 3.6

A.1. Proof of Proposition 2.2

A.2. Proof of Proposition 2.3

A.3. Proof of Proposition 2.6

A.4. Proof of Proposition 3.6

B. PROOFS OF RISK BOUNDS

B.1. Preliminary Results

B.2. Proof of Proposition 2.7

B.3. Proof of Theorem 2.8

B.4. Proof of Theorem 3.2

ACKNOWLEDGMENTS

FUNDING

About this article

Cite this article

Share this article

Keywords:

Search

Navigation