1 Introduction

The design of complex technical systems by engineers always has to take into account some kind of uncertainty. This uncertainty might occur because of lack of knowledge about future use or about properties of the materials used to build the technical system (e.g., the exact value of the damping coefficient of a spring–mass damper). In order to take this uncertainty into account, we model in the sequel the outcome Y of the technical system by a random variable. For simplicity, we restrict ourselves to the case that Y is a real-valued random variable. Thus, we are interested in properties of the distribution of Y; for example, we are interested in quantiles

$$\begin{aligned} q_{Y,\alpha } = \min \left\{ y \in \mathbb {R}\, : \, {\mathbf P}\{Y \le y\} \ge \alpha \right\} \end{aligned}$$
(1)

for \(\alpha \in (0,1)\) (which, for \(\alpha \) close to one, describe values that we expect to be upper bounds on the values occurring in an application), or in the density \(g_Y:\mathbb {R}\rightarrow \mathbb {R}\) of Y with respect to the Lebesgue–Borel measure, which we later assume to exist.

In the sequel, we model the lack of knowledge about the future use of the system or about properties of materials used in it by introducing an additional \(\mathbb {R}^d\)-valued random variable X, which contains values for uncertain parameters describing the system or its future use, and of which we assume either to know the distribution or to be able to generate an arbitrary number of independent realizations. Furthermore, we assume that we have available a model describing the relation between X and Y by a function \(\bar{m}:\mathbb {R}^d\rightarrow \mathbb {R}\). This function \(\bar{m}\) might be constructed by using a physical model of our technical system, and in some sense \(\bar{m}(X)\) is an approximation of Y. However, as with all models, our model is imperfect in the sense that \(Y=\bar{m}(X)\) does not hold. This might be due to the fact that Y cannot be exactly characterized by a function of X (since X might not describe the randomness of Y completely), or because the relation between Y and X is not correctly specified by \(\bar{m}\), or because of both. So although we know \(\bar{m}\) and can generate an arbitrary number of independent copies \(X_1\), \(X_2\), ... of X, we cannot use \(\bar{m}(X_1)\), \(\bar{m}(X_2)\), ... as observations of Y, since there is an error between these values and a sample of Y.

In order to control this error, we assume that we have available \(n \in \mathbb {N}\) observations of the Y-values corresponding to the first n values of X. To formulate our prediction problem precisely, let \((X,Y)\), \((X_1,Y_1)\), \((X_2,Y_2)\), ... be independent and identically distributed and let \(L_n, N_n \in \mathbb {N}\). We assume that we are given the data

$$\begin{aligned}&(X_1,Y_1),\ldots ,(X_n,Y_n),(X_{n+1}, \bar{m}(X_{n+1})), \ldots , (X_{n+L_n},\bar{m}(X_{n+L_n})), \nonumber \\&\quad X_{n+L_n+1},\ldots ,X_{n+L_n+N_n}, \end{aligned}$$
(2)

and we want to use these data in order to estimate the quantiles \(q_{Y,\alpha }\) or the density \(g_Y\) of Y (which we later assume to exist). The main difficulty in solving this problem is that the sample size n of the observations of Y (which corresponds to the number of experiments we are making with the technical system) is rather small (since these experiments are time consuming or costly).

Before we describe various existing approaches to solve this problem in the literature, we will illustrate the problem by an example. Here, we consider a demonstrator for a suspension strut, which was built at the Technische Universität Darmstadt and which serves as an academic demonstrator to study uncertainty in load distributions and the ability to control vibrations, stability and load paths in suspension struts such as aircraft landing gears. The photograph of this suspension strut and its experimental test setup is shown in Fig. 1(left); a CAD illustration of this suspension strut can be found in Fig. 1(middle).

Fig. 1 A photograph of the demonstrator of a suspension strut and its experimental test setup (left), a CAD illustration of the suspension strut (middle) and an illustration of a simplified model of the suspension strut (right)

This suspension strut consists of upper and lower structures, where the lower structure contains a spring–damper component and an elastic foot. The spring–damper component transmits the axial forces between the upper and lower structures of the suspension strut. The aim of our analysis is to study the behavior of the maximum relative compression of the spring–damper component when the free fall height is chosen randomly. Here, we assume that the free fall heights are independent and normally distributed with mean 0.05 meter and standard deviation 0.0057 meter.

We analyze the uncertainty in the maximum relative compression in our suspension strut using a simplified mathematical model of the suspension strut [cf., Fig. 1(right)], where the upper and the lower structures of the suspension strut are two lumped masses m and \(m_1\), the spring–damper component is represented by a stiffness parameter k and a suitable damping coefficient b, and the foot is represented by another stiffness parameter \(k_{ef}\). Using a linear stiffness and an axiomatic damping, it is possible to compute the maximum relative compression by solving a differential equation with a Runge–Kutta algorithm [cf., model a) in Mallapur and Platz (2017)]. Figure 2 shows \(L_n=500\) data points from the computer experiment and also \(n=20\) experimental data points. Since these two data sets do not look like they come from the same source, our computer model is obviously imperfect. Our aim in the sequel is to use the \(n=20\) data points from our experiments with the suspension strut together with the \(L_n=500\) data points from the computer experiments in order to analyze the uncertainty in the above-described experiments with the suspension strut. This can be done, e.g., by making some statistical inference about quantiles or the density of the maximum compression occurring in experiments with the suspension strut. Here, we not only want to adjust for a constant shift in order to match the simulator and the experimental data closely, but we also want to take into account that the values of Y are not a deterministic function of X.
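To give an impression of how such a computer model \(\bar{m}\) can be evaluated, the following Python sketch solves a purely illustrative two-mass drop model with a Runge–Kutta method; the equations of motion and all constants are hypothetical stand-ins and are not the model a) of Mallapur and Platz (2017).

```python
# Illustrative sketch only: a two-mass drop model solved with a Runge-Kutta
# method. All equations and constants are hypothetical stand-ins, not the
# model a) of Mallapur and Platz (2017).
import numpy as np
from scipy.integrate import solve_ivp

# hypothetical parameters: upper mass m, lower mass m1, spring stiffness k,
# damping coefficient b, foot stiffness k_ef (all values made up here)
m, m1, k, b, k_ef, g = 1.8, 0.6, 2.7e3, 60.0, 8.0e3, 9.81

def bar_m(drop_height):
    """Scalar compression measure for a given free fall height (the role of \bar m)."""
    v0 = -np.sqrt(2.0 * g * drop_height)        # touchdown velocity of both masses
    def rhs(t, z):
        zu, vu, zl, vl = z                      # positions/velocities of upper/lower mass
        spring = k * (zu - zl) + b * (vu - vl)  # force in the spring-damper component
        foot = k_ef * min(zl, 0.0)              # elastic foot acts only in compression
        return [vu, (-spring) / m - g, vl, (spring - foot) / m1 - g]
    sol = solve_ivp(rhs, (0.0, 0.2), [0.0, v0, 0.0, v0],
                    method="RK45", max_step=1e-4)
    return np.max(np.abs(sol.y[0] - sol.y[2]))  # maximum compression of the component

print(bar_m(0.05))  # one evaluation of the "computer experiment"
```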

Fig. 2 Data from \(L_n=500\) computer experiments (in black) together with data (in red) from \(n=20\) experiments with the suspension strut in Fig. 1 (left panel) (color figure online)

There are various possible approaches to solve the above estimation problem. The simplest idea is to ignore the model \(\bar{m}(X)\) completely and to make inference about \(q_{Y,\alpha }\) and \(g_Y\) using only the observations

$$\begin{aligned} Y_1, \ldots , Y_n \end{aligned}$$
(3)

of Y. For example, we can estimate the quantile \(q_{Y,\alpha }\) by the plug-in estimate

$$\begin{aligned} \hat{q}_{Y,n,\alpha } = \min \left\{ y \in \mathbb {R}\, : \, \hat{G}_{Y,n}(y) \ge \alpha \right\} \end{aligned}$$
(4)

corresponding to the estimate

$$\begin{aligned} \hat{G}_{Y,n}(y) = \frac{1}{n} \sum _{i=1}^n I_{(-\infty ,y]}(Y_i) \end{aligned}$$

of the cumulative distribution function (cdf) \(G(y)={\mathbf P}\{Y \le y\}\) of Y, which results in an order statistic as an estimate of the quantile. Or we can estimate the density \(g_Y\) of Y by the well-known kernel density estimate of Rosenblatt (1956) and Parzen (1962), where we first choose a density \(K:\mathbb {R}\rightarrow \mathbb {R}\) (the so-called kernel) and a so-called bandwidth \(h_n>0\) and define our estimate by

$$\begin{aligned} \hat{g}_{Y,n}(y) = \frac{1}{n \cdot h_n} \cdot \sum _{i=1}^n K \left( \frac{y-Y_i}{h_n} \right) . \end{aligned}$$

However, since the sample size n of our data (3) is rather small, this will in general not lead to satisfactory results.
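As a minimal illustration of these two estimates, the following Python sketch computes the plug-in quantile estimate (4) and the Rosenblatt–Parzen kernel density estimate from a small sample; the sample itself, the Gaussian kernel and the bandwidth are stand-ins chosen for illustration.

```python
# Minimal sketch of the estimates based on the n experimental Y-values only.
# The sample, the kernel and the bandwidth are stand-ins for illustration.
import numpy as np

rng = np.random.default_rng(0)
Y = rng.normal(size=20)                     # stand-in for the n = 20 observations Y_1, ..., Y_n

def quantile_plugin(sample, alpha):
    """Plug-in quantile estimate (4): smallest y with empirical cdf >= alpha."""
    ys = np.sort(sample)
    ecdf = np.arange(1, len(ys) + 1) / len(ys)   # empirical cdf at the order statistics
    return ys[np.searchsorted(ecdf, alpha)]      # smallest order statistic with ecdf >= alpha

def kde(sample, y, h):
    """Rosenblatt-Parzen kernel density estimate with Gaussian kernel and bandwidth h."""
    K = lambda u: np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    return np.mean(K((y - sample[:, None]) / h), axis=0) / h

print(quantile_plugin(Y, alpha=0.95))
print(kde(Y, y=np.linspace(-3.0, 3.0, 7), h=0.5))
```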

Another simple idea is to ignore the real data (3), and to use the model data

$$\begin{aligned} \bar{m}(X_{n+1}), \ldots , \bar{m}(X_{n+L_n}) \end{aligned}$$
(5)

as a sample of Y with additional measurement errors, and to use this sample to define quantile and density estimates as above. In this way, we estimate \(q_{Y,\alpha }\) by

$$\begin{aligned} \hat{q}_{\bar{m}(X),L_n,\alpha } = \min \left\{ y \in \mathbb {R}\, : \, \hat{G}_{\bar{m}(X),L_n}(y) \ge \alpha \right\} \end{aligned}$$
(6)

where

$$\begin{aligned} \hat{G}_{\bar{m}(X),L_n}(y) = \frac{1}{L_n} \sum _{i=1}^{L_n} I_{(-\infty ,y]}(\bar{m}(X_{n+i})), \end{aligned}$$

and we can estimate the density g of Y by

$$\begin{aligned} \hat{g}_{\bar{m}(X),L_n}(y) = \frac{1}{L_n \cdot h_{L_n}} \cdot \sum _{i=1}^{L_n} K \left( \frac{y-\bar{m}(X_{n+i})}{h_{L_n}} \right) . \end{aligned}$$

Since the function \(\bar{m}\) of our model \(\bar{m}(X)\) of Y might be costly to evaluate (e.g., in case that its values are defined as solutions of a complicated partial differential equation) and consequently \(L_n\) might not be very large, it makes sense to use in a first step the data

$$\begin{aligned} (X_{n+1},\bar{m}(X_{n+1})), \ldots , (X_{n+L_n},\bar{m}(X_{n+L_n})) \end{aligned}$$

to compute a surrogate model

$$\begin{aligned} \hat{m}_{L_n}(\cdot ) = \hat{m}_{L_n}(\cdot , (X_{n+1},\bar{m}(X_{n+1})), \ldots , (X_{n+L_n},\bar{m}(X_{n+L_n}))): \mathbb {R}^d\rightarrow \mathbb {R}\end{aligned}$$

of \(\bar{m}\), and to compute in the second step the quantile and density estimates \(\hat{q}_{\hat{m}_{L_n}(X),N_n,\alpha }\) and \(\hat{g}_{\hat{m}_{L_n}(X),N_n}\) using the data

$$\begin{aligned} \hat{m}_{L_n}(X_{n+L_n+1}), \ldots , \hat{m}_{L_n}(X_{n+L_n+N_n}). \end{aligned}$$
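As an illustration of this two-step procedure, the following Python sketch fits a thin-plate-spline radial basis function surrogate (one possible choice of \(\hat{m}_{L_n}\), in the spirit of the thin plate splines used in Sect. 4) to simulated model data and takes the order statistic of surrogate evaluations at fresh X-values; the model function, the distribution of X and the sample sizes are stand-ins.

```python
# Sketch of the two-step surrogate approach: fit a surrogate to the model data,
# then estimate the quantile from cheap surrogate evaluations at fresh X-values.
# The model function, the distribution of X and the sample sizes are stand-ins.
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(1)
L_n, N_n, alpha = 500, 10_000, 0.95                 # N_n kept small for this sketch
bar_m = lambda x: np.exp(x)                         # stand-in for the computer model

X_model = rng.normal(size=(L_n, 1))                 # X_{n+1}, ..., X_{n+L_n}
m_values = bar_m(X_model[:, 0])                     # values delivered by the simulator

# step 1: surrogate \hat m_{L_n} (here: a thin-plate-spline radial basis fit)
m_hat = RBFInterpolator(X_model, m_values, kernel="thin_plate_spline", smoothing=1e-6)

# step 2: evaluate the surrogate on N_n fresh X-values and take the order
# statistic corresponding to the empirical alpha-quantile
surrogate_sample = np.sort(m_hat(rng.normal(size=(N_n, 1))))
ecdf = np.arange(1, N_n + 1) / N_n
print(surrogate_sample[np.searchsorted(ecdf, alpha)])
```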

Surrogate models have been introduced and investigated with the aid of simulated and real data in connection with quadratic response surfaces in Bucher and Bourgund (1990), Kim and Na (1997) and Das and Zheng (2000), in the context of support vector machines in Hurtado (2004), Deheeger and Lemaire (2010) and Bourinet et al. (2011), in connection with neural networks in Papadrakakis and Lagaros (2002), and in the context of kriging in Kaymaz (2005) and Bichon et al. (2008).

Under the assumption that we have \(\bar{m}(X)=Y\), the above estimates have been theoretically analyzed in Devroye et al. (2013), Bott et al. (2015), Felber et al. (2015a, b), Enss et al. (2016) and Kohler and Krzyżak (2018).

However, in practice there usually will be an error in the approximation of Y by \(\bar{m}(X)\), and it is unclear how this error influences the error of the quantile and density estimates.

Kohler et al. (2016) and Kohler and Krzyżak (2016) used the data

$$\begin{aligned} (X_1,Y_1), \ldots , (X_n,Y_n) \end{aligned}$$

obtained by experiments with the technical system in order to control this error. In particular, confidence intervals for quantiles and confidence bands for densities are derived there. Wong et al. (2017) used the above data of the technical system in order to calibrate a computer model and estimated the error of the resulting model by using bootstrap. Kohler and Krzyżak (2017) used these data in order to improve the surrogate model and analyzed the density estimate based on the improved surrogate model.

Kohler et al. (2016) and Kohler and Krzyżak (2016, 2017) try to approximate Y by some function of X and make statistical inference on the basis of this approximation. Wong et al. (2017) do this similarly, but take into account additional measurement errors of the y-values. The basic new idea in this article is to estimate instead a regression model

$$\begin{aligned} Y=\bar{m}(X)+\bar{\epsilon }, \end{aligned}$$
(7)

where

$$\begin{aligned} \bar{\epsilon }=Y-\bar{m}(X) \end{aligned}$$

is the residual error of our model \(\bar{m}(X)\), which is not related to measurement errors but instead is due to the fact that an approximation of Y by a function of X cannot be perfect. In this model, we estimate simultaneously \(\bar{m}\) and the conditional distribution \({\mathbf P}_{\bar{\epsilon } | X=x}\) of \(\bar{\epsilon }\) given \(X=x\). As soon as estimates \(\hat{m}_{L_n}\) and \(\hat{{\mathbf P}}_{\bar{\epsilon } | X=x}\) of both are available, we generate data

$$\begin{aligned} \hat{m}_{L_n}(X_{n+L_n+1})+\hat{\epsilon }(X_{n+L_n+1}), \ldots , \hat{m}_{L_n}(X_{n+L_n+N_n})+\hat{\epsilon }(X_{n+L_n+N_n}) \end{aligned}$$

(where \(\hat{\epsilon }(x)\) has the distribution \(\hat{{\mathbf P}}_{\bar{\epsilon } | X=x}\) conditioned on \(X=x\)) and use these data to define corresponding quantile estimates.
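The following short Python sketch illustrates this idea; the surrogate and the sampler for the conditional residual distribution are placeholder functions here, and their actual construction is described in Sect. 2.

```python
# Conceptual sketch of the proposed estimate: add residuals drawn from an
# estimated conditional distribution to surrogate predictions, then take an
# empirical quantile. Both functions below are placeholders; their actual
# construction is described in Sect. 2.
import numpy as np

rng = np.random.default_rng(2)

def m_hat(x):                      # placeholder for the surrogate \hat m_{L_n}
    return np.exp(x)

def sample_residual(x):            # placeholder for a draw from \hat P_{eps|X=x}
    return rng.normal(scale=0.1 + 0.05 * np.abs(x))

N_n, alpha = 10_000, 0.95
X_new = rng.normal(size=N_n)                       # X_{n+L_n+1}, ..., X_{n+L_n+N_n}
Y_hat = np.sort(m_hat(X_new) + sample_residual(X_new))
print(Y_hat[np.searchsorted(np.arange(1, N_n + 1) / N_n, alpha)])
```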

We assume in the sequel that the conditional distribution of \(\bar{\epsilon }\) given X has a density with respect to the Lebesgue–Borel measure. In order to estimate this conditional density, we use the well-known conditional kernel density estimate introduced already in Rosenblatt (1969). Concerning existing results on conditional density estimates, we refer to Fan et al. (1996), Fan and Yim (2004), Gooijer and Zerom (2003), Efromovich (2007), Bott and Kohler (2016, 2017) and the literature cited therein.

Our main result, which is formulated in Sect. 3, shows that our newly proposed quantile estimates achieve, under suitable regularity conditions, rates of convergence which are faster than the rates of convergence of the estimates (4), (6) and the modifications of (6) using \(\hat{m}_{L_n}\) instead of \(\bar{m}\). Furthermore, we show with simulated data that this effect also occurs for finite sample sizes in the situations considered in our simulations, and we illustrate the usefulness of our newly proposed method by applying it to the spring–damper system introduced earlier.

Throughout this paper, we use the following notation: \(\mathbb {N}\), \(\mathbb {N}_0\) and \(\mathbb {R}\) are the sets of positive integers, nonnegative integers and real numbers, respectively. Let \(p=k+\beta \) for some \(k \in \mathbb {N}_0\) and \(0 < \beta \le 1\), and let \(C>0\). A function \(m:\mathbb {R}^d \rightarrow \mathbb {R}\) is called \((p,C)\)-smooth, if for every \(\alpha =(\alpha _1, \ldots , \alpha _d) \in \mathbb {N}_0^d\) with \(\sum _{j=1}^d \alpha _j = k\) the partial derivative \(\frac{\partial ^k m}{\partial x_1^{\alpha _1}\ldots \partial x_d^{\alpha _d}}\) exists and satisfies

$$\begin{aligned} \left| \frac{\partial ^k m}{\partial x_1^{\alpha _1}\ldots \partial x_d^{\alpha _d}}(x)-\frac{\partial ^k m}{\partial x_1^{\alpha _1} \ldots \partial x_d^{\alpha _d}}(z)\right| \le C \cdot \Vert x-z \Vert ^\beta \end{aligned}$$

for all \(x,z \in \mathbb {R}^d\). If X is a random variable, then \({\mathbf P}_X\) is the corresponding distribution, i.e., the measure associated with the random variable. If \((X,Y)\) is an \(\mathbb {R}^d\times \mathbb {R}\)-valued random variable and \(x \in \mathbb {R}^d\), then \({\mathbf P}_{Y|X=x}\) denotes the conditional distribution of Y given \(X=x\). Let \(D \subseteq \mathbb {R}^d\) and let \(f:\mathbb {R}^d \rightarrow \mathbb {R}\) be a real-valued function defined on \(\mathbb {R}^d\). We write \(x = \arg \min _{z \in D} f(z)\) if \(\min _{z \in D} f(z)\) exists and if x satisfies

$$\begin{aligned} x \in D \quad \text{ and } \quad f(x) = \min _{z \in D} f(z). \end{aligned}$$

For \(x \in \mathbb {R}^d\) and \(r >0\), we denote the (closed) ball with center x and radius r by \(S_r(x)\). If A is a set, then \(I_A\) is the indicator function corresponding to A, i.e., the function which takes on the value 1 on A and is zero elsewhere. For \(A \subseteq \mathbb {R}\), we denote the infimum of A by \(\inf A\), where we use the convention \(\inf \emptyset = \infty \). If \(x \in \mathbb {R}\), then we denote the smallest integer greater than or equal to x by \(\lceil x \rceil \).

The outline of this paper is as follows: In Sect. 2, the construction of the newly proposed quantile estimate is explained. The main results are presented in Sect. 3 and proven in Sect. 5. The finite sample size performance of our estimates is illustrated in Sect. 4 by applying it to simulated and real data.

2 Definition of the estimate

In the sequel, we assume that we are given data (2), where \(n,L_n,N_n \in \mathbb {N}\), the \(\mathbb {R}^d\times \mathbb {R}\)-valued random variables \((X,Y)\), \((X_1,Y_1)\), \((X_2,Y_2)\), ... are independent and identically distributed, and where \(\bar{m}:\mathbb {R}^d\rightarrow \mathbb {R}\) is measurable. Our aim is to estimate the quantile \(q_{Y,\alpha }\) defined in (1) for some \(\alpha \in (0,1)\).

To do this, we start by constructing an estimate of \(\bar{m}\). For this, we use the data

$$\begin{aligned} (X_{n+1},\bar{m}(X_{n+1})), \ldots , (X_{n+L_n},\bar{m}(X_{n+L_n})) \end{aligned}$$

and define the penalized least squares estimates of \(\bar{m}\) by

$$\begin{aligned} \tilde{m}_{L_n}(\cdot )= \arg \min _{f \in W^k(\mathbb {R}^d)} \left( \frac{1}{L_n} \sum _{i=1}^{L_n} \left( \bar{m}(X_{n+i}) - f(X_{n+i}) \right) ^2 + \lambda _{L_n} \cdot J_k^2(f) \right) \end{aligned}$$

and

$$\begin{aligned} \hat{m}_{L_n}(x) = T_{\beta _{L_n}}( \tilde{m}_{L_n}(x)) \quad (x \in \mathbb {R}^d) \end{aligned}$$

for some \(\beta _{L_n}>0\), where \(k \in \mathbb {N}\) with \(2 k > d\),

$$\begin{aligned} J_k^2(f)= \sum _{\alpha _1, \ldots , \alpha _d \in \mathbb {N}, \, \alpha _1+ \cdots + \alpha _d = k} \frac{k!}{\alpha _1 ! \cdot \cdots \cdot \alpha _d!} \int _{\mathbb {R}^d} \left| \frac{\partial ^k f}{\partial x_1^{\alpha _1}\ldots \partial x_d^{\alpha _d}}(x)\right| ^2 \mathrm{d}x \end{aligned}$$

is a penalty term penalizing the roughness of the estimate, \(W^k (\mathbb {R}^d)\) denotes the Sobolev space

$$\begin{aligned} \left\{ f \, : \, \frac{\partial ^k f}{\partial x_1^{\alpha _1}\ldots \partial x_d^{\alpha _d}} \in L_2 (\mathbb {R}^d) \text{ for } \text{ all } \alpha _1, \ldots , \alpha _d \in \mathbb {N} \text{ with } \alpha _1+ \cdots + \alpha _d = k \right\} , \end{aligned}$$

and where \(\lambda _{L_n}>0\), \(T_L(x)=\max \{-L,\min \{L,x\}\}, L>0\) is the truncation operator and \(L_2(\mathbb {R}^d)\) denotes square integrable functions on \(\mathbb {R}^d\). The condition \(2k>d\) implies that the functions in \(W^k(\mathbb {R}^d)\) are continuous and hence the value of a function at a point is well defined.
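For illustration, in the special case d = 1 and k = 2 the penalized least squares problem above is the classical cubic smoothing spline problem, and a minimal Python sketch (with a stand-in computer model, a stand-in smoothing parameter and a stand-in truncation level) could look as follows.

```python
# Sketch of the surrogate fit for d = 1, k = 2: a cubic smoothing spline as an
# instance of the penalized least squares estimate, followed by truncation.
# The computer model, the smoothing parameter and the truncation level are
# stand-ins for illustration.
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(3)
L_n, beta_Ln = 500, 100.0
bar_m = lambda x: np.exp(x)                        # stand-in computer model

X_model = np.sort(rng.normal(size=L_n))            # X_{n+1}, ..., X_{n+L_n} (sorted for the spline)
m_values = bar_m(X_model)

# smoothing spline \tilde m_{L_n}: trades off squared residuals against roughness
# (the parameter s plays a role analogous to the smoothing parameter lambda_{L_n})
tilde_m = UnivariateSpline(X_model, m_values, k=3, s=1e-3 * L_n)

def m_hat(x):
    """Truncated estimate \hat m_{L_n}(x) = T_{beta_{L_n}}(\tilde m_{L_n}(x))."""
    return np.clip(tilde_m(x), -beta_Ln, beta_Ln)

print(m_hat(np.array([-1.0, 0.0, 1.0])))
```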

Then, we compute the residuals of this estimate on the data \((X_1,Y_1), \ldots , (X_n,Y_n)\), i.e., we set

$$\begin{aligned} \hat{\epsilon }_i =Y_i - \hat{m}_{L_n}(X_i) \quad (i=1, \ldots ,n). \end{aligned}$$
(8)

We use these residuals in order to estimate the conditional distribution of \(\bar{\epsilon }=Y-\bar{m}(X)\) given \(X=x\). Here, we assume that this distribution has a density and estimate this density by applying a conditional density estimator to the data

$$\begin{aligned} (X_1,Y_1-\hat{m}_{L_n}(X_1)), \ldots , (X_n,Y_n-\hat{m}_{L_n}(X_n)). \end{aligned}$$

To do this, we set \(G=I_{[-1,1]}\) and let \(K:\mathbb {R}\rightarrow \mathbb {R}\) be a density, let \(h_n,H_n>0\) and set

$$\begin{aligned} \hat{g}_{\hat{\epsilon }|X}(y,x) = \frac{ \sum _{i=1}^n G \left( \frac{\Vert x-X_i\Vert }{H_n} \right) \cdot K \left( \frac{y-(Y_i-\hat{m}_{L_n}(X_i))}{h_n} \right) }{ h_n \cdot \sum _{j=1}^n G \left( \frac{\Vert x-X_j\Vert }{H_n} \right) }. \end{aligned}$$
(9)
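A direct Python transcription of the estimate (9), using the Epanechnikov kernel for K (the choice made later in Sect. 4); the residual data, the covariates and the bandwidths below are stand-ins.

```python
# Sketch of the conditional kernel density estimate (9) with naive kernel G and
# Epanechnikov kernel K. The data arrays and the bandwidths below are stand-ins.
import numpy as np

rng = np.random.default_rng(4)
n, h, H = 20, 0.3, 0.5
X_exp = rng.normal(size=(n, 1))                      # X_1, ..., X_n
res = rng.normal(scale=0.2, size=n)                  # residuals Y_i - \hat m_{L_n}(X_i)

G = lambda u: (np.abs(u) <= 1.0).astype(float)       # naive kernel I_{[-1,1]}
K = lambda u: 0.75 * np.maximum(1.0 - u ** 2, 0.0)   # Epanechnikov kernel

def g_hat(y, x):
    """Estimate (9) of the conditional density of the residual at y given X = x."""
    w = G(np.linalg.norm(x - X_exp, axis=1) / H)     # localization weights in x
    denom = h * np.sum(w)
    if denom == 0.0:                                 # convention 0/0 := 0
        return 0.0
    return np.sum(w * K((y - res) / h)) / denom

print(g_hat(0.0, np.zeros(1)))
```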

Once we have constructed the estimates \(\hat{m}_{L_n}\) and \(\hat{g}_{\hat{\epsilon }|X}\), we construct a sample of size \(N_n\) of the distribution of

$$\begin{aligned} \hat{m}_{L_n}(X)+\hat{\epsilon }(X), \end{aligned}$$

where the random variable \(\hat{\epsilon }(X)\) has the conditional density \(\hat{g}_{\hat{\epsilon }|X}(\cdot ,X)\) given X, and estimate the quantile by the empirical quantile corresponding to this sample. To do this, we use an inversion method: We define for \(u \in (0,1)\) and \(x \in \mathbb {R}^d\)

$$\begin{aligned} F_n^{-1}(u,x) = \inf \left\{ y \in \mathbb {R}\, : \, \int _{-\infty }^y \hat{g}_{\hat{\epsilon }|X}(z,x) \, \mathrm{d}z \ge u \right\} , \end{aligned}$$

choose independent and identically distributed random variables \(U_1\), \(U_2\), ..., with uniform distribution on (0, 1), such that they are independent of all other previously introduced random variables, and set

$$\begin{aligned} \hat{\epsilon }_{n+L_n+i}=F_n^{-1}(U_i,X_{n+L_n+i}) \quad (i=1, \ldots , N_n). \end{aligned}$$

This implies in case

$$\begin{aligned} \int _\mathbb {R}\hat{g}_{\hat{\epsilon }|X}(z,X_{n+L_n+i}) \, \mathrm{d}z=1 \end{aligned}$$

that \(\hat{\epsilon }_{n+L_n+i}\) conditioned on \(X_{n+L_n+i}\) has the density \(\hat{g}_{\hat{\epsilon }|X}(\cdot ,X_{n+L_n+i})\).
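The following Python sketch illustrates this inversion step, with the estimated conditional density replaced by a stand-in function and the integral of the density approximated on an equidistant grid (as in the implementation described in Sect. 4); the grid limits are assumptions here.

```python
# Sketch of the inversion method: draw \hat epsilon(x) with (approximately) the
# conditional density \hat g(., x) by inverting its cdf on an equidistant grid.
# The conditional density g_hat and the grid limits below are stand-ins.
import numpy as np

rng = np.random.default_rng(5)

def g_hat(y, x):                         # stand-in for the estimate (9); ignores x
    return np.exp(-0.5 * (y / 0.2) ** 2) / (0.2 * np.sqrt(2.0 * np.pi))

def draw_residual(x, y_min=-1.0, y_max=1.0, grid_size=1000):
    """Approximate F_n^{-1}(U, x): smallest grid point whose cdf value is >= U."""
    grid = np.linspace(y_min, y_max, grid_size)
    dens = g_hat(grid, x)
    cdf = np.cumsum(dens) * (grid[1] - grid[0])   # Riemann sum approximation of the cdf
    u = rng.uniform()
    return grid[min(np.searchsorted(cdf, u), grid_size - 1)]

print(draw_residual(x=np.zeros(1)))
```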

With these random variables, we estimate the cdf of Y by setting

$$\begin{aligned} \hat{Y}_{n+L_n+i}= \hat{m}_{L_n}(X_{n+L_n+i}) +\hat{\epsilon }_{n+L_n+i} \quad (i=1,\ldots ,N_n), \end{aligned}$$

and

$$\begin{aligned} \hat{G}_{\hat{Y},N_n}(y) = \frac{1}{N_n} \sum _{i=1}^{N_n} I_{\{ \hat{Y}_{n+L_n+i} \le y\}}, \end{aligned}$$

and use the corresponding plug-in estimate

$$\begin{aligned} \hat{q}_{\hat{Y},N_n,\alpha } = \min \left\{ y \in \mathbb {R}\, : \, \hat{G}_{\hat{Y},N_n}(y) \ge \alpha \right\} \end{aligned}$$

as an estimate of \(q_{Y,\alpha }\).

Remark 1

Since there is no measurement error in the observations from the simulator \(\bar{m}\), we could also use an interpolation estimate (instead of the penalized least squares estimate \(\hat{m}_{L_n}\)) in order to estimate \(\bar{m}\). For example, in this context we could apply the spline estimate from Bauer et al. (2017).

3 Main result

Before we formulate our main result, we summarize some important notation.

Y: Outcome of the experiment

X: Parameters of the experiment

\(\bar{m}\): Function \(\bar{m}:\mathbb {R}^d\rightarrow \mathbb {R}\) describing the computer model

\(\bar{\epsilon }\): Residual of the computer model

\(g_{\bar{\epsilon }|X}\): Conditional density of \(\bar{\epsilon }\) given X

Our main result is the following theorem, which gives a nonasymptotic bound on the error of our quantile estimate.

Theorem 1

Let \((X,Y)\), \((X_1,Y_1)\), \((X_2,Y_2)\), ... be independent and identically distributed \(\mathbb {R}^d\times \mathbb {R}\)-valued random variables, and let \(\bar{m}:\mathbb {R}^d\rightarrow \mathbb {R}\) be a measurable function. Let \(g_{\bar{\epsilon }|X} : \mathbb {R}\times \mathbb {R}^d\rightarrow \mathbb {R}\) be a measurable function with the property that \(g_{\bar{\epsilon }|X}(\cdot ,X)\) is a density of the conditional distribution of \(\bar{\epsilon }=Y-\bar{m}(X)\) given X. Assume that the following regularity conditions hold for some \(C_1,C_2>0\), \(r,s \in (0,1]\):

  1. (A1)

    \(| g_{\bar{\epsilon }|X}(y,x_1) - g_{\bar{\epsilon }|X}(y,x_2)| \le C_1 \cdot \Vert x_1-x_2\Vert ^r\) for all \(x_1,x_2 \in \mathbb {R}^d, y \in \mathbb {R}\),

  2. (A2)

    \(| g_{\bar{\epsilon }|X}(u,x)-g_{\bar{\epsilon }|X}(v,x)| \le C_2 \cdot |u-v|^s\) for all \(u,v \in \mathbb {R}, x \in \mathbb {R}^d\).

Let \(n, L_n , N_n \in \mathbb {N}\) and assume \(N_n^2 \ge 8 \cdot \log n\). For \(\alpha \in (0,1)\) define the estimate \(\hat{q}_{\hat{Y},N_n,\alpha }\) of the quantile \(q_{Y,\alpha }\) [given by (1)] as in Sect. 2, where \(h_n, H_n>0\), G is the naive kernel and where \(K:\mathbb {R}\rightarrow \mathbb {R}\) is a bounded and symmetric density, which decreases monotonically on \(\mathbb {R}_+\) and satisfies

$$\begin{aligned} \int K^2(z) \, \mathrm{d}z< \infty \quad \text{ and } \quad \int K(z) \cdot |z|^s \mathrm{d}z < \infty . \end{aligned}$$

Let \(\gamma _n>0\), assume \(2 \cdot \sqrt{d} \cdot \gamma _n \ge H_n\), and for \(x \in \mathbb {R}^d\) let \(-\infty< a_n(x) \le b_n(x) < \infty \). Set

$$\begin{aligned} \epsilon _n= & {} 4 \cdot {\mathbf E}\int _{\mathbb {R}^d} |\hat{m}_{L_n}(x)-\bar{m}(x)|^2 {\mathbf P}_X(\mathrm{d}x),\\ \delta _n= & {} \frac{8 \cdot K(0) \cdot (4 \cdot \sqrt{d})^d\gamma _n^d}{ h_n \cdot H_n^d } \cdot {\mathbf E}\int _{\mathbb {R}^d} | \hat{m}_{L_n}(x)-\bar{m}(x)| \, {\mathbf P}_X (\mathrm{d}x)\\&+\,8 \cdot c_1 \cdot \left( \sqrt{ \frac{ \int _{[-\gamma _n,\gamma _n]^d} |b_n(x)-a_n(x)| \, {\mathbf P}_X(\mathrm{d}x) \cdot \gamma _n^d }{ n \cdot H_n^d \cdot h_n }}\right. \\&\left. +\,\frac{4 \cdot \gamma _n^d}{n \cdot H_n^d} + 4 \cdot \int _{[-\gamma _n,\gamma _n]^d} |b_n(x)-a_n(x)| \, {\mathbf P}_X(\mathrm{d}x) \cdot \left( C_1 \cdot H_n^r + C_2 \cdot h_n^s \right) \right) \\&+\,8 \cdot {\mathbf P}_X(\mathbb {R}^d{\setminus } [-\gamma _n,\gamma _n]^d) + 8 \cdot \int _{[-\gamma _n,\gamma _n]^d} \int _{[a_n(x),b_n(x)]^c} g_{\bar{\epsilon }|X}(y,x) \, \mathrm{d}y \, {\mathbf P}_X (\mathrm{d}x) \end{aligned}$$

where

$$\begin{aligned} c_1= \max \left\{ 1, \sqrt{ 2 \cdot (4 \cdot \sqrt{d})^d \cdot \int K^2(z) \, \mathrm{d}z} , (4 \cdot \sqrt{d})^d, \int K(z) \cdot |z|^s \mathrm{d}z \right\} , \end{aligned}$$

and set

$$\begin{aligned} \eta _n=4 \cdot {\mathbf P}\left\{ X \in \mathbb {R}^d{{\setminus }} [-\gamma _n,\gamma _n]^d\right\} + 4 \cdot \frac{ (4 \cdot \sqrt{d})^d \cdot \gamma _n^d}{n \cdot H_n^d}. \end{aligned}$$

Let \(e_n>0\) and assume that the cdf of Y satisfies

$$\begin{aligned}&G_{Y}( q_{Y,\alpha }+e_n-( (\log n) \cdot \epsilon _n)^{1/3}) - G_Y(q_{Y,\alpha }) >( (\log n) \cdot \epsilon _n)^{1/3}\nonumber \\&\quad +\,\sqrt{\frac{\log N_n}{N_n}} + (\log n) \cdot \delta _n + (\log n) \cdot \eta _n \end{aligned}$$
(10)

and

$$\begin{aligned}&G_Y(q_{Y,\alpha }) - G_{Y}( q_{Y,\alpha }-e_n+( (\log n) \cdot \epsilon _n)^{1/3})>( (\log n) \cdot \epsilon _n)^{1/3}\nonumber \\&\quad +\,\sqrt{\frac{\log N_n}{N_n}} + (\log n) \cdot \delta _n + (\log n) \cdot \eta _n. \end{aligned}$$
(11)

Then,

$$\begin{aligned}&{\mathbf P}\left\{ \left| \hat{q}_{\hat{Y},N_n,\alpha } - q_{Y,\alpha } \right| > e_n\right\} \le \frac{1}{\log n}. \end{aligned}$$

Remark 2

Assume that Y has a density \(g_Y:\mathbb {R}\rightarrow \mathbb {R}\) with respect to the Lebesgue measure which satisfies for some \(c_2,c_3>0\)

$$\begin{aligned} g_Y(y)>c_2 \quad \text{ for } \text{ all } \quad y \in [q_{Y,\alpha }-c_3,q_{Y,\alpha }+c_3]. \end{aligned}$$
(12)

Assume that positive \(\epsilon _n, \delta _n, \eta _n\) defined in Theorem 1 satisfy

$$\begin{aligned} \left( 1 + \frac{1}{c_2} \right) \cdot \left( ( (\log n) \cdot \epsilon _n)^{1/3} + (\log n) \cdot \delta _n + (\log n) \cdot \eta _n + \sqrt{\frac{\log N_n}{N_n}} \right) \le c_3, \end{aligned}$$
(13)

and set

$$\begin{aligned} e_n=\left( 1 + \frac{1}{c_2} \right) \cdot \left( ( (\log n) \cdot \epsilon _n)^{1/3} + (\log n) \cdot \delta _n + (\log n) \cdot \eta _n + \sqrt{\frac{\log N_n}{N_n}} \right) . \end{aligned}$$

Then, (10) and (11) hold, and consequently, we can conclude from Theorem 1

$$\begin{aligned}&{\mathbf P}\left\{ \left| \hat{q}_{\hat{Y},N_n,\alpha } - q_{Y,\alpha } \right| \right. \\&\quad \left. >\,\left( 1 + \frac{1}{c_2} \right) \cdot \left( ( (\log n) \cdot \epsilon _n)^{1/3} + (\log n) \cdot \delta _n + (\log n) \cdot \eta _n + \sqrt{\frac{\log N_n}{N_n}} \right) \right\} \\&\quad \le \,\frac{1}{\log n}. \end{aligned}$$

Indeed, the assumptions above imply

$$\begin{aligned} 0 \le e_n -( (\log n) \cdot \epsilon _n)^{1/3} \le c_3. \end{aligned}$$

Consequently, because of the assumption on the density of Y we have

$$\begin{aligned} G_{Y}( q_{Y,\alpha }+e_n-( (\log n) \cdot \epsilon _n)^{1/3}) - G_{Y}( q_{Y,\alpha }) \ge c_2 (e_n-( (\log n) \cdot \epsilon _n)^{1/3})) . \end{aligned}$$

By the definition of \(e_n\), we have

$$\begin{aligned}&c_2 (e_n-( (\log n) \cdot \epsilon _n)^{1/3}) - ( (\log n) \cdot \epsilon _n)^{1/3} - \sqrt{\frac{\log N_n}{N_n}}\\&\quad -\,(\log n) \cdot \delta _n - (\log n) \cdot \eta _n > 0, \end{aligned}$$

which implies (10). In the same way, one can show (11).

Remark 3

The rate of convergence in Remark 2 depends on \(\epsilon _n\), \(\delta _n\) and \(\eta _n\). Here, \(\epsilon _n\) is by its definition related to the \(L_2\) error of \(\hat{m}_{L_n}\). It follows from the proof of Theorem 1 (cf., Lemma 1) that \(\delta _n\) is related to the \(L_1\) error of the conditional density estimate \(\hat{g}_{\hat{\epsilon }|X}\) and \(\eta _n\) is related to the probability that this estimate is not a density.

Remark 4

Set \(\gamma _n=\log (n)\). Assume that the conditional distribution of \(\bar{\epsilon }\) given \(X=x\) has compact support contained in \([a_n(x),b_n(x)]\), which implies that we have

$$\begin{aligned} \int _{[-\gamma _n,\gamma _n]^d} \int _{[a_n(x),b_n(x)]^c} g_{\bar{\epsilon }|X}(y,x) \, \mathrm{d}y \, {\mathbf P}_X (\mathrm{d}x) =0. \end{aligned}$$

Under suitable smoothness assumptions on \(\bar{m}:\mathbb {R}^d\rightarrow \mathbb {R}\), suitable assumptions on the tails of \(\Vert X\Vert \), and in case that \(\lambda _{L_n}\) and \(\beta _{L_n}\) are suitably chosen, it is well known that the expected \(L_2\) error of the smoothing spline estimate satisfies

$$\begin{aligned} {\mathbf E}\int _{\mathbb {R}^d} | \hat{m}_{L_n}(x)-\bar{m}(x)|^2 {\mathbf P}_X(\mathrm{d}x) \le c_4 \cdot \left( \frac{\log L_n}{L_n} \right) ^{2k/(2k+d)} \end{aligned}$$

(cf., e.g., Theorem 2 in Kohler and Krzyżak (2017)). Thus, for \(L_n\) large compared to n and under suitable assumptions on the tails of \(\Vert X\Vert \) it follows from Remark 2 that the error of our quantile estimate in Theorem 1 is up to some constant given by

$$\begin{aligned}&(\log n) \cdot \left( \sqrt{ \frac{ \int _{[-\log (n),\log (n)]^d} |b_n(x)-a_n(x)| \, {\mathbf P}_X(\mathrm{d}x) \cdot (\log n)^d }{ n \cdot H_n^d \cdot h_n }} + \frac{(\log n)^d}{n \cdot H_n^d} \nonumber \right. \\&\quad \left. +\,\int _{[-\log (n),\log (n)]^d} |b_n(x)-a_n(x)| \, {\mathbf P}_X(\mathrm{d}x) \cdot \left( C_1 \cdot H_n^r + C_2 \cdot h_n^s \right) \right) . \end{aligned}$$
(14)

Minimizing the expression above with respect to \(h_n\) and \(H_n\) as in the proof of Corollary 2 in Bott and Kohler (2017) shows that in case of a suitable choice of the bandwidths \(h_n,H_n>0\) the error of our quantile estimate in Theorem 1 is up to some logarithmic factor given by the minimum of

$$\begin{aligned}&C_1^{\frac{d}{r+d}}\cdot \left( \int _{[-\log (n),\log (n)]^d} |b_n(x)-a_n(x)| \, {\mathbf P}_X(\mathrm{d}x) \right) ^{\frac{d}{r+d}} \cdot n^{-\frac{r}{r+d}}\\&\quad +\,C_1^{\frac{ds}{(r+d)(2s+1)}}\cdot C_2^{\frac{1}{2s+1}} \left( \int _{[-\log (n),\log (n)]^d} |b_n(x)-a_n(x)| \, {\mathbf P}_X(\mathrm{d}x) \right) ^{\frac{(r+d)(s+1)+ds}{(r+d)(2s+1)}}\\&\quad \cdot \,n^{-\frac{rs}{(r+d)(2s+1)}} \end{aligned}$$

and

$$\begin{aligned}&C_1^{\frac{(2s+1)d}{r(2s+1)+ds}}\cdot C_2^{-\frac{d}{r(2s+1)+ds}}\nonumber \\&\quad \cdot \,\left( \int _{[-\log (n),\log (n)]^d} |b_n(x)-a_n(x)| \, {\mathbf P}_X(\mathrm{d}x) \right) ^{\frac{ds}{r(2s+1)+ds}} \cdot n^{-\frac{r(2s+1)}{r(2s+1)+ds}}\nonumber \\&\quad +\,C_1^{\frac{ds}{r(2s+1)+ds}}\cdot C_2^{\frac{r}{r(2s+1)+ds}}\nonumber \\&\quad \cdot \,\left( \int _{[-\log (n),\log (n)]^d} |b_n(x)-a_n(x)| \, {\mathbf P}_X(\mathrm{d}x) \right) ^{\frac{r(s+1)+ds}{r(2s+1)+ds}} \cdot n^{-\frac{rs}{r(2s+1)+ds}}. \end{aligned}$$

Assume that the distribution of \((X,Y)\) and \(\bar{m}\) change with increasing sample size and that \(|b_n(x)-a_n(x)|\) is the diameter of the support of the conditional distribution of \(\bar{\epsilon }\) given \(X=x\). Then, the error of our quantile estimate can converge to zero arbitrarily fast in case that \(\int _{[-\log (n),\log (n)]^d} |b_n(x)-a_n(x)| \, {\mathbf P}_X(\mathrm{d}x)\) goes to zero fast enough. In particular, the rate of convergence of our quantile estimate might be much better than the well-known rate of convergence \(1/\sqrt{n}\) of the simple quantile estimate (4), and in case of imperfect models, it will also be better than the rate of convergence of the surrogate quantile estimate.

Remark 5

The results in Remark 4 require that the parameters of the estimates (e.g., \(h_n\) and \(H_n\)) are suitably chosen. A data-dependent way of choosing these parameters in an application will be proposed in the next section, and by using simulated data, it will be shown that in this case our newly proposed estimates outperform the other estimates for finite sample size in the situations which we consider there.

4 Application to simulated and real data

In this section, we illustrate the finite sample size performance of our estimates by applying them to simulated and real data. We start with an application to simulated data, where we compare the simple order statistics estimate (est. 1) defined by (4) and a surrogate quantile estimate (est. 2) defined by (6) (where we replace \(\bar{m}\) by \(\hat{m}_{L_n}\) and evaluate this function on \(N_n\) x-values) with our newly proposed estimate based on estimation of the conditional density (est. 3) as defined in Sect. 2.

In the implementation of est. 2 and est. 3, we use thin plate splines (with smoothing parameter chosen by generalized cross-validation) as implemented by the routine Tps() of R in order to estimate a surrogate model for our computer experiment. Here, the implementation of the surrogate quantile estimate est. 2 computes a sample of size \(N_n=100{,}000\) of \(\hat{m}_{L_n}(X)\) and estimates the quantile by the corresponding order statistics.

In the implementation of our newly proposed est. 3, we use the naive kernel \(G(x)=I_{[-1,1]}(x)\) and the Epanechnikov kernel \(K(y)=(3/4)\cdot (1-y^2)_+\) for the conditional density estimate

$$\begin{aligned} \hat{g}_{\hat{\epsilon }|X}(y,x) = \frac{ \sum _{i=1}^n G \left( \frac{\Vert x-X_i\Vert }{H} \right) \cdot K \left( \frac{y-(Y_i-\hat{m}_{L_n}(X_i))}{h} \right) }{ h \cdot \sum _{j=1}^n G \left( \frac{\Vert x-X_j\Vert }{H} \right) }. \end{aligned}$$

Here, the bandwidths h and H are chosen in a data-dependent way from the sets

$$\begin{aligned} {\mathcal {P}}_h=\left\{ 2 \cdot 2^{-l} \cdot \hbox {IQR}(Y_1-\hat{m}_{L_n}(X_1),\ldots ,Y_n-\hat{m}_{L_n}(X_n)) \, : \, l \in \{0,1,\ldots ,4\} \right\} \end{aligned}$$

and

$$\begin{aligned} {\mathcal {P}}_H=\left\{ 2 \cdot 2^{-l} \cdot \hbox {IQR}(X_1,\ldots ,X_n) \, : \, l \in \{0,1,\ldots ,4\} \right\} , \end{aligned}$$

where IQR denotes the interquartile range, i.e., the distance between the 25th and the 75th percentile. To do this, we use the well-known combinatorial method for bandwidth selection of kernel density estimates introduced in Devroye and Lugosi (2001), which aims at choosing a bandwidth that minimizes the \(L_1\) error. Here, we apply a variant of this method for conditional density estimation introduced and described in Bott and Kohler (2016). To do this, we choose the bandwidths by minimizing

$$\begin{aligned}&\mathop {\max }_{\begin{array}{c} h_1,h_2 \in {\mathcal {P}}_h, \\ H_1,H_2 \in {\mathcal {P}}_H \end{array}} \left| \frac{1}{n_t} \sum _{i=n_l+1}^{n} \int _{A_i(h_1,H_1,h_2,H_2)} \hat{g}_{\hat{\epsilon }|X}^{(n_l,(h,H))}(y,X_i) \, \mathrm{d}y\right. \\&\quad \left. -\, \frac{1}{n_t} \cdot \sum _{i=n_l+1}^n I_{A_i(h_1,H_1,h_2,H_2)}(Y_i-\hat{m}_{L_n}(X_i)) \right| \end{aligned}$$

with respect to \(h \in {\mathcal {P}}_h\) and \(H \in {\mathcal {P}}_H\), where \(n_l=\lfloor n/2 \rfloor \), \(n_t=n-n_l\),

$$\begin{aligned} \hat{g}_{\hat{\epsilon }|X}^{(n_l,(h,H))}(y,x) = \frac{ \sum _{i=1}^{n_l} G \left( \frac{\Vert x-X_i\Vert }{H} \right) \cdot K \left( \frac{y-(Y_i-\hat{m}_{L_n}(X_i))}{h} \right) }{ h \cdot \sum _{j=1}^{n_l} G \left( \frac{\Vert x-X_j\Vert }{H} \right) } \end{aligned}$$

and

$$\begin{aligned} A_i(h_1,H_1,h_2,H_2) = \left\{ y \in \mathbb {R}\, : \, \hat{g}_{\hat{\epsilon }|X}^{(n_l,(h_1,H_1))}(y,X_i) >\hat{g}_{\hat{\epsilon }|X}^{(n_l,(h_2,H_2))}(y,X_i) \right\} . \end{aligned}$$
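A compact Python sketch of this bandwidth selection (anticipating the grid-based approximation of the integral described below); the data, the kernels and the candidate sets are stand-ins, and evaluating the indicator at the nearest grid point is a simplification.

```python
# Sketch of the combinatorial bandwidth selection (Devroye and Lugosi 2001; in
# the conditional-density variant of Bott and Kohler 2016), with the integral
# and the indicator evaluated on a grid. Data, kernels and candidate sets are
# stand-ins; the nearest-grid-point evaluation of the indicator is a shortcut.
import numpy as np
from itertools import product

rng = np.random.default_rng(6)
n = 20
X = rng.normal(size=(n, 1))                          # X_1, ..., X_n
res = rng.normal(scale=0.2, size=n)                  # residuals Y_i - \hat m_{L_n}(X_i)
n_l, n_t = n // 2, n - n // 2

G = lambda u: (np.abs(u) <= 1.0).astype(float)       # naive kernel
K = lambda u: 0.75 * np.maximum(1.0 - u ** 2, 0.0)   # Epanechnikov kernel

iqr = lambda a: np.percentile(a, 75) - np.percentile(a, 25)
P_h = [2.0 * 2.0 ** (-l) * iqr(res) for l in range(5)]
P_H = [2.0 * 2.0 ** (-l) * iqr(X) for l in range(5)]

grid = np.linspace(res.min() - max(P_h), res.max() + max(P_h), 200)
dy = grid[1] - grid[0]

def g_half(h, H, x):
    """Conditional density estimate based on the first n_l data points, on the grid."""
    w = G(np.linalg.norm(x - X[:n_l], axis=1) / H)
    denom = h * np.sum(w)
    if denom == 0.0:
        return np.zeros_like(grid)
    return (w[:, None] * K((grid[None, :] - res[:n_l, None]) / h)).sum(axis=0) / denom

cand = list(product(P_h, P_H))
# precompute the estimates on the grid for every candidate and every test point
dens = {c: [g_half(c[0], c[1], X[i]) for i in range(n_l, n)] for c in cand}

def criterion(c):
    crit = 0.0
    for c1, c2 in product(cand, cand):
        s = 0.0
        for j, i in enumerate(range(n_l, n)):
            A = dens[c1][j] > dens[c2][j]                    # Yatracos set A_i on the grid
            s += np.sum(dens[c][j][A]) * dy                  # integral of the candidate over A_i
            s -= float(A[np.argmin(np.abs(grid - res[i]))])  # indicator I_{A_i}(residual_i)
        crit = max(crit, abs(s) / n_t)
    return crit

h_sel, H_sel = min(cand, key=criterion)
print(h_sel, H_sel)
```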

In the implementation of this method, we approximate the integral

$$\begin{aligned} \int _{A_i(h_1,H_1,h_2,H_2)} \hat{g}_{\hat{\epsilon }|X}^{(n_l,(h,H))}(y,X_i) \, \mathrm{d}y \end{aligned}$$

by a Riemann sum based on an equidistant grid of

$$\begin{aligned}&\left[ \min \{Y_1-\hat{m}_{L_n}(X_1), \ldots , Y_n-\hat{m}_{L_n}(X_n)\} -\max _{h \in {\mathcal {P}}_h} h,\right. \\&\quad \left. \max \{Y_1-\hat{m}_{L_n}(X_1), \ldots , Y_n-\hat{m}_{L_n}(X_n)\} + \max _{h \in {\mathcal {P}}_h} h \right] \end{aligned}$$

consisting of 200 grid points (which enables an “efficient” implementation of the above minimization problem by first computing \(\hat{g}_{\hat{\epsilon }|X}^{(n_l,(h,H))}(y,X_i)\) for all grid points y, all \(h \in {\mathcal {P}}_h\), all \(H \in {\mathcal {P}}_H\) and all \(i=n_l+1, \ldots , n\)). After the computation of \(\hat{g}_{\hat{\epsilon }|X}\), we use the inversion method to generate random variables with the conditional density \(\hat{g}_{\hat{\epsilon }|X}(\cdot ,X_i)\). Here, we do not have to consider values outside of the above interval, since our density estimate is zero outside of this interval. In order to implement the inversion method, we discretize the corresponding conditional cumulative distribution function

$$\begin{aligned} \hat{G}_{\hat{\epsilon }|X}(y,x)= & {} \int _{-\infty }^y \hat{g}_{\hat{\epsilon }|X}(z,x) \, \mathrm{d}z\\= & {} \frac{ \sum _{i=1}^{n} G \left( \frac{\Vert x-X_i\Vert }{H} \right) \cdot \int _{-\infty }^y K \left( \frac{z-(Y_i-\hat{m}_{L_n}(X_i))}{h} \right) \, \mathrm{d}z }{ h \cdot \sum _{j=1}^{n} G \left( \frac{\Vert x-X_j\Vert }{H} \right) } \end{aligned}$$

by considering only its values on an equidistant grid of

$$\begin{aligned}&\left[ \min \{Y_1-\hat{m}_{L_n}(X_1), \ldots , Y_n-\hat{m}_{L_n}(X_n)\} - h,\right. \\&\quad \left. \max \{Y_1-\hat{m}_{L_n}(X_1), \ldots , Y_n-\hat{m}_{L_n}(X_n)\} + h \right] \end{aligned}$$

consisting of 1000 points, and by approximating the above integral by a Riemann sum corresponding to this grid. This enables again an “efficient” computation of the values of the conditional cumulative distribution function by computing in advance

$$\begin{aligned} K \left( \frac{z-(Y_i-\hat{m}_{L_n}(X_i))}{h} \right) \end{aligned}$$

for all grid points z and all \(i=1, \ldots ,n\). Using the values of the random variables computed in this way, we compute a sample of size \(N_n=100{,}000\) of Y and estimate the quantile by the corresponding order statistic.

We compare the above three estimates in the regression model

$$\begin{aligned} Y=m(X) + \epsilon , \end{aligned}$$

where X is a standard normally distributed random variable,

$$\begin{aligned} m(x)=\exp (x) \quad (x \in \mathbb {R}) \end{aligned}$$

and the conditional distribution of \(\epsilon \) given X is normally distributed with mean zero and standard deviation

$$\begin{aligned} \sigma (X) = \sigma \cdot \left( 0.25 + X \cdot (1-X) \right) . \end{aligned}$$

Here, \(\sigma >0\) is a parameter of our distribution for which we allow the values 0.5, 1 and 2. Furthermore, we assume that our simulation model is based on the function

$$\begin{aligned} \bar{m}(x)=m(x) - \delta = \exp (x) - \delta \quad (x \in \mathbb {R}), \end{aligned}$$

where \(\delta \in \mathbb {R}\) is the constant model error of our model for which we consider the values 0 (i.e., no error) and 1 (i.e., negative error). Here, we consider a negative value for the model error, since the surrogate quantile estimate tends to underestimate the quantile in the above example, so that a positive error might accidentally improve the surrogate quantile estimate.

We apply our estimates to samples of size \(n \in \{20, 50, 100\}\) of \((X,Y)\) and \(L_n=500\) of \((X, \bar{m}(X))\), and use them to estimate quantiles of order \(\alpha =0.95\) and \(\alpha =0.99\).

In order to judge the errors of our quantile estimates, we use a simple order statistic applied to a sample of Y of size 1,000,000 as a reference value for the (unknown) quantile \(q_{Y,\alpha }\) and compute the relative errors

$$\begin{aligned} \frac{ | \hat{q}_{Y,\alpha } -q_{Y,\alpha }|}{ q_{Y,\alpha } }. \end{aligned}$$

Of course, our estimates \(\hat{q}_{Y,\alpha }\) and hence also the above relative errors depend on the random samples selected above, and hence are random. Therefore, we repeat the computation of the above error 100 times with newly generated independent samples and report the median and the interquartile ranges of the 100 errors in each of the considered cases for \(\alpha \), \(\sigma \), \(\delta \) and n, which results in errors for \(2 \cdot 3 \cdot 2 \cdot 3=36\) different situations. The values we obtained in case \(\alpha =0.95\) and in case \(\alpha =0.99\) are reported in Tables 1 and 2, respectively.
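A sketch of the data generation for one run of this simulation study, together with the Monte Carlo reference value for the unknown quantile and the relative error criterion; the seed and the use of a single run are implementation choices.

```python
# Sketch of the data generation for the simulation study and of the Monte Carlo
# reference value for the unknown quantile q_{Y,alpha}.
import numpy as np

rng = np.random.default_rng(7)
sigma, delta, n, L_n, alpha = 1.0, 1.0, 20, 500, 0.95

m = lambda x: np.exp(x)
scale = lambda x: sigma * (0.25 + x * (1.0 - x))       # conditional scale from the text

def sample_xy(size):
    X = rng.normal(size=size)
    return X, m(X) + scale(X) * rng.normal(size=size)  # Y = m(X) + eps

X_exp, Y_exp = sample_xy(n)                            # experimental data (X_i, Y_i)
X_model = rng.normal(size=L_n)
m_model = m(X_model) - delta                           # imperfect model \bar m = m - delta

# reference value for q_{Y,alpha}: order statistic of a large Monte Carlo sample
_, Y_ref = sample_xy(1_000_000)
Y_ref = np.sort(Y_ref)
q_ref = Y_ref[np.searchsorted(np.arange(1, Y_ref.size + 1) / Y_ref.size, alpha)]
rel_error = lambda q_hat: abs(q_hat - q_ref) / q_ref   # relative error criterion
print(q_ref)
```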

Table 1 Simulation results in case \(\alpha =0.95\). Reported are the median (and in brackets the interquartile range) of the 100 relative errors for each of our three estimates
Table 2 Simulation results in case \(\alpha =0.99\). Reported are the median (and in brackets the interquartile range) of the 100 relative errors for each of our three estimates

Looking at the results in Tables 1 and 2, we see that our newly proposed estimate outperforms the order statistics estimate in all 36 settings of the simulations. Furthermore, it outperforms the surrogate quantile estimate whenever the model error is not zero, and also in case of zero model error whenever \(\sigma \) is large. There are a few cases with a small value of \(\sigma \) and zero model error where the surrogate quantile estimate is better than our newly proposed estimate, but in these cases the difference between the errors is small compared with the improvement over the surrogate quantile estimate that our newly proposed estimate achieves in most of the other cases.

Finally, we illustrate the usefulness of our newly proposed method for uncertainty quantification by using it in the analysis of the uncertainty occurring in experiments with the suspension strut in Fig. 1(left) described in the Introduction. We use the results of \(L_n=500\) computer experiments to construct a surrogate estimate \(\hat{m}_{L_n}\) as described above, and we apply the method proposed in Sect. 2 to compute the conditional density of the residuals. To do this, we choose, as described above, the bandwidths h and H from the sets

$$\begin{aligned} {\mathcal {P}}_h=\left\{ 0.000766, 0.000383, 0.000191, 0.000096, 0.000048 \right\} \end{aligned}$$

and

$$\begin{aligned} {\mathcal {P}}_H=\left\{ 0.0174, 0.0087, 0.0043, 0.0022, 0.0011 \right\} \end{aligned}$$

by using the combinatorial method of Bott and Kohler (2016). This results in \(h=0.000191\) and \(H= 0.0043\). As described above, we use the corresponding density estimate together with the surrogate model to generate an approximate sample of size 100,000 of Y and estimate the \(\alpha =0.95\) quantile of Y by the corresponding order statistic, which results in the estimate 0.0855. In contrast, the simple order statistics estimate of the quantile based only on the \(n=20\) experimental data points yields the smaller value 0.0849.

5 Proofs

5.1 Estimation of quantiles on the basis of conditional density estimates

Let \((X,Y)\), \((X_1,Y_1)\), \((X_2,Y_2)\), ... be independent and identically distributed \(\mathbb {R}^d\times \mathbb {R}\)-valued random vectors and let \(\bar{m}:\mathbb {R}^d\rightarrow \mathbb {R}\) be a measurable function. Assume that the conditional distribution of \(\bar{\epsilon }=Y-\bar{m}(X)\) given X has the density \(g_{\bar{\epsilon }|X}(\cdot ,X):\mathbb {R}\rightarrow \mathbb {R}\) with respect to the Lebesgue–Borel measure, where \(g_{\bar{\epsilon }|X}:\mathbb {R}\times \mathbb {R}^d\rightarrow \mathbb {R}\) is measurable. Let \(n, L_n, N_n \in \mathbb {N}\) and set

$$\begin{aligned} {\mathcal {D}}_n = \{ (X_1,Y_1), \ldots , (X_n,Y_n), (X_{n+1},\bar{m}(X_{n+1})), \ldots , (X_{n+L_n},\bar{m}(X_{n+L_n}))\}. \end{aligned}$$

Let \(\hat{m}_{L_n}(\cdot )=\hat{m}_{L_n}(\cdot ,{\mathcal {D}}_n):\mathbb {R}^d\rightarrow \mathbb {R}\) and let

$$\begin{aligned} \hat{g}_{\hat{\epsilon }|X}(\cdot ,\cdot ) = \hat{g}_{\hat{\epsilon }|X}(\cdot ,\cdot , {\mathcal {D}}_n): \mathbb {R}\times \mathbb {R}^d\rightarrow \mathbb {R}\end{aligned}$$

be a measurable function satisfying

$$\begin{aligned} \hat{g}_{\hat{\epsilon }|X}(y,x) \ge 0 \quad \text{ for } \text{ all } y \in \mathbb {R}, x \in \mathbb {R}^d. \end{aligned}$$

Let U, \(U_1\), \(U_2\), ... be independent random variables which are uniformly distributed on (0, 1) and which are independent of \((X,Y)\), \((X_1,Y_1)\), ... and set

$$\begin{aligned} \hat{\epsilon } = \inf \left\{ y \in \mathbb {R}\, : \; \int _{-\infty }^y \hat{g}_{\hat{\epsilon }|X}(z,X) \, \mathrm{d}z \ge U \right\} \end{aligned}$$

and

$$\begin{aligned} \hat{\epsilon }_i = \inf \left\{ y \in \mathbb {R}\, : \; \int _{-\infty }^y \hat{g}_{\hat{\epsilon }|X}(z,X_i) \, \mathrm{d}z \ge U_i \right\} \quad (i \in \mathbb {N}). \end{aligned}$$

Set

$$\begin{aligned} \hat{Y}= & {} \hat{m}_{L_n}(X)+\hat{\epsilon } \quad \text{ and } \\ \hat{Y}_i= & {} \hat{m}_{L_n}(X_i)+\hat{\epsilon }_i \quad (i \in \{n+L_n+1, n+L_n+2, \ldots , n+L_n+N_n\}). \end{aligned}$$

For \(\alpha \in (0,1)\) set

$$\begin{aligned} q_{Y,\alpha } = \min \left\{ y \in \mathbb {R}\, : \, G_Y(y) \ge \alpha \right\} , \end{aligned}$$

where

$$\begin{aligned} G_Y(y)={\mathbf P}\{Y \le y\}, \end{aligned}$$

and

$$\begin{aligned} \hat{q}_{\hat{Y},N_n,\alpha } = \min \left\{ y \in \mathbb {R}\, : \, \hat{G}_{\hat{Y},N_n}(y) \ge \alpha \right\} , \end{aligned}$$

where

$$\begin{aligned} \hat{G}_{\hat{Y},N_n}(y) = \frac{1}{N_n} \sum _{i=1}^{N_n} I_{\{\hat{Y}_{n+L_n+i} \le y\}}. \end{aligned}$$

Lemma 1

Let \(\alpha \in (0,1)\), \(n \in \mathbb {N}\) and \(L_n , N_n \in \mathbb {N}\) and define the estimate \(\hat{q}_{Y,N_n,\alpha }\) of \(q_{Y,\alpha }\) as above. Assume that \(\hat{g}_{\hat{\epsilon }|X}\) satisfies

$$\begin{aligned} \hat{g}_{\hat{\epsilon }|X}(y,x) \ge 0 \quad (y \in \mathbb {R}, x \in \mathbb {R}^d) \quad \text{ and } \quad \int _\mathbb {R}\hat{g}_{\hat{\epsilon }|X}(y,x) \, \mathrm{d}y \le 1 \quad (x \in \mathbb {R}^d). \end{aligned}$$
(15)

Let \(\epsilon _n, \delta _n, \eta _n, e_n>0\) be such that

$$\begin{aligned}&G_{Y}( q_{Y,\alpha }+e_n-( (\log n) \cdot \epsilon _n)^{1/3}) -G_Y(q_{Y,\alpha })>( (\log n) \cdot \epsilon _n)^{1/3}\nonumber \\&\quad +\, \sqrt{\frac{\log N_n}{N_n}} + (\log n) \cdot \delta _n + (\log n) \cdot \eta _n \end{aligned}$$
(16)

and

$$\begin{aligned}&G_Y(q_{Y,\alpha }) - G_{Y}( q_{Y,\alpha }-e_n+( (\log n) \cdot \epsilon _n)^{1/3})>( (\log n) \cdot \epsilon _n)^{1/3}\nonumber \\&\quad +\, \sqrt{\frac{\log N_n}{N_n}} + (\log n) \cdot \delta _n + (\log n) \cdot \eta _n. \end{aligned}$$
(17)

Then

$$\begin{aligned}&{\mathbf P}\left\{ \left| \hat{q}_{Y,N_n,\alpha }-q_{Y,\alpha } \right|>e_n\right\} \\&\quad \le \,{\mathbf P}\left\{ \frac{1}{N_n} \sum _{i=1}^{N_n} |\hat{m}_{L_n}(X_{n+L_n+i})-\bar{m}(X_{n+L_n+i})|^2>(\log n) \cdot \epsilon _n\right\} \\&\qquad +\, {\mathbf P}\left\{ \int _{\mathbb {R}^d} \int _\mathbb {R}| \hat{g}_{\hat{\epsilon }|X}(y,x) - g_{\bar{\epsilon }|X}(y,x)| \, \mathrm{d}y \, {\mathbf P}_X(\mathrm{d}x)> (\log n) \cdot \delta _n\right\} \\&\qquad +\, {\mathbf P}\left\{ {\mathbf P}\left\{ \int _\mathbb {R}\hat{g}_{\hat{\epsilon }|X}(z,X) \, \mathrm{d}z \ne 1 \big | {\mathcal {D}}_n \right\} > (\log n) \cdot \eta _n\right\} + \frac{2}{N_n^2}. \end{aligned}$$

Proof

Set

$$\begin{aligned} \bar{Y}= & {} \bar{m}(X)+\hat{\epsilon }, \quad \bar{Y}_i=\bar{m}(X_i)+\hat{\epsilon }_i \quad (i \in \mathbb {N}),\\ G_{\bar{Y}}(y)= & {} {\mathbf P}\{ \bar{Y} \le y | {\mathcal {D}}_n\} \quad \text{ and } \quad \hat{G}_{\bar{Y},N_n}(y) = \frac{1}{N_n} \sum _{i=1}^{N_n} I_{\{\bar{Y}_{n+L_n+i} \le y\}}. \end{aligned}$$

By the Dvoretzky–Kiefer–Wolfowitz inequality [cf., Massart (1990)] applied conditionally on \({\mathcal {D}}_n\) we get

$$\begin{aligned} {\mathbf P}\left\{ \sup _{y \in \mathbb {R}} \left| G_{\bar{Y}}(y) - \hat{G}_{\bar{Y},N_n}(y) \right| >\sqrt{\frac{\log N_n}{N_n}}\right\} \le 2 \cdot \exp \left( - 2 \cdot N_n \cdot \frac{\log N_n}{N_n} \right) = \frac{2}{N_n^2}. \end{aligned}$$

Since

$$\begin{aligned}&{\mathbf P}\left\{ \left| \hat{q}_{Y,N_n,\alpha }-q_{Y,\alpha } \right|>e_n\right\} \\&\quad \le \,{\mathbf P}\left\{ \left| \hat{q}_{Y,N_n,\alpha }-q_{Y,\alpha }\right|>e_n,\frac{1}{N_n} \sum _{i=1}^{N_n} |\hat{m}_{L_n}(X_{n+L_n+i})-\bar{m}(X_{n+L_n+i})|^2 \le (\log n) \cdot \epsilon _n,\right. \\&\quad \int _{\mathbb {R}^d} \int _\mathbb {R}| \hat{g}_{\hat{\epsilon }|X}(y,x) - g_{\bar{\epsilon }|X}(y,x)| \, \mathrm{d}y \, {\mathbf P}_X(\mathrm{d}x) \le (\log n) \cdot \delta _n,\\&\quad \left. {\mathbf P}\left\{ \int _\mathbb {R}\hat{g}_{\hat{\epsilon }|X}(z,X) \, \mathrm{d}z \ne 1 \big | {\mathcal {D}}_n \right\} \le (\log n) \cdot \eta _n, \sup _{y \in \mathbb {R}} \left| G_{\bar{Y}}(y) - \hat{G}_{\bar{Y},N_n}(y) \right| \le \sqrt{\frac{\log N_n}{N_n}} \right\} \\&\qquad +\, {\mathbf P}\left\{ \frac{1}{N_n} \sum _{i=1}^{N_n} |\hat{m}_{L_n}(X_{n+L_n+i})-\bar{m}(X_{n+L_n+i})|^2> (\log n) \cdot \epsilon _n \right\} \\&\qquad +\,{\mathbf P}\left\{ \int _{\mathbb {R}^d} \int _\mathbb {R}| \hat{g}_{\hat{\epsilon }|X}(y,x) - g_{\bar{\epsilon }|X}(y,x)| \, \mathrm{d}y \, {\mathbf P}_X(\mathrm{d}x)> (\log n) \cdot \delta _n\right\} \\&\qquad +\, {\mathbf P}\left\{ {\mathbf P}\left\{ \int _\mathbb {R}\hat{g}_{\hat{\epsilon }|X}(z,X) \, \mathrm{d}z \ne 1 \big | {\mathcal {D}}_n \right\}> (\log n) \cdot \eta _n\right\} \\&\qquad +\, {\mathbf P}\left\{ \sup _{y \in \mathbb {R}} \left| G_{\bar{Y}}(y) - \hat{G}_{\bar{Y},N_n}(y) \right| >\sqrt{\frac{\log N_n}{N_n}} \right\} , \end{aligned}$$

it suffices to show that

$$\begin{aligned}&\frac{1}{N_n} \sum _{i=1}^{N_n} |\hat{m}_{L_n}(X_{n+L_n+i})-\bar{m}(X_{n+L_n+i})|^2 \le (\log n) \cdot \epsilon _n, \end{aligned}$$
(18)
$$\begin{aligned}&\int _{\mathbb {R}^d} \int _\mathbb {R}| \hat{g}_{\hat{\epsilon }|X}(y,x) - g_{\bar{\epsilon }|X}(y,x)| \, \mathrm{d}y \, {\mathbf P}_X(\mathrm{d}x) \le (\log n) \cdot \delta _n , \end{aligned}$$
(19)
$$\begin{aligned}&{\mathbf P}\left\{ \int _\mathbb {R}\hat{g}_{\hat{\epsilon }|X}(z,X) \, \mathrm{d}z \ne 1 \big | {\mathcal {D}}_n \right\} \le (\log n) \cdot \eta _n , \end{aligned}$$
(20)

and

$$\begin{aligned} \sup _{y \in \mathbb {R}} \left| G_{\bar{Y}}(y) - \hat{G}_{\bar{Y},N_n}(y) \right| \le \sqrt{\frac{\log N_n}{N_n}} \end{aligned}$$
(21)

imply

$$\begin{aligned} \left| \hat{q}_{Y,N_n,\alpha }-q_{Y,\alpha } \right| \le e_n. \end{aligned}$$
(22)

By the definition of \(\hat{q}_{Y,N_n,\alpha }\), we know that (22) is implied by

$$\begin{aligned} \hat{G}_{\hat{Y},N_n} ( q_{Y,\alpha } + e_n) \ge \alpha \end{aligned}$$
(23)

and

$$\begin{aligned} \hat{G}_{\hat{Y},N_n} ( q_{Y,\alpha } - e_n) < \alpha , \end{aligned}$$
(24)

so it suffices to show that (18)–(21) imply (23) and (24), which we do next.

So assume from now on that (18)–(21) hold. Before we start with the proof of (23) we show

$$\begin{aligned}&\sup _{y \in \mathbb {R}} | G_Y(y)-G_{\bar{Y}}(y)|\nonumber \\&\quad \le \, \int _{\mathbb {R}^d} \int _\mathbb {R}| \hat{g}_{\hat{\epsilon }|X}(y,x) - g_{\bar{\epsilon }|X}(y,x)| \, \mathrm{d}y \, {\mathbf P}_X(\mathrm{d}x) + (\log n) \cdot \eta _n. \end{aligned}$$
(25)

Indeed, we observe first

$$\begin{aligned} G_Y(y)= & {} {\mathbf P}\{ Y \le y\}\\= & {} {\mathbf E}\left\{ {\mathbf P}\left\{ \bar{m}(X) + \bar{\epsilon } \le y \big | X \right\} \right\} \\= & {} {\mathbf E}\left\{ {\mathbf P}\left\{ \bar{\epsilon } \le y - \bar{m}(X)\big | X \right\} \right\} \\= & {} {\mathbf E}\left\{ \int _{-\infty }^{y-\bar{m}(X)} g_{\bar{\epsilon }|X}(z,X) \, \mathrm{d}z \right\} \\= & {} \int _{\mathbb {R}^d} \int _{-\infty }^{y-\bar{m}(x)} g_{\bar{\epsilon }|X}(z,x) \, \mathrm{d}z \, {\mathbf P}_X (\mathrm{d}x)\\= & {} \int _{\mathbb {R}^d} \int _{-\infty }^{y} g_{\bar{\epsilon }|X}(z-\bar{m}(x),x) \, \mathrm{d}z \, {\mathbf P}_X (\mathrm{d}x) . \end{aligned}$$

Furthermore, we have

$$\begin{aligned}&G_{\bar{Y}}(y)\\&\quad = {\mathbf E}\left\{ {\mathbf P}\left\{ \hat{\epsilon } \le y - \bar{m}(X) \big | X, {\mathcal {D}}_n \right\} \big | {\mathcal {D}}_n \right\} \\&\quad = {\mathbf E}\left\{ I_{\{ \int _\mathbb {R}\hat{g}_{\hat{\epsilon }|X} (u,X) \, \mathrm{d}u = 1\}} \cdot {\mathbf P}\left\{ \hat{\epsilon } \le y - \bar{m}(X) \big | X, {\mathcal {D}}_n \right\} \right. \\&\qquad \left. +\, I_{\{ \int _\mathbb {R}\hat{g}_{\bar{\epsilon }|X} (u,X) \, \mathrm{d}u \ne 1\}} \cdot {\mathbf P}\left\{ \hat{\epsilon } \le y - \bar{m}(X) \big | X, {\mathcal {D}}_n \right\} \big | {\mathcal {D}}_n \right\} \\&\quad = {\mathbf E}\left\{ I_{\{ \int _\mathbb {R}\hat{g}_{\hat{\epsilon }|X} (u,X) \, \mathrm{d}u = 1\}} \cdot \int _{-\infty }^{y-\bar{m}(X)} \hat{g}_{\hat{\epsilon }|X} (z,X) \, \mathrm{d}z\right. \\&\qquad \left. + \,I_{\{ \int _\mathbb {R}\hat{g}_{\hat{\epsilon }|X} (u,X) \, \mathrm{d}u \ne 1\}} \cdot {\mathbf P}\left\{ \hat{\epsilon } \le y - \bar{m}(X) \big | X,{\mathcal {D}}_n \right\} \big | {\mathcal {D}}_n \right\} \\&\quad = \int _{\mathbb {R}^d} \int _{-\infty }^{y} \hat{g}_{\hat{\epsilon }|X} (z-\bar{m}(x),X) \, \mathrm{d}z \, {\mathbf P}_X(\mathrm{d}x)\\&\qquad +\, {\mathbf E}\left\{ I_{\{ \int _\mathbb {R}\hat{g}_{\hat{\epsilon }|X} (u,X) \, \mathrm{d}u \ne 1\}} \right. \\&\qquad \left. \cdot \, \left( {\mathbf P}\left\{ \hat{\epsilon } \le y - \bar{m}(X) \big | X, {\mathcal {D}}_n \right\} - \int _{-\infty }^{y-\bar{m}(X)} \hat{g}_{\hat{\epsilon }|X} (z,X) \, \mathrm{d}z \right) \big | {\mathcal {D}}_n \right\} . \end{aligned}$$

Since we have

$$\begin{aligned} \left| {\mathbf P}\left\{ \hat{\epsilon } \le y - \bar{m}(X) \big | X \right\} - \int _{-\infty }^{y-\bar{m}(X)} \hat{g}_{\hat{\epsilon }|X} (z,X) \, \mathrm{d}z \right| \le 1 \quad \hbox {a.s}., \end{aligned}$$

which follows from assumption (15), and which implies

$$\begin{aligned}&\left| {\mathbf E}\left\{ I_{\{ \int _\mathbb {R}\hat{g}_{\hat{\epsilon }|X} (u,X) \, \mathrm{d}u \ne 1\}} \cdot \left( {\mathbf P}\left\{ \hat{\epsilon } \le y - \bar{m}(X) \big | X, {\mathcal {D}}_n \right\} - \int _{-\infty }^{y-\bar{m}(X)} \hat{g}_{\hat{\epsilon }|X} (z,X) \, \mathrm{d}z \right) \big | {\mathcal {D}}_n \right\} \right| \\&\quad \le \,{\mathbf P}\left\{ \int _\mathbb {R}\hat{g}_{\hat{\epsilon }|X}(z,X) \, \mathrm{d}z \ne 1 \big | {\mathcal {D}}_n \right\} \le (\log n) \cdot \eta _n , \end{aligned}$$

and

$$\begin{aligned}&\sup _{y \in \mathbb {R}} \left| \int _{\mathbb {R}^d} \int _{-\infty }^{y} g_{\bar{\epsilon }|X}(z-\bar{m}(x),x) \, \mathrm{d}z \, {\mathbf P}_X (\mathrm{d}x) - \int _{\mathbb {R}^d} \int _{-\infty }^{y} \hat{g}_{{\hat{\epsilon }}|X}(z-\bar{m}(x),x) \, \mathrm{d}z \, {\mathbf P}_X (\mathrm{d}x) \right| \\&\quad \le \, \sup _{y \in \mathbb {R}} \int _{\mathbb {R}^d} \int _{-\infty }^{y} | g_{\bar{\epsilon }|X}(z-\bar{m}(x),x) - \hat{g}_{{\hat{\epsilon }}|X}(z-\bar{m}(x),x) | \, \mathrm{d}z \, {\mathbf P}_X (\mathrm{d}x)\\&\quad = \int _{\mathbb {R}^d} \int _\mathbb {R}| g_{\bar{\epsilon }|X}(z-\bar{m}(x),x) - \hat{g}_{{\hat{\epsilon }}|X}(z-\bar{m}(x),x) | \, \mathrm{d}z \, {\mathbf P}_X (\mathrm{d}x)\\&\quad = \int _{\mathbb {R}^d} \int _\mathbb {R}| \hat{g}_{\hat{\epsilon }|X}(y,x) - g_{\bar{\epsilon }|X}(y,x)| \, \mathrm{d}y \, {\mathbf P}_X(\mathrm{d}x) \end{aligned}$$

this implies (25).

Next we prove (23). Using (18), (21), (25) and (19), we get

$$\begin{aligned}&\hat{G}_{\hat{Y},N_n}( q_{Y,\alpha }+e_n)\\&\quad \ge \,\frac{1}{N_n} \sum _{i=1}^{N_n} I_{ \left\{ \hat{Y}_{n+L_n+i} \le q_{Y,\alpha }+e_n, |\hat{Y}_{n+L_n+i}-\bar{Y}_{n+L_n+i}| \le ( (\log n) \cdot \epsilon _n)^{1/3} \right\} }\\&\quad \ge \, \frac{1}{N_n} \sum _{i=1}^{N_n} I_{ \left\{ \bar{Y}_{n+L_n+i} \le q_{Y,\alpha }+e_n-( (\log n) \cdot \epsilon _n)^{1/3}, |\hat{Y}_{n+L_n+i}-\bar{Y}_{n+L_n+i}| \le ( (\log n) \cdot \epsilon _n)^{1/3} \right\} }\\&\quad \ge \,\frac{1}{N_n} \sum _{i=1}^{N_n} I_{ \left\{ \bar{Y}_{n+L_n+i} \le q_{Y,\alpha }+e_n-( (\log n) \cdot \epsilon _n)^{1/3} \right\} } \\&\qquad -\,\frac{1}{N_n} \sum _{i=1}^{N_n} I_{ \left\{ |\hat{Y}_{n+L_n+i}-\bar{Y}_{n+L_n+i}|> ( (\log n) \cdot \epsilon _n)^{1/3} \right\} }\\&\quad \ge \, \frac{1}{N_n} \sum _{i=1}^{N_n} I_{ \left\{ \bar{Y}_{n+L_n+i} \le q_{Y,\alpha }+c \cdot e_n-( (\log n) \cdot \epsilon _n)^{1/3} \right\} } - \frac{1}{N_n} \sum _{i=1}^{N_n} \frac{|\hat{Y}_{n+L_n+i}-\bar{Y}_{n+L_n+i}|^2}{((\log n) \cdot \epsilon _n)^{2/3}}\\&\quad = \frac{1}{N_n} \sum _{i=1}^{N_n} I_{ \left\{ \bar{Y}_{n+L_n+i} \le q_{Y,\alpha }+c \cdot e_n-( (\log n) \cdot \epsilon _n)^{1/3} \right\} }\\&\qquad -\, \frac{1}{((\log n) \cdot \epsilon _n)^{2/3}} \cdot \frac{1}{N_n} \sum _{i=1}^{N_n} |\hat{m}_{L_n}(X_{n+L_n+i}) - \bar{m}(X_{n+L_n+i})|^2\\&\quad \ge \, \hat{G}_{\bar{Y},N_n}( q_{Y,\alpha }+e_n-( (\log n) \cdot \epsilon _n)^{1/3}) - ( (\log n) \cdot \epsilon _n)^{1/3}\\&\quad \ge \, G_{\bar{Y}}( q_{Y,\alpha }+e_n-( (\log n) \cdot \epsilon _n)^{1/3}) - ( (\log n) \cdot \epsilon _n)^{1/3}\\&\qquad - \,\sup _{y \in \mathbb {R}} \left| G_{\bar{Y}}(y) - \hat{G}_{\bar{Y},N_n}(y) \right| \\&\quad \ge \, G_{\bar{Y}}( q_{Y,\alpha }+e_n-( (\log n) \cdot \epsilon _n)^{1/3}) - ( (\log n) \cdot \epsilon _n)^{1/3} - \sqrt{\frac{\log N_n}{N_n}}\\&\quad \ge \, G_{Y}( q_{Y,\alpha }+e_n-( (\log n) \cdot \epsilon _n)^{1/3}) - ( (\log n) \cdot \epsilon _n)^{1/3}\\&\qquad -\,\sqrt{\frac{\log N_n}{N_n}} - \sup _{y \in \mathbb {R}} | G_Y(y)-G_{\bar{Y}}(y)|\\&\quad \ge \, G_{Y}( q_{Y,\alpha }+e_n-( (\log n) \cdot \epsilon _n)^{1/3}) - ( (\log n) \cdot \epsilon _n)^{1/3}\\&\qquad -\,\sqrt{\frac{\log N_n}{N_n}} - (\log n) \cdot \delta _n - (\log n) \cdot \eta _n\\&\quad >\, G_{Y}(q_{Y,\alpha })=\alpha , \end{aligned}$$

where the last inequality follows from (16).

In the same way, we argue that

$$\begin{aligned}&\hat{G}_{\hat{Y},N_n}( q_{Y,\alpha }-e_n)\\&\quad \le \, G_{Y}( q_{Y,\alpha }-e_n+((\log n) \cdot \epsilon _n)^{1/3}) + ( (\log n) \cdot \epsilon _n)^{1/3} + \sqrt{\frac{\log N_n}{N_n}} + (\log n) \cdot \delta _n\\&\qquad +\, (\log n) \cdot \eta _n\\&\quad <\, \alpha , \end{aligned}$$

which finishes the proof. \(\square \)
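The objects appearing in this proof can be made concrete with a few lines of code. The following Python sketch (ours, purely for illustration; the toy surrogate sample and all function names are assumptions and do not reproduce the paper's construction of \(\hat{Y}\)) computes the empirical distribution function of a sample of surrogate values and the corresponding plug-in quantile in the sense of definition (1).

```python
import numpy as np

# Illustrative sketch (our own names and toy data): empirical distribution
# function and plug-in quantile of a sample of surrogate values, cf. (1).
def empirical_cdf(samples, y):
    return np.mean(samples <= y)

def plugin_quantile(samples, alpha):
    # smallest sample point y whose empirical CDF is at least alpha
    s = np.sort(samples)
    k = max(int(np.ceil(alpha * len(s))), 1)
    return s[k - 1]

rng = np.random.default_rng(2)
X_new = rng.uniform(-1, 1, size=(10_000, 2))              # N_n additional X-values
surrogate = np.sin(3 * X_new[:, 0]) + 0.1 * X_new[:, 1]   # assumed toy surrogate values
print(plugin_quantile(surrogate, alpha=0.95), empirical_cdf(surrogate, 0.9))
```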

5.2 A bound on the \(L_1\) error of a conditional density estimate

Lemma 2

Let (XY), \((X_1,Y_1), \ldots , (X_n,Y_n)\) be independent and identically distributed \(\mathbb {R}^d\times \mathbb {R}\)-valued random vectors. Assume that the conditional distribution \({\mathbf P}_{Y|X}\) of Y given X has the density \( g_{Y|X}(\cdot ,X):\mathbb {R}\rightarrow \mathbb {R}\) with respect to the Lebesgue–Borel measure, where

$$\begin{aligned} g_{Y|X}:\mathbb {R}\times \mathbb {R}^d\rightarrow \mathbb {R}\end{aligned}$$

is a measurable function which satisfies

$$\begin{aligned} | g_{Y|X}(y,x_1) - g_{Y|X}(y,x_2)| \le C_1 \cdot \Vert x_1-x_2\Vert ^r \quad \text{ for } \text{ all } x_1,x_2 \in \mathbb {R}^d, y \in \mathbb {R}, \end{aligned}$$
(26)

and

$$\begin{aligned} | g_{Y|X}(u,x)-g_{Y|X}(v,x)| \le C_2 \cdot |u-v|^s \quad \text{ for } \text{ all } u,v \in \mathbb {R}, x \in \mathbb {R}^d\end{aligned}$$
(27)

for some \( r,s \in (0,1]\) and some \(C_1,C_2 >0\). Let \(\gamma _n>0\). For \(x \in \mathbb {R}^d\) let \(- \infty<a_n(x) \le b_n(x) < \infty \) be such that

$$\begin{aligned} \int _{[-\gamma _n,\gamma _n]^d} |b_n(x)-a_n(x)| \, {\mathbf P}_X(\mathrm{d}x) < \infty . \end{aligned}$$
(28)

Set \(G=I_{[-1,1]}\) and let \(K:\mathbb {R}\rightarrow \mathbb {R}\) be a density satisfying

$$\begin{aligned} \int _\mathbb {R}K^2(z) \, \mathrm{d}z< \infty \quad \text{ and } \quad \int _\mathbb {R}K(z) \cdot |z|^s \mathrm{d}z < \infty . \end{aligned}$$

Let \(h_n,H_n>0\) be such that \(2 \cdot \sqrt{d} \cdot \gamma _n \ge H_n\), and set

$$\begin{aligned} \hat{g}_{Y|X}(y,x) = \frac{ \sum _{i=1}^n G \left( \frac{\Vert x-X_i\Vert }{H_n} \right) \cdot K \left( \frac{y-Y_i}{h_n} \right) }{ h_n \cdot \sum _{j=1}^n G \left( \frac{\Vert x-X_j\Vert }{H_n} \right) }, \end{aligned}$$
(29)

where \(\frac{0}{0} :=0\). Then

$$\begin{aligned}&{\mathbf E}\int _{[-\gamma _n,\gamma _n]^d} \int _{[a_n(x),b_n(x)]} \left| \hat{g}_{Y|X}(y,x) - g_{Y|X}(y,x) \right| \, \mathrm{d}y \, {\mathbf P}_X (\mathrm{d}x)\\&\quad \le \, c_1 \cdot \left( \sqrt{ \frac{ \int _{[-\gamma _n,\gamma _n]^d} |b_n(x)-a_n(x)| \, {\mathbf P}_X(\mathrm{d}x) \cdot \gamma _n^d }{ n \cdot H_n^d \cdot h_n }}\right. \\&\qquad \left. +\, \frac{ \gamma _n^d}{n \cdot H_n^d} + \int _{[-\gamma _n,\gamma _n]^d} |b_n(x)-a_n(x)| \, {\mathbf P}_X(\mathrm{d}x) \cdot \left( C_1 \cdot H_n^r + C_2 \cdot h_n^s \right) \right) , \end{aligned}$$

where the constant

$$\begin{aligned} c_1= \max \left\{ 1, \sqrt{ 2 \cdot (4 \cdot \sqrt{d})^d \cdot \int K^2(z) \, \mathrm{d}z} , (4 \cdot \sqrt{d})^d, \int K(z) \cdot |z|^s \mathrm{d}z \right\} \end{aligned}$$

does not depend on \({\mathbf P}_{(X,Y)}\), \(C_1\) or \(C_2\).
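To make the estimate (29) concrete, the following Python sketch (ours, purely for illustration) evaluates \(\hat{g}_{Y|X}(y,x)\) with the naive kernel \(G=I_{[-1,1]}\) and a standard Gaussian K, which satisfies the two integrability conditions above; the function name and the data-generating model in the usage example are our own assumptions.

```python
import numpy as np

def cond_density_estimate(y, x, X, Y, h_n, H_n):
    """Evaluate the kernel conditional density estimate (29) at a point (y, x).

    G is the naive kernel I_{[-1,1]}, K is the standard Gaussian density,
    and the convention 0/0 := 0 is used.  X: (n, d) covariates, Y: (n,) responses.
    """
    G = (np.linalg.norm(X - x, axis=1) <= H_n).astype(float)      # G(||x - X_i|| / H_n)
    K = np.exp(-0.5 * ((y - Y) / h_n) ** 2) / np.sqrt(2 * np.pi)  # K((y - Y_i) / h_n)
    denom = h_n * G.sum()
    return (G * K).sum() / denom if denom > 0 else 0.0            # convention 0/0 = 0

# toy usage: X uniform on [-1, 1]^2, Y = sin(3 X^(1)) + noise
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 2))
Y = np.sin(3 * X[:, 0]) + 0.3 * rng.standard_normal(500)
print(cond_density_estimate(0.0, np.array([0.2, -0.4]), X, Y, h_n=0.2, H_n=0.3))
```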

In the proof, we will need the following well-known auxiliary result:

Lemma 3

Let \(n \in \mathbb {N}\), let \(H_n, \gamma _n>0\) be such that \(2 \cdot \sqrt{d} \cdot \gamma _n \ge H_n\), and let X be an \(\mathbb {R}^d\)-valued random variable. Then

$$\begin{aligned} \int _{[-\gamma _n,\gamma _n]^d} \frac{1}{n \cdot {\mathbf P}_X(S_{H_n}(x))} \, {\mathbf P}_X (\mathrm{d}x) \le (4 \cdot \sqrt{d})^d \cdot \frac{\gamma _n^d}{n \cdot H_n^d}. \end{aligned}$$

Proof

The assertion follows from the proof of equation (5.1) in Györfi et al. (2002); a complete proof is available from the authors on request. \(\square \)
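For a concrete impression of the bound in Lemma 3, the following sketch (our own toy setting, not part of the paper) checks it by Monte Carlo for \(d=1\) and X uniformly distributed on \([-\gamma _n,\gamma _n]\); here \(S_{H_n}(x)\) is, as in Györfi et al. (2002), the closed ball of radius \(H_n\) around x, so that \({\mathbf P}_X(S_{H_n}(x))\) is available in closed form.

```python
import numpy as np

# Monte Carlo check of Lemma 3 in a toy setting (ours): d = 1, X ~ Uniform[-gamma, gamma],
# S_{H_n}(x) the closed ball (interval) of radius H around x, so that
# P_X(S_{H_n}(x)) = (min(x + H, gamma) - max(x - H, -gamma)) / (2 gamma).
rng = np.random.default_rng(1)
d, n, gamma, H = 1, 200, 2.0, 0.25                    # satisfies 2 * sqrt(d) * gamma >= H
x = rng.uniform(-gamma, gamma, size=100_000)          # sample from P_X
ball_prob = (np.minimum(x + H, gamma) - np.maximum(x - H, -gamma)) / (2 * gamma)
lhs = np.mean(1.0 / (n * ball_prob))                  # integral on the left-hand side
rhs = (4 * np.sqrt(d)) ** d * gamma ** d / (n * H ** d)
print(lhs, "<=", rhs)                                 # roughly 0.04 <= 0.16
```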

Proof of Lemma 2

By the triangle inequality, we have

$$\begin{aligned}&{\mathbf E}\int _{[-\gamma _n,\gamma _n]^d} \int _{[a_n(x),b_n(x)]} \left| \hat{g}_{Y|X}(y,x) - g_{Y|X}(y,x) \right| \, \mathrm{d}y \, {\mathbf P}_X (\mathrm{d}x)\nonumber \\&\quad \le \, {\mathbf E}\int _{[-\gamma _n,\gamma _n]^d} \int _{[a_n(x),b_n(x)]} \left| \hat{g}_{Y|X}(y,x) - {\mathbf E}\{ \hat{g}_{Y|X}(y,x) | X_1^n \} \right| \, \mathrm{d}y \, {\mathbf P}_X (\mathrm{d}x)\nonumber \\&\qquad +\, {\mathbf E}\int _{[-\gamma _n,\gamma _n]^d} \int _{[a_n(x),b_n(x)]} \left| {\mathbf E}\{ \hat{g}_{Y|X}(y,x) | X_1^n \} - g_{Y|X}(y,x) \right| \, \mathrm{d}y \, {\mathbf P}_X (\mathrm{d}x).\nonumber \\ \end{aligned}$$
(30)

In the first step of the proof, we show

$$\begin{aligned}&{\mathbf E}\int _{[-\gamma _n,\gamma _n]^d} \int _{[a_n(x),b_n(x)]} \left| \hat{g}_{Y|X}(y,x) - {\mathbf E}\{ \hat{g}_{Y|X}(y,x) | X_1^n \} \right| \, \mathrm{d}y \, {\mathbf P}_X (\mathrm{d}x)\nonumber \\&\quad \le \, \sqrt{ 2 \cdot (4 \cdot \sqrt{d})^d \cdot \int K^2(z) \, \mathrm{d}z} \cdot \sqrt{ \frac{ \int _{[-\gamma _n,\gamma _n]^d} |b_n(x)-a_n(x)| \, {\mathbf P}_X(\mathrm{d}x) \cdot \gamma _n^d }{ n \cdot H_n^d \cdot h_n } }.\nonumber \\ \end{aligned}$$
(31)

The Cauchy–Schwarz inequality implies

$$\begin{aligned}&{\mathbf E}\int _{[-\gamma _n,\gamma _n]^d} \int _{[a_n(x),b_n(x)]} \left| \hat{g}_{Y|X}(y,x) - {\mathbf E}\{ \hat{g}_{Y|X}(y,x) | X_1^n \} \right| \, \mathrm{d}y \, {\mathbf P}_X (\mathrm{d}x)\\&\quad = {\mathbf E}\int _{[-\gamma _n,\gamma _n]^d} \int _{[a_n(x),b_n(x)]} {\mathbf E}\left\{ 1 \cdot \left| \hat{g}_{Y|X}(y,x) - {\mathbf E}\{ \hat{g}_{Y|X}(y,x) | X_1^n \} \right| \Big | X_1^n \right\} \, \mathrm{d}y \, {\mathbf P}_X (\mathrm{d}x)\\&\quad \le \, {\mathbf E}\sqrt{ \int _{[-\gamma _n,\gamma _n]^d} \int _{[a_n(x),b_n(x)]} {\mathbf E}\left\{ 1^2 \big | X_1^n \right\} \, \mathrm{d}y \, {\mathbf P}_X (\mathrm{d}x) }\\&\qquad \cdot \, {\mathbf E}\sqrt{ \int _{[-\gamma _n,\gamma _n]^d} \int _{[a_n(x),b_n(x)]} {\mathbf E}\left\{ \left| \hat{g}_{Y|X}(y,x) - {\mathbf E}\{ \hat{g}_{Y|X}(y,x) | X_1^n \} \right| ^2 \Big | X_1^n \right\} \, \mathrm{d}y \, {\mathbf P}_X (\mathrm{d}x)}\\&\quad \le \, \sqrt{ \int _{[-\gamma _n,\gamma _n]^d} | b_n(x)-a_n(x)| \, {\mathbf P}_X(\mathrm{d}x) }\\&\qquad \cdot \, \sqrt{ {\mathbf E}\int _{[-\gamma _n,\gamma _n]^d} \int _{[a_n(x),b_n(x)]} {\mathbf E}\left\{ \left| \hat{g}_{Y|X}(y,x) - {\mathbf E}\{ \hat{g}_{Y|X}(y,x) | X_1^n \} \right| ^2 \Big | X_1^n \right\} \, \mathrm{d}y \, {\mathbf P}_X (\mathrm{d}x)}, \end{aligned}$$

hence it suffices to show

$$\begin{aligned}&{\mathbf E}\int _{[-\gamma _n,\gamma _n]^d} \int _{[a_n(x),b_n(x)]} {\mathbf E}\left\{ \left| \hat{g}_{Y|X}(y,x) - {\mathbf E}\{ \hat{g}_{Y|X}(y,x) | X_1^n \} \right| ^2 \Big | X_1^n \right\} \, \mathrm{d}y \, {\mathbf P}_X (\mathrm{d}x)\nonumber \\&\quad \le \, 2 \cdot (4 \cdot \sqrt{d})^d \cdot \int K^2(z) \, \mathrm{d}z \cdot \frac{ \gamma _n^d }{ n \cdot H_n^d \cdot h_n }. \end{aligned}$$
(32)

To show this, we first observe that the independence of the data implies

$$\begin{aligned}&{\mathbf E}\left\{ \left| \hat{g}_{Y|X}(y,x) - {\mathbf E}\{ \hat{g}_{Y|X}(y,x) | X_1^n \} \right| ^2 \Big | X_1^n \right\} \\&\quad = {\mathbf E}\left\{ \left| \frac{ \sum _{i=1}^n G \left( \frac{\Vert x-X_i\Vert }{H_n} \right) \cdot \left( K \left( \frac{y-Y_i}{h_n} \right) - {\mathbf E}\left\{ K \left( \frac{y-Y_i}{h_n} \right) \Big |X_i \right\} \right) }{ h_n \cdot \sum _{j=1}^n G \left( \frac{\Vert x-X_j\Vert }{H_n} \right) } \right| ^2 \Big | X_1^n \right\} \\&\quad = \frac{ \sum _{i=1}^n G \left( \frac{\Vert x-X_i\Vert }{H_n} \right) ^2 \cdot {\mathbf E}\left\{ \left| K \left( \frac{y-Y_i}{h_n} \right) - {\mathbf E}\left\{ K \left( \frac{y-Y_i}{h_n} \right) \Big |X_i \right\} \right| ^2 \Big |X_i \right\} }{ h_n^2 \cdot \left( \sum _{j=1}^n G \left( \frac{\Vert x-X_j\Vert }{H_n} \right) \right) ^2 }\\&\quad \le \, \frac{ \sum _{i=1}^n G \left( \frac{\Vert x-X_i\Vert }{H_n} \right) ^2 \cdot {\mathbf E}\left\{ \left| K \left( \frac{y-Y_i}{h_n} \right) \right| ^2 \Big |X_i \right\} }{ h_n^2 \cdot \left( \sum _{j=1}^n G \left( \frac{\Vert x-X_j\Vert }{H_n} \right) \right) ^2 }\\&\quad = \frac{ \sum _{i=1}^n G \left( \frac{\Vert x-X_i\Vert }{H_n} \right) \cdot \int _{\mathbb {R}} K^2 \left( \frac{y-u}{h_n} \right) \cdot g_{Y|X}(u,X_i) \, \mathrm{d}u }{ h_n^2 \cdot \left( \sum _{j=1}^n G \left( \frac{\Vert x-X_j\Vert }{H_n} \right) \right) ^2 }. \end{aligned}$$

Hence,

$$\begin{aligned}&{\mathbf E}\int _{[-\gamma _n,\gamma _n]^d} \int _{[a_n(x),b_n(x)]} {\mathbf E}\left\{ \left| \hat{g}_{Y|X}(y,x) - {\mathbf E}\{ \hat{g}_{Y|X}(y,x) | X_1^n \} \right| ^2 \Big | X_1^n \right\} \, \mathrm{d}y \, {\mathbf P}_X (\mathrm{d}x)\\&\quad \le \,{\mathbf E}\int _{[-\gamma _n,\gamma _n]^d} \frac{ \sum _{i=1}^n G \left( \frac{\Vert x-X_i\Vert }{H_n} \right) \cdot \int _{\mathbb {R}} \int _\mathbb {R}K^2 \left( \frac{y-u}{h_n} \right) \, \mathrm{d}y \cdot g_{Y|X}(u,X_i) \, \mathrm{d}u }{ h_n^2 \cdot \left( \sum _{j=1}^n G \left( \frac{\Vert x-X_j\Vert }{H_n} \right) \right) ^2 } \, {\mathbf P}_X (\mathrm{d}x)\\&\quad = {\mathbf E}\int _{[-\gamma _n,\gamma _n]^d} \frac{ \sum _{i=1}^n G \left( \frac{\Vert x-X_i\Vert }{H_n} \right) \cdot \int _\mathbb {R}K^2 \left( z \right) \, \mathrm{d}z \cdot h_n \cdot \int _{\mathbb {R}} g_{Y|X}(u,X_i) \, \mathrm{d}u }{ h_n^2 \cdot \left( \sum _{j=1}^n G \left( \frac{\Vert x-X_j\Vert }{H_n} \right) \right) ^2 } \, {\mathbf P}_X (\mathrm{d}x)\\&\quad = \frac{ \int K^2 \left( z \right) \, \mathrm{d}z }{h_n} \cdot {\mathbf E}\left\{ \frac{ I_{ [-\gamma _n,\gamma _n]^d }(X) }{ \sum _{j=1}^n G \left( \frac{\Vert X-X_j\Vert }{H_n} \right) } \cdot I_{ \left\{ \sum _{j=1}^n G \left( \frac{\Vert X-X_j\Vert }{H_n} \right) >0 \right\} } \right\} . \end{aligned}$$

Application of Lemma 4.1 in Györfi et al. (2002) and Lemma 3 yields

$$\begin{aligned}&{\mathbf E}\left\{ \frac{I_{ [-\gamma _n,\gamma _n]^d }(X) }{ \sum _{j=1}^n G \left( \frac{\Vert X-X_j\Vert }{H_n} \right) } \cdot I_{ \left\{ \sum _{j=1}^n G \left( \frac{\Vert X-X_j\Vert }{H_n} \right) >0\right\} }\right\} \\&\quad \le \, \int _{[-\gamma _n,\gamma _n]^d} \frac{2}{(n+1) \cdot {\mathbf P}_X(S_{H_n}(x))} \, {\mathbf P}_X(\mathrm{d}x) \le 2 \cdot (4 \cdot \sqrt{d})^d \cdot \frac{\gamma _n^d}{n \cdot H_n^d}, \end{aligned}$$

which completes the proof of (32).

In the second step of the proof, we show

$$\begin{aligned}&{\mathbf E}\int _{[-\gamma _n,\gamma _n]^d} \int _{[a_n(x),b_n(x)]} \left| {\mathbf E}\{ \hat{g}_{Y|X}(y,x) | X_1^n \} - g_{Y|X}(y,x) \right| \, \mathrm{d}y \,{\mathbf P}_X (\mathrm{d}x) \nonumber \\&\quad \le \,(4 \cdot \sqrt{d})^d \cdot \frac{\gamma _n^d}{n \cdot H_n^d} + \int _{[-\gamma _n,\gamma _n]^d} |b_n(x)-a_n(x)| \, {\mathbf P}_X(\mathrm{d}x) \cdot C_1 \cdot H_n^r \nonumber \\&\qquad +\, \int K(z) \cdot |z|^s \mathrm{d}z \cdot \int _{[-\gamma _n,\gamma _n]^d} |b_n(x)-a_n(x)| \, {\mathbf P}_X(\mathrm{d}x) \cdot C_2 \cdot h_n^s. \end{aligned}$$
(33)

Using the independence of the data and arguing similarly to the proof of inequality (32), we get

$$\begin{aligned}&{\mathbf E}\int _{[-\gamma _n,\gamma _n]^d} \int _{[a_n(x),b_n(x)]} \left| {\mathbf E}\{ \hat{g}_{Y|X}(y,x) | X_1^n \} - g_{Y|X}(y,x) \right| \, \mathrm{d}y \,{\mathbf P}_X (\mathrm{d}x)\\&\quad = {\mathbf E}\int _{[-\gamma _n,\gamma _n]^d} \int _{[a_n(x),b_n(x)]} \left| \frac{ \sum _{i=1}^n G \left( \frac{\Vert x-X_i\Vert }{H_n} \right) \cdot \int _{\mathbb {R}} K \left( \frac{y-u}{h_n} \right) \cdot g_{Y|X}(u,X_i) \, \mathrm{d}u }{ h_n \cdot \sum _{j=1}^n G \left( \frac{\Vert x-X_j\Vert }{H_n} \right) }\right. \\&\qquad \left. -\, g_{Y|X}(y,x) \right| \, \mathrm{d}y \, {\mathbf P}_X (\mathrm{d}x)\\&\quad = {\mathbf E}\int _{[-\gamma _n,\gamma _n]^d} \int _{[a_n(x),b_n(x)]} I_{ \left\{ \sum _{j=1}^n G \left( \frac{\Vert x-X_j\Vert }{H_n} \right) =0 \right\} } g_{Y|X}(y,x) \, \mathrm{d}y \, {\mathbf P}_X (\mathrm{d}x)\\&\qquad +\, {\mathbf E}\int _{[-\gamma _n,\gamma _n]^d} \int _{[a_n(x),b_n(x)]} \left| \sum _{i=1}^n \frac{ G \left( \frac{\Vert x-X_i\Vert }{H_n} \right) }{ \sum _{j=1}^n G \left( \frac{\Vert x-X_j\Vert }{H_n} \right) }\right. \\&\qquad \left. \cdot \, \int _{\mathbb {R}} \frac{1}{h_n} \cdot K \left( \frac{y-u}{h_n} \right) \cdot (g_{Y|X}(u,X_i) - g_{Y|X}(y,x)) \, \mathrm{d}u \right| \, \mathrm{d}y \, {\mathbf P}_X (\mathrm{d}x)\\&\quad \le \, \int _{[-\gamma _n,\gamma _n]^d} {\mathbf P}\left\{ \sum _{j=1}^n G \left( \frac{\Vert x-X_j\Vert }{H_n} \right) =0 \right\} {\mathbf P}_X (\mathrm{d}x)\\&\qquad +\, {\mathbf E}\int _{[-\gamma _n,\gamma _n]^d} \int _{[a_n(x),b_n(x)]} \left| \sum _{i=1}^n \frac{ G \left( \frac{\Vert x-X_i\Vert }{H_n} \right) }{ \sum _{j=1}^n G \left( \frac{\Vert x-X_j\Vert }{H_n} \right) }\right. \\&\qquad \left. \cdot \, \int _{\mathbb {R}} \frac{1}{h_n} \cdot K \left( \frac{y-u}{h_n} \right) \cdot (g_{Y|X}(u,X_i) - g_{Y|X}(y,x)) \, \mathrm{d}u \right| \, \mathrm{d}y \, {\mathbf P}_X (\mathrm{d}x)\\&\quad \le \, \int _{[-\gamma _n,\gamma _n]^d} \left( 1- {\mathbf P}_X(S_{H_n}(x)) \right) ^n {\mathbf P}_X (\mathrm{d}x)\\&\qquad +\, {\mathbf E}\int _{[-\gamma _n,\gamma _n]^d} \int _{[a_n(x),b_n(x)]} \sum _{i=1}^n \frac{ G \left( \frac{\Vert x-X_i\Vert }{H_n} \right) }{ \sum _{j=1}^n G \left( \frac{\Vert x-X_j\Vert }{H_n} \right) }\\&\qquad \cdot \,\int _{\mathbb {R}} \frac{1}{h_n} \cdot K \left( \frac{y-u}{h_n} \right) \cdot |g_{Y|X}(u,X_i) - g_{Y|X}(y,x)| \, \mathrm{d}u \, \mathrm{d}y \, {\mathbf P}_X (\mathrm{d}x). \end{aligned}$$

By Lemma 3, we get

$$\begin{aligned}&\int _{[-\gamma _n,\gamma _n]^d} \left( 1- {\mathbf P}_X(S_{H_n}(x)) \right) ^n {\mathbf P}_X (\mathrm{d}x)\\&\quad \le \, \max _{z \in \mathbb {R}_+} z \cdot e^{- z} \cdot \int _{[-\gamma _n,\gamma _n]^d} \frac{1}{n \cdot {\mathbf P}_X(S_{H_n}(x)) } \, {\mathbf P}_X (\mathrm{d}x)\\&\quad \le \, (4 \cdot \sqrt{d})^d \cdot \frac{\gamma _n^d}{n \cdot H_n^d}. \end{aligned}$$

Furthermore, by the triangle inequality and assumptions (26) and (27), which imply

$$\begin{aligned} |g_{Y|X}(u,X_i) - g_{Y|X}(y,x)| &\le |g_{Y|X}(u,X_i) - g_{Y|X}(u,x) | + |g_{Y|X}(u,x) - g_{Y|X}(y,x) |\\ &\le C_1 \cdot \Vert X_i-x\Vert ^r + C_2 \cdot |y-u|^s, \end{aligned}$$

we get

$$\begin{aligned}&{\mathbf E}\int _{[-\gamma _n,\gamma _n]^d} \int _{[a_n(x),b_n(x)]} \sum _{i=1}^n \frac{ G \left( \frac{\Vert x-X_i\Vert }{H_n} \right) }{ \sum _{j=1}^n G \left( \frac{\Vert x-X_j\Vert }{H_n} \right) }\\&\qquad \cdot \, \int _{\mathbb {R}} \frac{1}{h_n} \cdot K \left( \frac{y-u}{h_n} \right) \cdot |g_{Y|X}(u,X_i) - g_{Y|X}(y,x)| \, \mathrm{d}u \, \mathrm{d}y \, {\mathbf P}_X (\mathrm{d}x)\\&\quad \le \, {\mathbf E}\int _{[-\gamma _n,\gamma _n]^d} \sum _{i=1}^n \frac{ G \left( \frac{\Vert x-X_i\Vert }{H_n} \right) }{ \sum _{j=1}^n G \left( \frac{\Vert x-X_j\Vert }{H_n} \right) } \cdot C_1 \cdot \Vert X_i-x\Vert ^r \cdot |b_n(x)-a_n(x)| \, {\mathbf P}_X (\mathrm{d}x)\\&\qquad +\, \int _{[-\gamma _n,\gamma _n]^d} \int _{[a_n(x),b_n(x)]} \int _{\mathbb {R}} \frac{1}{h_n} \cdot K \left( \frac{y-u}{h_n} \right) \cdot C_2 \cdot |u-y|^s \, \mathrm{d}u \, \mathrm{d}y \, {\mathbf P}_X (\mathrm{d}x)\\&\quad \le \, C_1 \cdot H_n^r \cdot \int _{[-\gamma _n,\gamma _n]^d} |b_n(x)-a_n(x)| \, {\mathbf P}_X (\mathrm{d}x)\\&\qquad +\, \int K(z) \cdot |z|^s \mathrm{d}z \cdot C_2 \cdot h_n^s \cdot \int _{[-\gamma _n,\gamma _n]^d} |b_n(x)-a_n(x)| \, {\mathbf P}_X (\mathrm{d}x). \end{aligned}$$

Summarizing the above results, we get the assertion. \(\square \)
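The behaviour described by Lemma 2 can also be observed empirically. The following sketch (our own toy setting; the model, bandwidth choices and names are assumptions made purely for illustration) approximates the \(L_1\) error of the estimate (29) by Monte Carlo for \(Y=\sin (3X)+0.5\,\varepsilon \) with X uniform on \([-1,1]\) and \(\varepsilon \) standard normal; the error is expected to decrease as n grows.

```python
import numpy as np
from scipy.stats import norm

# Toy simulation (our own setup, for illustration only) of Lemma 2: the L1 error
# of the conditional density estimate (29) tends to decrease as n grows, here for
# Y = sin(3 X) + 0.5 eps with X ~ Uniform[-1, 1] and eps standard normal.
rng = np.random.default_rng(3)
m, sigma = (lambda x: np.sin(3 * x)), 0.5
y_grid, dy = np.linspace(-3.0, 3.0, 601, retstep=True)    # plays the role of [a_n(x), b_n(x)]

def l1_error(n, H, h, n_mc=300):
    X = rng.uniform(-1, 1, n)
    Y = m(X) + sigma * rng.standard_normal(n)
    x_mc = rng.uniform(-1, 1, n_mc)                        # Monte Carlo over P_X
    err = 0.0
    for x in x_mc:
        G = (np.abs(X - x) <= H).astype(float)             # naive kernel G = I_[-1,1]
        K = norm.pdf((y_grid[:, None] - Y[None, :]) / h)   # Gaussian kernel K on the y-axis
        denom = h * G.sum()
        g_hat = (K * G).sum(axis=1) / denom if denom > 0 else np.zeros_like(y_grid)
        g_true = norm.pdf(y_grid, loc=m(x), scale=sigma)   # true conditional density
        err += np.sum(np.abs(g_hat - g_true)) * dy
    return err / n_mc

for n in (200, 2000):
    print(n, l1_error(n, H=n ** (-1 / 3), h=n ** (-1 / 3)))
```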

5.3 Proof of Theorem 1

In the proof of Theorem 1, we will use Lemma 1, Lemma 2 and the following auxiliary result from Bott et al. (2015).

Lemma 4

Let \(K:\mathbb {R}\rightarrow \mathbb {R}\) be a symmetric and bounded density which is monotonically decreasing on \(\mathbb {R}_+\). Then

$$\begin{aligned}&\int \left| K \left( \frac{y-z_1}{h_n} \right) - K \left( \frac{y-z_2}{h_n} \right) \right| \, \mathrm{d}y \le 2\cdot K(0)\cdot |z_1-z_2| \end{aligned}$$

for arbitrary \(z_1,z_2 \in \mathbb {R}\).

Proof

See Lemma 1 in Bott et al. (2015). \(\square \)
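As a quick plausibility check (ours, not part of the original argument), the following sketch evaluates both sides of the inequality of Lemma 4 numerically for the standard Gaussian kernel, which is symmetric, bounded and monotonically decreasing on \(\mathbb {R}_+\).

```python
import numpy as np
from scipy.stats import norm

# Numerical check of Lemma 4 for the Gaussian kernel K (symmetric, bounded,
# monotonically decreasing on R_+); our own illustration, not from the paper.
z1, z2, h = 0.3, 1.1, 0.5
y, dy = np.linspace(-50, 50, 400_001, retstep=True)
lhs = np.sum(np.abs(norm.pdf((y - z1) / h) - norm.pdf((y - z2) / h))) * dy
rhs = 2 * norm.pdf(0) * abs(z1 - z2)
print(lhs, "<=", rhs)   # approximately 0.576 <= 0.638
```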

Proof of Theorem 1

By Lemma 1 and the Markov inequality, it suffices to show

$$\begin{aligned}&{\mathbf E}\left\{ \frac{1}{N_n} \sum _{i=1}^{N_n} |\hat{m}_{L_n}(X_{n+L_n+i})-\bar{m}(X_{n+L_n+i})|^2 \right\} \le \frac{\epsilon _n}{4}, \end{aligned}$$
(34)
$$\begin{aligned}&{\mathbf E}\left\{ \int _{\mathbb {R}^d} \int _\mathbb {R}| \hat{g}_{\hat{\epsilon }|X}(y,x) - g_{\bar{\epsilon }|X}(y,x)| \, \mathrm{d}y \, {\mathbf P}_X(\mathrm{d}x) \right\} \le \frac{ \delta _n}{4} \end{aligned}$$
(35)

and

$$\begin{aligned} {\mathbf P}\left\{ \int _\mathbb {R}\hat{g}_{\hat{\epsilon }|X}(z,X) \, \mathrm{d}z \ne 1 \right\} \le \frac{ \eta _n}{4}. \end{aligned}$$
(36)

In the first step of the proof, we observe that (34) is a trivial consequence of the independence of the data and the definition of \(\epsilon _n\).

In the second step of the proof, we show (35). In the case \(\sum _{j=1}^n G \left( \frac{\Vert x-X_j\Vert }{H_n} \right) \ne 0\), the function \(\hat{g}_{\hat{\epsilon }|X}(\cdot ,x)\) is a density, and Scheffé's lemma together with the triangle inequality yields

$$\begin{aligned}&\int _\mathbb {R}| \hat{g}_{\hat{\epsilon }|X}(y,x) - g_{\bar{\epsilon }|X}(y,x)| \, \mathrm{d}y\\&\quad \le \, 2 \cdot \int _\mathbb {R}\left( g_{\bar{\epsilon }|X}(y,x) - \hat{g}_{\hat{\epsilon }|X}(y,x) \right) _+ \, \mathrm{d}y\\&\quad \le \, 2 \cdot \int _{[a_n(x),b_n(x)]} \left( g_{\bar{\epsilon }|X}(y,x) - \hat{g}_{\hat{\epsilon }|X}(y,x) \right) _+ \, \mathrm{d}y + 2 \cdot \int _{[a_n(x),b_n(x)]^c} g_{\bar{\epsilon }|X}(y,x) \, \mathrm{d}y\\&\quad \le \, 2 \cdot \int _{[a_n(x),b_n(x)]} \left| g_{\bar{\epsilon }|X}(y,x) - \hat{g}_{\hat{\epsilon }|X}(y,x) \right| \, \mathrm{d}y + 2 \cdot \int _{[a_n(x),b_n(x)]^c} g_{\bar{\epsilon }|X}(y,x) \, \mathrm{d}y . \end{aligned}$$

In the case \(\sum _{j=1}^n G \left( \frac{\Vert x-X_j\Vert }{H_n} \right) =0\), we have

$$\begin{aligned} \hat{g}_{\hat{\epsilon }|X}(y,x)=0 \quad \text{ for } \text{ all } y \in \mathbb {R}, \end{aligned}$$

and the above chain of inequalities holds trivially.
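The first inequality above rests on Scheffé's lemma: whenever f and g are densities, \(\int |f-g| \, \mathrm{d}y = 2 \int (g-f)_+ \, \mathrm{d}y\). The following sketch (our own toy example with two Gaussian densities) illustrates this identity numerically.

```python
import numpy as np
from scipy.stats import norm

# Toy illustration of Scheffe's lemma for two densities f, g (our own example):
# the L1 distance equals twice the integral of the positive part (g - f)_+.
y, dy = np.linspace(-20, 20, 200_001, retstep=True)
f, g = norm.pdf(y, loc=0.0, scale=1.0), norm.pdf(y, loc=0.7, scale=1.3)
l1 = np.sum(np.abs(f - g)) * dy
twice_pos = 2 * np.sum(np.clip(g - f, 0, None)) * dy
print(l1, "~", twice_pos)   # the two numbers agree up to discretization error
```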

Using the above inequalities, we get

$$\begin{aligned}&{\mathbf E}\left\{ \int _{\mathbb {R}^d} \int _\mathbb {R}| \hat{g}_{\hat{\epsilon }|X}(y,x) - g_{\bar{\epsilon }|X}(y,x)| \, \mathrm{d}y \, {\mathbf P}_X(\mathrm{d}x) \right\} \\&\quad \le \, 2 \cdot {\mathbf E}\left\{ \int _{ [-\gamma _n,\gamma _n]^d} \int _{[a_n(x),b_n(x)]} | \hat{g}_{\hat{\epsilon }|X}(y,x) - g_{\bar{\epsilon }|X}(y,x)| \, \mathrm{d}y \, {\mathbf P}_X(\mathrm{d}x)\right\} \\&\qquad +\, 2 \cdot {\mathbf P}_X(\mathbb {R}^d{\setminus } [-\gamma _n,\gamma _n]^d) + 2 \cdot \int _{[-\gamma _n,\gamma _n]^d} \int _{[a_n(x),b_n(x)]^c} g_{\bar{\epsilon }|X}(y,x) \, \mathrm{d}y \, {\mathbf P}_X (\mathrm{d}x)\\&\quad \le \, 2 \cdot {\mathbf E}\left\{ \int _{ [-\gamma _n,\gamma _n]^d} \int _{[a_n(x),b_n(x)]} | \hat{g}_{\hat{\epsilon }|X}(y,x) - \hat{g}_{\bar{\epsilon }|X}(y,x)| \, \mathrm{d}y\, {\mathbf P}_X(\mathrm{d}x)\right\} \\&\qquad +\, 2 \cdot {\mathbf E}\left\{ \int _{ [-\gamma _n,\gamma _n]^d} \int _{[a_n(x),b_n(x)]} | \hat{g}_{\bar{\epsilon }|X}(y,x) - g_{\bar{\epsilon }|X}(y,x)| \, \mathrm{d}y \, {\mathbf P}_X(\mathrm{d}x) \right\} \\&\qquad +\, 2 \cdot {\mathbf P}_X(\mathbb {R}^d{\setminus } [-\gamma _n,\gamma _n]^d) + 2 \cdot \int _{[-\gamma _n,\gamma _n]^d} \int _{[a_n(x),b_n(x)]^c} g_{\bar{\epsilon }|X}(y,x) \, \mathrm{d}y \, {\mathbf P}_X (\mathrm{d}x) , \end{aligned}$$

where

$$\begin{aligned} \hat{g}_{\bar{\epsilon }|X}(y,x) = \frac{ \sum _{i=1}^n G \left( \frac{\Vert x-X_i\Vert }{H_n} \right) \cdot K \left( \frac{y-(Y_i-\bar{m}(X_i))}{h_n} \right) }{ h_n \cdot \sum _{j=1}^n G \left( \frac{\Vert x-X_j\Vert }{H_n} \right) }. \end{aligned}$$

Application of Lemma 4 yields

$$\begin{aligned}&\int _{[a_n(x),b_n(x)]} | \hat{g}_{\hat{\epsilon }|X}(y,x) - \hat{g}_{\bar{\epsilon }|X}(y,x)| \, \mathrm{d}y\\&\quad \le \, \sum _{i=1}^n \frac{ G \left( \frac{\Vert x-X_i\Vert }{H_n} \right) }{ h_n \cdot \sum _{j=1}^n G \left( \frac{\Vert x-X_j\Vert }{H_n} \right) } \cdot \int _\mathbb {R}\left| K \left( \frac{y-(Y_i-\hat{m}_{L_n}(X_i))}{h_n} \right) \right. \\&\qquad \left. -\, K \left( \frac{y-(Y_i-\bar{m}(X_i))}{h_n} \right) \right| \, \mathrm{d}y\\&\quad \le \, 2 \cdot K(0) \cdot \frac{ \sum _{i=1}^n G \left( \frac{\Vert x-X_i\Vert }{H_n} \right) \cdot |\hat{m}_{L_n}(X_i)-\bar{m}(X_i)| }{ h_n \cdot \sum _{j=1}^n G \left( \frac{\Vert x-X_j\Vert }{H_n} \right) }\\&\quad \le \, \frac{2 \cdot K(0)}{h_n} \cdot \sum _{i=1}^n \frac{ |\hat{m}_{L_n}(X_i)-\bar{m}(X_i)| }{ \left( 1 + \sum _{j \in \{1,\ldots ,n\} {\setminus } \{i\}} G \left( \frac{\Vert x-X_j\Vert }{H_n} \right) \right) }, \end{aligned}$$

where the last inequality follows from the fact that G is the naive kernel. Using this together with the independence of the data, Lemma 4.1 in Györfi et al. (2002) and Lemma 3, we get

$$\begin{aligned}&{\mathbf E}\left\{ \int _{ [-\gamma _n,\gamma _n]^d}\int _{[a_n(x),b_n(x)]} | \hat{g}_{\hat{\epsilon }|X}(y,x) - \hat{g}_{\bar{\epsilon }|X}(y,x)| \, \mathrm{d}y\, {\mathbf P}_X(\mathrm{d}x) \right\} \\&\quad \le \, \frac{2 \cdot K(0)}{h_n} \cdot \int _{ [-\gamma _n,\gamma _n]^d} \sum _{i=1}^n {\mathbf E}\left\{ \frac{ 1 }{ \left( 1 + \sum _{j \in \{1,\ldots ,n\} {\setminus } \{i\}} G \left( \frac{\Vert x-X_j\Vert }{H_n} \right) \right) } \right\} \, {\mathbf P}_X (\mathrm{d}x)\\&\qquad \cdot \, {\mathbf E}\int _{\mathbb {R}^d} | \hat{m}_{L_n}(x)-\bar{m}(x)| \, {\mathbf P}_X (\mathrm{d}x)\\&\quad \le \, \frac{2 \cdot K(0) \cdot (4 \cdot \sqrt{d})^d \cdot \gamma _n^d}{ h_n \cdot H_n^d } \cdot {\mathbf E}\int _{\mathbb {R}^d} | \hat{m}_{L_n}(x)-\bar{m}(x)| \, {\mathbf P}_X (\mathrm{d}x). \end{aligned}$$

Application of Lemma 2 yields

$$\begin{aligned}&{\mathbf E}\left\{ \int _{ [-\gamma _n,\gamma _n]^d} \int _{[a_n(x),b_n(x)]} | \hat{g}_{\hat{\epsilon }|X}(y,x) - g_{\bar{\epsilon }|X}(y,x)| \, \mathrm{d}y \, {\mathbf P}_X(\mathrm{d}x) \right\} \\&\quad \le \, c_1 \cdot \left( \sqrt{ \frac{ \int _{[-\gamma _n,\gamma _n]^d} |b_n(x)-a_n(x)| \, {\mathbf P}_X(\mathrm{d}x) \cdot \gamma _n^d }{ n \cdot H_n^d \cdot h_n }}\right. \\&\qquad \left. +\, \frac{ \gamma _n^d}{n \cdot H_n^d} + \int _{[-\gamma _n,\gamma _n]^d} |b_n(x)-a_n(x)| \, {\mathbf P}_X(\mathrm{d}x) \cdot \left( C_1 \cdot H_n^r + C_2 \cdot h_n^s \right) \right) . \end{aligned}$$

Summarizing the above results, the proof of (35) is complete.

In the third step of the proof, we show (36). As in the proof of Lemma 2, we get

$$\begin{aligned}&{\mathbf P}\left\{ \int _\mathbb {R}\hat{g}_{\hat{\epsilon }|X}(z,X) \, \mathrm{d}z \ne 1 \right\} \\&\quad = {\mathbf P}\left\{ \sum _{j \in \{1,\ldots ,n\} } G \left( \frac{\Vert X-X_j\Vert }{H_n} \right) =0 \right\} \\&\quad \le \, {\mathbf P}\left\{ X \in \mathbb {R}^d{\setminus } [-\gamma _n,\gamma _n]^d\right\} + {\mathbf P}\left\{ X \in [-\gamma _n,\gamma _n]^d, \sum _{j \in \{1,\ldots ,n\} } G \left( \frac{\Vert X-X_j\Vert }{H_n} \right) =0 \right\} \\&\quad \le \, {\mathbf P}\left\{ X \in \mathbb {R}^d{\setminus } [-\gamma _n,\gamma _n]^d\right\} + \frac{ 2 \cdot (4 \cdot \sqrt{d})^d \cdot \gamma _n^d}{n \cdot H_n^d}. \end{aligned}$$

Summarizing the above results, the proof is complete. \(\square \)