1 Introduction

The quantile, also called Value-at-Risk in actuarial and financial areas, is a widespread tool for risk measurement, due to its simplicity and interpretability: if Y is a random variable with cumulative distribution function F, the quantile at level \(\alpha \in (0,1)\) is defined as \(q(\alpha )= \inf \left\{ y \in \mathbb {R} \, | \, F(y) \ge \alpha \right\} \). As pointed out in Koenker and Bassett (1978), the quantile may also be seen as a solution of the following minimization problem:

$$\begin{aligned} q(\alpha ) = \underset{t \in \mathbb {R}}{\arg \min } \text { } \mathbb {E} \left[ \rho _{\alpha }^{(1)}(Y-t)-\rho _{\alpha }^{(1)}(Y) \right] , \end{aligned}$$
(1)

where \(\rho _{\alpha }^{(1)}(y)=|\alpha -\mathbbm {1}_{\{ y \le 0 \}}| |y|\) is the quantile check function. However, the quantile is not subadditive in general and so is not a coherent risk measure in the sense of Artzner et al. (1999). An alternative risk measure gaining popularity is the expectile, introduced in Newey and Powell (1987). This is the solution of (1), with the new loss function \(\rho _{\alpha }^{(2)}(y)=|\alpha -\mathbbm {1}_{\{ y \le 0 \}}| y^2\) in place of \(\rho _{\alpha }^{(1)}\). Expectiles larger than the mean are coherent risk measures, and have started to be used in actuarial and financial practice (see for instance Cai and Weng 2016). A pioneering paper for the estimation of extreme expectiles in heavy-tailed settings is Daouia et al. (2018).

Quantiles and expectiles may be generalized by considering the family of \(L^p\)-quantiles. Introduced in Chen (1996), this class of risk measures is defined, for all \(p \ge 1\), by

$$\begin{aligned} q^{(p)}(\alpha )= \underset{t \in \mathbb {R}}{\arg \min } \text { } \mathbb {E} \left[ \rho _{\alpha }^{(p)}(Y-t)-\rho _{\alpha }^{(p)}(Y) \right] , \end{aligned}$$
(2)

where \(\rho _{\alpha }^{(p)}(y)=|\alpha -\mathbbm {1}_{\{ y \le 0 \}}| |y|^p\) is the \(L^p\)-quantile loss function; the case \(p=1\) leads to the quantile and \(p=2\) gives the expectile. Note that, for \(p>1\), using the formulation (2) and through the subtraction of the (at first sight unimportant) term \(\rho _{\alpha }^{(p)}(Y)\), it is a straightforward consequence of the mean value theorem applied to the function \(\rho _{\alpha }^{(p)}\) that the \(L^p\)-quantile \(q^{(p)}(\alpha )\) is well defined as soon as \(\mathbb {E}(|Y|^{p-1})<\infty \). While the expectile is the only coherent \(L^p\)-quantile (see Bellini et al. 2014), Daouia et al. (2019) showed that for extreme levels of quantiles or expectiles (\(\alpha \rightarrow 1\)), it may be better to estimate \(L^p\)-quantiles first (where typically p is between 1 and 2) and exploit an asymptotic proportionality relationship to estimate quantiles or expectiles. An overview of the potential applications of this kind of statistical assessment of extreme risk may for instance be found in Embrechts et al. (1997).
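To fix ideas, here is a minimal Python sketch (an illustration of ours, not part of the formal development) computing empirical \(L^p\)-quantiles by minimizing the sample analogue of criterion (2); at the sample level, the subtracted term \(\rho _{\alpha }^{(p)}(Y)\) is constant in t and may be dropped, and the Pareto sample and level used here are arbitrary choices.

```python
# Empirical L^p-quantiles via the minimization (2). The subtracted term
# rho_alpha^{(p)}(Y) does not depend on t, so it is omitted here.
import numpy as np
from scipy.optimize import minimize_scalar

def lp_quantile(y, alpha, p):
    """Empirical L^p-quantile: minimizer of the sample analogue of (2)."""
    def loss(t):
        return np.mean(np.abs(alpha - (y <= t)) * np.abs(y - t) ** p)
    return minimize_scalar(loss, bounds=(y.min(), y.max()), method="bounded").x

rng = np.random.default_rng(42)
y = 1.0 + rng.pareto(4.0, size=10_000)  # heavy-tailed sample, gamma = 1/4

for p in (1.0, 1.7, 2.0):  # quantile, an intermediate case, expectile
    print(f"p = {p}: {lp_quantile(y, alpha=0.95, p=p):.3f}")
```

Up to optimizer tolerance, \(p=1\) recovers the empirical 95% quantile and \(p=2\) the empirical expectile.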

The contribution of this work is to propose a methodology to estimate extreme \(L^p\)-quantiles of \(Y|\mathbf {X}=\mathbf {x}\), where the random covariate vector \(\mathbf {X} \in \mathbb {R}^d\) is recorded alongside Y. In this context, the case \(p=1\) (quantile) has been considered in Daouia et al. (2011) and Daouia et al. (2013), and the case \(p=2\) (expectile) has recently been studied in Girard et al. (2021). For the general case \(p \ge 1\), only Usseglio-Carleve (2018) proposes an estimation procedure under the strong assumption that the vector \((\mathbf {X},Y)\) is elliptically distributed. The present paper avoids this modeling assumption by constructing a kernel estimator.

The paper is organized as follows. Section 2 introduces an estimator of conditional \(L^p\)-quantiles. Section 3 gives the asymptotic properties of the estimator previously introduced, at extreme levels. Finally, Sect. 4 proposes a simulation study in order to assess the accuracy of our estimator which is then showcased on a real insurance data set in Sect. 5. Proofs are postponed to the Appendix.

2 \(L^p\)-quantile Kernel Regression

Let \((\mathbf {X}_i,Y_i)\), \(i=1,...,n\), be independent realizations of a random vector \((\mathbf {X},Y) \in \mathbb {R}^d \times \mathbb {R}\). For the sake of simplicity we assume that \(Y\ge 0\) with probability 1. We denote by g the density function of \(\mathbf {X}\) and let, in the sequel, \(\mathbf {x}\) be a fixed point in \(\mathbb {R}^d\) such that \(g(\mathbf {x})>0\). We denote by \(\bar{F}^{(1)}(y|\mathbf {x})= \mathbb {P} \left( Y>y |\mathbf {X}=\mathbf {x} \right) \) the conditional survival function of Y given \(\mathbf {X}=\mathbf {x}\) and assume that this survival function is continuous and regularly varying with index \(-1/\gamma (\mathbf {x})\):

$$\begin{aligned} \forall t>0, \ \underset{y \rightarrow \infty }{\lim } \text { } \frac{\bar{F}^{(1)}(ty|\mathbf {x})}{\bar{F}^{(1)}(y|\mathbf {x})}= t^{-1/\gamma (\mathbf {x})}. \end{aligned}$$
(3)

Such a distribution belongs to the Fréchet maximum domain of attraction (de Haan and Ferreira 2006). Note that for any \(k<1/\gamma (\mathbf {x})\), \(\mathbb {E}\left[ Y^k |\mathbf {X}=\mathbf {x} \right] < \infty \). Since the definition of \(L^p\)-quantiles in (2) requires \(\mathbb {E}\left[ |Y|^{p-1} |\mathbf {X}=\mathbf {x} \right] < \infty \), our minimal assumption will be that \(p-1<1/\gamma (\mathbf {x})\). From Eq. (2), \(L^p\)-quantiles of level \(\alpha \in (0,1)\) of Y given \(\mathbf {X}=\mathbf {x}\) may also be seen as the solution of the following equation:

$$ \frac{\mathbb {E}\left[ \left| Y-y \right| ^{p-1} \mathbbm {1}_{\{ Y>y \}} | \mathbf {X}=\mathbf {x} \right] }{\mathbb {E}\left[ \left| Y-y \right| ^{p-1} | \mathbf {X}=\mathbf {x} \right] }=1-\alpha . $$

In other words, as noticed in Jones (1994), (conditional) \(L^p\)-quantiles can equivalently be defined as quantiles

$$ q^{(p)}(\alpha |\mathbf {x})= \inf \left\{ y \in \mathbb {R} \, | \, \bar{F}^{(p)}(y|\mathbf {x}) \le 1-\alpha \right\} $$

of the distribution associated with the survival function

$$ \bar{F}^{(p)}(y|\mathbf {x})= \frac{\varphi ^{(p-1)}(y|\mathbf {x})}{m^{(p-1)}(y|\mathbf {x})}, $$

where, for all \(k \ge 0,\)

$$\begin{aligned} m^{(k)}(y|\mathbf {x})= & {} \mathbb {E}\left[ \left| Y-y \right| ^k | \mathbf {X}=\mathbf {x} \right] g(\mathbf {x}) \\ \text {and } \varphi ^{(k)}(y|\mathbf {x})= & {} \mathbb {E}\left[ \left| Y-y \right| ^k \mathbbm {1}_{\{ Y>y \}} | \mathbf {X}=\mathbf {x} \right] g(\mathbf {x}). \end{aligned}$$

Obviously, if \(p=1\), we get the survival function introduced above. The case \(p=2\) leads to the function introduced in Jones (1994) and used in Girard et al. (2021). To estimate \(\bar{F}^{(p)}(y|\mathbf {x})\), we let K be a probability density function on \(\mathbb {R}^d\) and we introduce the kernel estimators

$$ \hat{m}_n^{(k)}(y|\mathbf {x})= \frac{\sum \limits _{i=1}^n \left| Y_i-y \right| ^k K \left( \frac{\mathbf {x}-\mathbf {X}_i}{h_n} \right) }{n h_n^d} \text {, } \hat{\varphi }_n^{(k)}(y|\mathbf {x})= \frac{\sum \limits _{i=1}^n \left| Y_i-y \right| ^k K \left( \frac{\mathbf {x}-\mathbf {X}_i}{h_n} \right) \mathbbm {1}_{\{ Y_i>y \}}}{n h_n^d}. $$

Note that \(\hat{m}_n^{(0)}(0|\mathbf {x})\) is the kernel density estimator of \(g(\mathbf {x})\), and \(\hat{m}_n^{(1)}(0|\mathbf {x})/\hat{m}_n^{(0)}(0|\mathbf {x})\) is the standard kernel regression estimator (since the \(Y_i\) are nonnegative). The kernel estimators of \(\bar{F}^{(p)}(y|\mathbf {x})\) and \(q^{(p)}(\alpha |\mathbf {x})\) are then easily deduced:

$$\begin{aligned} \hat{\bar{F}}_n^{(p)}(y|\mathbf {x})= \frac{\hat{\varphi }_n^{(p-1)}(y|\mathbf {x})}{\hat{m}_n^{(p-1)}(y|\mathbf {x})} \text {, } \hat{q}_n^{(p)}(\alpha |\mathbf {x})= \inf \left\{ y \in \mathbb {R} \, | \, \hat{\bar{F}}_n^{(p)}(y|\mathbf {x}) \le 1-\alpha \right\} . \end{aligned}$$
(4)

The case \(p=1\) gives the kernel quantile estimator introduced in Daouia et al. (2013), while \(p=2\) leads to the conditional expectile estimator of Girard et al. (2021). We study here the asymptotic properties of \(\hat{q}_n^{(p)}(\alpha |\mathbf {x})\) for an arbitrary \(p\ge 1\), when \(\alpha =\alpha _n\rightarrow 1\).
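For illustration, a minimal sketch of ours of the estimators (4) with \(d=1\) follows; it uses the Epanechnikov kernel adopted in Sect. 4 and exploits the fact that the normalizing factor \(n h_n^d\) cancels in the ratio defining \(\hat{\bar{F}}_n^{(p)}(y|\mathbf {x})\). The function and variable names are ours.

```python
# Sketch of the kernel estimators (4) for d = 1; the generalized inverse
# is approximated by scanning the ordered sample, where the estimated
# survival function can change value.
import numpy as np

def epanechnikov(t):
    return 0.75 * (1.0 - t ** 2) * (np.abs(t) < 1)

def F_bar_p_hat(y, x, X, Y, p, h):
    """Kernel estimator of F_bar^{(p)}(y|x); the n*h^d factors cancel."""
    w = epanechnikov((x - X) / h)
    num = np.sum(np.abs(Y - y) ** (p - 1) * w * (Y > y))  # phi_hat^{(p-1)}
    den = np.sum(np.abs(Y - y) ** (p - 1) * w)            # m_hat^{(p-1)}
    return num / den

def q_p_hat(alpha, x, X, Y, p, h):
    """Conditional L^p-quantile estimator of (4); p = 1 gives the kernel
    quantile estimator and p = 2 the conditional expectile estimator."""
    for y in np.sort(Y):
        if F_bar_p_hat(y, x, X, Y, p, h) <= 1.0 - alpha:
            return y
    return np.max(Y)
```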

3 Main Results

We first make a standard assumption on the kernel. We fix a norm \(||\cdot ||\) on \(\mathbb {R}^d\).

\((\mathcal {K})\) The density function K is bounded and its support S is contained in the unit ball.

To be able to analyze extreme conditional \(L^p\)-quantiles in a reasonably simple way, we make a standard second-order regular variation assumption (for a survey of those conditions, see Sect. 2 in de Haan and Ferreira (2006)).

\(\mathcal {C}_2 \left( \gamma (\mathbf {x}), \rho (\mathbf {x}), A(.|\mathbf {x}) \right) \) There exist \(\gamma (\mathbf {x})>0\), \(\rho (\mathbf {x}) \le 0\) and a positive or negative function \(A(\cdot |\mathbf {x})\) converging to 0 such that

$$ \forall t>0, \ \lim _{y \rightarrow \infty } \frac{1}{A(y|\mathbf {x})} \left( \frac{ {q^{(1)}(1-1/(ty)|\mathbf {x})} }{ {q^{(1)}(1-1/y|\mathbf {x})}}-t^{\gamma (\mathbf {x})} \right) = {\left\{ \begin{array}{ll} t^{\gamma (\mathbf {x})} \dfrac{t^{\rho (\mathbf {x})}-1}{\rho (\mathbf {x})} &{} \text{ if } \rho (\mathbf {x})<0,\\ t^{\gamma (\mathbf {x})} \log (t) &{} \text{ if } \rho (\mathbf {x})=0. \end{array}\right. } $$

Our last assumption is a local Lipschitz condition which may be found for instance in Daouia et al. (2013) and El Methni et al. (2014). We denote by \(B(\mathbf {x},r)\) the ball with center \(\mathbf {x}\) and radius r.

\((\mathcal {L})\) We have \(g(\mathbf {x})>0\) and there exist \(c, r>0\) such that

$$ \forall \mathbf {x}' \in B(\mathbf {x},r), \ |g(\mathbf {x})-g(\mathbf {x}')| \le c || \mathbf {x}-\mathbf {x}' ||. $$

To be able to control the local oscillations of \((\mathbf {x},y)\mapsto \bar{F}^{(1)}(y|\mathbf {x})\), we let, for any nonnegative \(y_n \rightarrow \infty \),

$$\begin{aligned} \omega _{h_n}^{(1)}(y_n|\mathbf {x})= & {} \sup _{\mathbf {x}' \in B(\mathbf {x},h_n)} \ \sup _{z \ge y_n} \frac{1}{\log (z)} \left| \log \left( \frac{\bar{F}^{(1)}(z|\mathbf {x}')}{\bar{F}^{(1)}(z|\mathbf {x})} \right) \right| , \\ \omega _{h_n}^{(2)}(y_n|\mathbf {x})= & {} \sup _{\mathbf {x}' \in B(\mathbf {x},h_n)} \ \sup _{0<y\le y_n} |\bar{F}^{(1)}(y|\mathbf {x}') - \bar{F}^{(1)}(y|\mathbf {x})|, \\ \text{ and } \omega _{h_n}^{(3)}(y_n|\mathbf {x})= & {} \sup _{\mathbf {x}' \in B(\mathbf {x},h_n)} \ \sup _{\lambda \ge 1} \ \sup _{b_n,b'_n\rightarrow 0} \left| \frac{\bar{F}^{(1)}(\lambda y_n (1+b_n)|\mathbf {x}')}{\bar{F}^{(1)}(\lambda y_n (1+b'_n)|\mathbf {x}')} - 1 \right| . \end{aligned}$$

The quantity \(\omega _{h_n}^{(1)}(y_n|\mathbf {x})\), discussed for instance in Girard et al. (2021), controls the oscillation of the conditional survival function with respect to \(\mathbf {x}\) in its right tail, while \(\omega _{h_n}^{(2)}(y_n|\mathbf {x})\) and \(\omega _{h_n}^{(3)}(y_n|\mathbf {x})\) are introduced to be able to deal with the case \(p\notin \{1,2\}\) specifically. Let us highlight that \(\omega _{h_n}^{(3)}(y_n|\mathbf {x})\) is again geared toward controlling an oscillation of the right tail of the conditional distribution; however, \(\omega _{h_n}^{(2)}(y_n|\mathbf {x})\) focuses on the oscillation of the center of the conditional distribution with respect to \(\mathbf {x}\). For \(p>1\), the introduction of a quantity such as \(\omega _{h_n}^{(2)}(y_n|\mathbf {x})\) is in some sense natural, since we will have to deal with the local oscillation of the conditional moment \(m^{(p-1)}(y|\mathbf {x})\), appearing in the denominator of \(\bar{F}^{(p)}(y|\mathbf {x})\), and this conditional moment indeed depends on the whole of the conditional distribution rather than merely on its right tail. Typically \(\omega _{h_n}^{(1)}(y_n|\mathbf {x})=O(h_n)\), \(\omega _{h_n}^{(2)}(y_n|\mathbf {x})=O(h_n)\) and \(\omega _{h_n}^{(3)}(y_n|\mathbf {x})=o(1)\) under reasonable assumptions; we give examples below.

Remark 1

Assume that \(Y|\mathbf {X}=\mathbf {x}\) has a Pareto distribution with tail index \(\gamma (\mathbf {x})>0\):

$$ \forall y\ge 1, \ \bar{F}^{(1)}(y|\mathbf {x}) = y^{-1/\gamma (\mathbf {x})}. $$

If \(\gamma \) is locally Lipschitz continuous, we clearly have \(\omega _{h_n}^{(1)}(y_n|\mathbf {x})=O(h_n)\). Furthermore, for any \(y\ge 1\), the mean value theorem yields

$$ |\bar{F}^{(1)}(y|\mathbf {x}') - \bar{F}^{(1)}(y|\mathbf {x})| \le \left| \frac{1}{\gamma (\mathbf {x}')} - \frac{1}{\gamma (\mathbf {x})} \right| \times y^{-1/[\gamma (\mathbf {x}) \vee \gamma (\mathbf {x}')]} \log y. $$

(Here and below \(\vee \) denotes the maximum operator.) Under this same local Lipschitz assumption, one then finds \(\omega _{h_n}^{(2)}(y_n|\mathbf {x})=O(h_n)\) as well. Finally, for any \(y,y'>1\),

$$ \left| \frac{\bar{F}^{(1)}(y'|\mathbf {x}')}{\bar{F}^{(1)}(y|\mathbf {x}')} - 1 \right| = \left| \left( \frac{y}{y'} \right) ^{1/\gamma (\mathbf {x}')} - 1 \right| \le \frac{|y-y'|}{y'} \times \frac{1+(y/y')^{1/\gamma (\mathbf {x}')-1}}{\gamma (\mathbf {x}')} $$

by the mean value theorem again. This inequality yields \(\omega _{h_n}^{(3)}(y_n|\mathbf {x}) =o(1)\).

The same arguments, and asymptotic bounds on \(\omega _{h_n}^{(1)}(y_n|\mathbf {x})\), \(\omega _{h_n}^{(2)}(y_n|\mathbf {x})\) and \(\omega _{h_n}^{(3)}(y_n|\mathbf {x})\), apply to the conditional Fréchet model

$$ \forall y>0, \ \bar{F}^{(1)}(y|\mathbf {x}) = 1-\exp (-y^{-1/\gamma (\mathbf {x})}). $$

Analogous results are easily obtained for the conditional Burr model

$$ \forall y>0, \ \bar{F}^{(1)}(y|\mathbf {x}) = (1+y^{-\rho (\mathbf {x})/\gamma (\mathbf {x})})^{1/\rho (\mathbf {x})} $$

when \(\rho <0\) is assumed to be locally Lipschitz continuous, and the conditional mixture Pareto model

$$ \forall y\ge 1, \ \bar{F}^{(1)}(y|\mathbf {x}) = y^{-1/\gamma (\mathbf {x})} \left[ c(\mathbf {x}) + (1-c(\mathbf {x})) y^{\rho (\mathbf {x})/\gamma (\mathbf {x})} \right] $$

when \(\rho <0\) and \(c\in (0,1)\) are assumed to be locally Lipschitz continuous.   \(\square \)

3.1 Intermediate \(L^p\)-quantile Regression

In this paragraph, we assume that \(\sigma _n^{-2}=n h_n^d (1-\alpha _n) \rightarrow \infty \). Such an assumption means that the \(L^p\)-quantile level \(\alpha _n\) tends to 1 slowly (by extreme value standards), hence the denominations intermediate sequence and intermediate \(L^p\)-quantiles. This assumption is widespread in the risk measure regression literature: see, among others, Daouia et al. (2011, 2013), El Methni et al. (2014) and Girard et al. (2021). Throughout, we let \(|| K ||_2^2= \int _S K(\mathbf {u})^2 d\mathbf {u}\) be the squared \(L^2\)-norm of K, \(\Psi (\cdot )\) denote the digamma function and \(IB(t,x,y)=\int _0^t u^{x-1}(1-u)^{y-1}du\) be the incomplete Beta function. Note that \(IB(1,x,y)=B(x,y)\) is the standard Beta function.
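On the implementation side, note that standard numerical libraries typically expose the regularized incomplete Beta function rather than \(IB\) itself; a short scipy-based sketch of ours matching the convention above:

```python
# IB(t, x, y) in the notation above, recovered from scipy's *regularized*
# incomplete Beta function I_t(x, y) = IB(t, x, y) / B(x, y).
from scipy.special import beta, betainc, digamma  # digamma is Psi

def IB(t, x, y):
    return beta(x, y) * betainc(x, y, t)
```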

We now give our first result on the joint asymptotic normality of a finite number J of empirical conditional quantiles with an empirical conditional \(L^p\)-quantile (\(p>1\)).

Theorem 1

Assume that \((\mathcal {K})\), \((\mathcal {L})\) and \(\mathcal {C}_2 \left( \gamma (\mathbf {x}), \rho (\mathbf {x}), A(.|\mathbf {x}) \right) \) hold. Let \(\alpha _n \rightarrow 1\), \(h_n \rightarrow 0\) and \(a_n=1-\tau (1-\alpha _n) (1+o(1))\), where \(\tau >0\). Assume further that \(\sigma _n^{-2}=n h_n^d (1-\alpha _n) \rightarrow \infty \), \(n h_n^{d+2} (1-\alpha _n) \rightarrow 0\), \(\sigma _n^{-1} A \left( (1-\alpha _n)^{-1}|\mathbf {x} \right) =O(1)\), \(\omega _{h_n}^{(3)}( q^{(1)}(\alpha _n|\mathbf {x})|\mathbf {x}) \rightarrow 0\) and there exists \(\delta \in (0,1)\) such that

$$\begin{aligned} \sigma _n^{-1} \omega _{h_n}^{(1)}((1-\delta ) (\theta \wedge 1) q^{(1)}(\alpha _n|\mathbf {x})|\mathbf {x}) \log (1-\alpha _n) \rightarrow 0, \end{aligned}$$
(5)

where \(\theta =\left( \tau \gamma (\mathbf {x})/B\left( p,\gamma (\mathbf {x})^{-1}-p+1 \right) \right) ^{-\gamma (\mathbf {x})}\). Let further \(\alpha _{n,j}=1-\tau _j(1-\alpha _n)\), for \(0<\tau _1<\tau _2<\, ... \,<\tau _J \le 1\) such that

$$\begin{aligned} \sigma _n^{-1} \omega _{h_n}^{(2)}((1+\delta ) (\theta \vee \tau _1^{-\gamma (\mathbf {x})}) q^{(1)}(\alpha _n|\mathbf {x})|\mathbf {x}) \rightarrow 0. \end{aligned}$$
(6)

Then, for all \(p \in (1,\gamma (\mathbf {x})^{-1}/2+1)\), one has

$$\begin{aligned} \sigma _n^{-1} \left\{ \left( \frac{\hat{q}_n^{(1)}(\alpha _{n,j}|\mathbf {x})}{q^{(1)}(\alpha _{n,j}|\mathbf {x})}-1 \right) _{1 \le j \le J}, \left( \frac{\hat{q}_n^{(p)}(a_{n}|\mathbf {x})}{q^{(p)}(a_{n}|\mathbf {x})}-1 \right) \right\} \overset{d}{ \longrightarrow } \mathcal {N} \left( \mathbf {0}_{J+1}, \frac{|| K ||_2^2}{g(\mathbf {x})} \gamma (\mathbf {x})^2 {\boldsymbol{\Sigma }}(\mathbf {x}) \right) , \end{aligned}$$
(7)

where \({\boldsymbol{\Sigma }}(\mathbf {x})\) is the symmetric matrix having entries

$$\begin{aligned} \left\{ \begin{array}{cc} \Sigma _{j,\ell }(\mathbf {x})=&{} \left( \tau _j \vee \tau _{\ell } \right) ^{-1} \\ \Sigma _{j,J+1}(\mathbf {x})=&{} \tau _j^{-1} \left[ \gamma (\mathbf {x}) \frac{(p-1) IB\left( \left( 1 \vee \frac{\tau _j^{-\gamma (\mathbf {x})}}{\theta } \right) ^{-1} ,\gamma (\mathbf {x})^{-1}-p+1,p-1 \right) }{B \left( p,\gamma (\mathbf {x})^{-1}-p+1 \right) } +\left( \left( 1 \vee \frac{\tau _j^{-\gamma (\mathbf {x})}}{\theta } \right) -1 \right) ^{p-1} \right] \\ \Sigma _{J+1,J+1}(\mathbf {x})=&{} \frac{B \left( 2p-1,\gamma (\mathbf {x})^{-1}-2p+2 \right) }{\tau B \left( p,\gamma (\mathbf {x})^{-1}-p+1 \right) } \\ \end{array} \right. . \end{aligned}$$
(8)

Theorem 1, which will be useful to introduce estimators of the tail index \(\gamma (\mathbf {x})\) as part of our extrapolation methodology, generalizes and adapts to the conditional setup several results already found in the literature: see Theorem 1 in Daouia et al. (2013), Theorem 1 in Daouia et al. (2019) and Theorem 3 in Daouia et al. (2020b). Note however that, although they are in some sense related, Theorem 1 does not imply Theorem 1 of Girard et al. (2021), because the latter is stated under weaker regularity conditions warranted by the specific context \(p=2\) of extreme conditional expectile estimation. On the technical side, assumptions (5) and (6) ensure that the bias introduced by smoothing in the \(\mathbf {x}\) direction is negligible compared to the standard deviation \(\sigma _n\) of the estimator. The aim of the next paragraph is now to extrapolate our intermediate estimators to properly extreme levels.

3.2 Extreme \(L^p\)-quantile Regression

We consider here a level \(\beta _n \rightarrow 1\) such that \(n h_n^d (1-\beta _n) \rightarrow c <\infty \). The estimators previously introduced no longer work at such an extreme level. In order to overcome this problem, we first recall a result of Daouia et al. (2019) (see also Lemma 5 below):

$$\begin{aligned} \forall p\ge 1, \ \underset{\alpha \rightarrow 1}{\lim } \text { } \frac{q^{(p)}(\alpha |\mathbf {x})}{q^{(1)}(\alpha |\mathbf {x})} = \left( \frac{\gamma (\mathbf {x})}{B \left( p,\gamma (\mathbf {x})^{-1}-p+1 \right) } \right) ^{-\gamma (\mathbf {x})}. \end{aligned}$$
(9)

In the sequel, we shall use the notation \(g_p(\gamma )=\gamma /B \left( p,\gamma ^{-1}-p+1 \right) \). A first consequence of this result is that the \(L^p\)-quantile function is regularly varying, i.e.,

$$\begin{aligned} \forall t>0, \ \underset{y \rightarrow \infty }{\lim } \text { } \frac{q^{(p)}(1-1/(ty)|\mathbf {x})}{q^{(p)}(1-1/y|\mathbf {x})}=t^{\gamma (\mathbf {x})}. \end{aligned}$$
(10)

This then suggests that, by considering an intermediate sequence \((\alpha _n)\), our conditional extreme \(L^p\)-quantile may be approximated (and estimated) as follows:

$$\begin{aligned} \nonumber q^{(p)}( \beta _n |\mathbf {x} )\approx & {} \left( \frac{1-\alpha _n}{1-\beta _n} \right) ^{\gamma (\mathbf {x})} q^{(p)}( \alpha _n |\mathbf {x}), \\ \text{ estimated } \text{ by } \ \tilde{q}^{(p)}_{n,\alpha _n}( \beta _n |\mathbf {x} )= & {} \left( \frac{1-\alpha _n}{1-\beta _n} \right) ^{\hat{\gamma }_{\alpha _n}(\mathbf {x})} \hat{q}^{(p)}_n( \alpha _n |\mathbf {x}). \end{aligned}$$

Here, \(\hat{q}_n^{(p)}( \alpha _n |\mathbf {x})\) is the kernel estimator introduced in Eq. (4), and \(\hat{\gamma }_{\alpha _n}(\mathbf {x})\) is a consistent estimator of the conditional tail index \(\gamma (\mathbf {x})\). This defines a class of Weissman-type estimators (see Weissman 1978), whose asymptotic properties we now give.
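The extrapolation step itself is elementary; a one-line sketch of ours, taking as inputs any intermediate conditional \(L^p\)-quantile estimate and any consistent tail index estimate:

```python
# Weissman-type extrapolation from the intermediate level alpha_n to the
# extreme level beta_n, as displayed above.
def weissman_lp(q_p_alpha_n, gamma_hat_x, alpha_n, beta_n):
    return ((1.0 - alpha_n) / (1.0 - beta_n)) ** gamma_hat_x * q_p_alpha_n
```

For instance, with \(\alpha _n=1-1/\sqrt{n}\) and \(\beta _n=1-1/n\) as in Sect. 4, the extrapolation factor is \(n^{\hat{\gamma }_{\alpha _n}(\mathbf {x})/2}\).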

Theorem 2

Assume that \((\mathcal {K})\), \((\mathcal {L})\) and \(\mathcal {C}_2(\gamma (\mathbf {x}),\rho (\mathbf {x}),A(\cdot |\mathbf {x}))\) hold with \(\rho (\mathbf {x})<0\). Let \(\alpha _n, \beta _n\rightarrow 1\), \(h_n\rightarrow 0\) be such that \( \sigma _n^{-2}=n h_n^d (1-\alpha _n) \rightarrow \infty \) and \(nh_n^d (1-\beta _n) \rightarrow c<\infty \). Assume further that \(n h_n^{d+2} (1-\alpha _n) \rightarrow 0\), \(\omega _{h_n}^{(3)}( q^{(1)}(\alpha _n|\mathbf {x})|\mathbf {x}) \rightarrow 0\) and

  (i) \(\sigma _n^{-1} A \left( (1-\alpha _n)^{-1}|\mathbf {x} \right) =O(1)\), \(\sigma _n^{-1} (1-\alpha _n)=O(1)\) and \(\sigma _n^{-1} \mathbb {E} \left[ Y \mathbbm {1}_{\{ 0< Y < q^{(1)}(\alpha _n|\mathbf {x}) \}} | \mathbf {x} \right] q^{(1)}(\alpha _n|\mathbf {x})^{-1}=O(1)\),

  (ii) for some \(\delta \in (0,1)\), \(\sigma _n^{-1} \omega _{h_n}^{(1)}((1-\delta ) [g_p(\gamma (\mathbf {x}))]^{-\gamma (\mathbf {x})} q^{(1)}(\alpha _n|\mathbf {x})|\mathbf {x}) \log (1-\alpha _n) \rightarrow 0\) and \(\sigma _n^{-1} \omega _{h_n}^{(2)}((1+\delta ) q^{(1)}(\alpha _n|\mathbf {x})|\mathbf {x}) \rightarrow 0\),

  (iii) \(\sigma _n^{-1}/ \log \left( (1-\alpha _n)/(1-\beta _n) \right) \rightarrow \infty \).

Take \(p \in (1, \gamma (\mathbf {x})^{-1}/2+1)\). If in addition \( \sigma _n^{-1}( \hat{\gamma }_{\alpha _n}(\mathbf {x})-\gamma (\mathbf {x})) {\mathop { \longrightarrow }\limits ^{d}}\Gamma , \) then

$$ \frac{\sigma _n^{-1}}{\log ( (1-\alpha _n)/(1-\beta _n) )} \left( \frac{\tilde{q}^{(p)}_{n,\alpha _n}( \beta _n |\mathbf {x} )}{q^{(p)} ( \beta _n |\mathbf {x} )}-1 \right) {\mathop { \longrightarrow }\limits ^{d}}\Gamma . $$

We notice, as is classical in the analysis of heavy tails, that the asymptotic distribution of the extrapolated estimator \(\tilde{q}^{(p)}_{n,\alpha _n}( \beta _n |\mathbf {x} )\) is exactly that of the purely empirical estimator \(\hat{\gamma }_{\alpha _n}(\mathbf {x})\) with a slightly slower rate of convergence. Technically speaking, assumption (i) controls the bias due to the asymptotic approximation (9), while assumption (ii) is used to deal with the bias due to smoothing.

Our aim is now to propose some estimators of \(\gamma (\mathbf {x})\) solely based on intermediate \(L^p\)-quantiles, in order to carry out the extrapolation step.

3.3 \(L^p\)-quantile-Based Estimation of the Conditional Tail Index

The aim of this paragraph is to discuss the estimation of the conditional tail index \(\gamma (\mathbf {x})\). A local Pickands estimator is studied in Daouia et al. (2011, 2013). This estimator, however, has a large variance, which is why Daouia et al. (2011) propose a simplified, conditional, and local version of the Hill estimator:

$$\begin{aligned} \hat{\gamma }_{\alpha _n}^{(H)}(\mathbf {x})= \frac{1}{\log (J!)}\sum \limits _{j=1}^J \log \left( \hat{q}_n \left( \frac{j-1+\alpha _n}{j} |\mathbf {x} \right) /\hat{q}_n \left( \alpha _n |\mathbf {x} \right) \right) . \end{aligned}$$
(11)

They also mention that taking \(J=9\) is an optimal choice, leading to an asymptotic variance close to \(1.25 || K ||_2^2 \gamma (\mathbf {x})^2/g(\mathbf {x})\). Recently, Daouia et al. (2020a) and Girard et al. (2021) have shown that replacing the quantile by the expectile in tail index estimators can lead to a significant variance reduction. Our idea here is instead to base the estimator on \(L^p\)-quantiles rather than quantiles. In this context, we follow the approach of Girard et al. (2019) and exploit the asymptotic relationship (9) by introducing the following estimator, valid for all \(1<p<\gamma (\mathbf {x})^{-1}+1\):

$$\begin{aligned} \hat{\gamma }_{\alpha _n}^{(p)}(\mathbf {x})= \inf \left\{ \gamma >0 : g_p(\gamma ) \le \frac{\hat{\bar{F}}_n^{(1)} \left( \hat{q}_n^{(p)}(\alpha _n|\mathbf {x}) | \mathbf {x} \right) }{1-\alpha _n} \right\} . \end{aligned}$$
(12)

This class of estimators is introduced in Girard et al. (2019) in an unconditional setting, and the (explicit) estimator \(\hat{\gamma }_{\alpha _n}^{(2)}(\mathbf {x})\) is introduced in Girard et al. (2021). Using the results previously obtained, we can give the asymptotic distribution of \(\hat{\gamma }_{\alpha _n}^{(p)}(\mathbf {x})\) for all \(1<p<\gamma (\mathbf {x})^{-1}/2+1\).
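Since \(g_p\) is explicit, computing \(\hat{\gamma }_{\alpha _n}^{(p)}(\mathbf {x})\) amounts to a generalized inversion; a sketch of ours follows, reusing the functions F_bar_p_hat and q_p_hat from the sketch in Sect. 2 (with p = 1, F_bar_p_hat is the kernel estimator \(\hat{\bar{F}}_n^{(1)}\)).

```python
# Tail index estimator (12): generalized inverse of g_p over a grid of
# admissible values (recall p - 1 < 1/gamma, i.e. gamma < 1/(p - 1)).
import numpy as np
from scipy.special import beta

def g_p(gamma, p):
    """g_p(gamma) = gamma / B(p, 1/gamma - p + 1)."""
    return gamma / beta(p, 1.0 / gamma - p + 1.0)

def gamma_p_hat(alpha_n, x, X, Y, p, h, n_grid=2000):
    q_p = q_p_hat(alpha_n, x, X, Y, p, h)
    ratio = F_bar_p_hat(q_p, x, X, Y, 1.0, h) / (1.0 - alpha_n)
    grid = np.linspace(1e-3, 1.0 / (p - 1.0) - 1e-3, n_grid)
    feasible = grid[g_p(grid, p) <= ratio]
    return feasible[0] if feasible.size > 0 else np.nan
```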

Theorem 3

Assume that \((\mathcal {K})\), \((\mathcal {L})\) and \(\mathcal {C}_2(\gamma (\mathbf {x}),\rho (\mathbf {x}),A(\cdot |\mathbf {x}))\) hold with \(\gamma (\mathbf {x})<1\). Let \(\alpha _n \rightarrow 1\) and \(h_n \rightarrow 0\). Assume further that \(\sigma _n^{-2} = n h_n^d (1-\alpha _n) \rightarrow \infty \), \(n h_n^{d+2} (1-\alpha _n) \rightarrow 0\), \(\omega _{h_n}^{(3)}( q^{(1)}(\alpha _n|\mathbf {x}) |\mathbf {x}) \rightarrow 0\) and

  (i) \(\sigma _n^{-1} A \left( (1-\alpha _n)^{-1} |\mathbf {x} \right) \rightarrow 0 \),

  (ii) \(\sigma _n^{-1} q^{(1)}(\alpha _n |\mathbf {x})^{-1} \rightarrow \lambda \in \mathbb {R} \),

  (iii) for some \(\delta \in (0,1)\), \(\sigma _n^{-1} \omega _{h_n}^{(1)}((1-\delta ) \left( g_p(\gamma (\mathbf {x}))^{-\gamma (\mathbf {x})} q^{(1)}(\alpha _n|\mathbf {x}) \right) |\mathbf {x}) \log (1-\alpha _n) \rightarrow 0\) and \(\sigma _n^{-1} \omega _{h_n}^{(2)}((1+\delta ) \left( q^{(1)}(\alpha _n|\mathbf {x}) \right) |\mathbf {x}) \rightarrow 0\).

Then, for all \(p \in (1,\gamma (\mathbf {x})^{-1}/2+1)\), one has

$$\begin{aligned} \sigma _n^{-1} \left( \hat{\gamma }_{\alpha _n}^{(p)}(\mathbf {x})- \gamma (\mathbf {x}), \frac{\hat{q}^{(p)}_n( \alpha _n |\mathbf {x} )}{q^{(p)} ( \alpha _n |\mathbf {x} )}-1 \right) \overset{d}{\longrightarrow } {\boldsymbol{\Theta }}, \end{aligned}$$
(13)

where \({\boldsymbol{\Theta }}\) is a bivariate Gaussian distribution with mean vector \(\left( b_p(\mathbf {x}),0 \right) \) and covariance matrix \(|| K ||_2^2 \gamma (\mathbf {x})^2 g(\mathbf {x})^{-1} {\boldsymbol{\Omega }}(\mathbf {x})\) such that

$$\begin{aligned} \left\{ \begin{array}{ll} b_p(\mathbf {x}) =&{} \frac{(1-p) \gamma (\mathbf {x}) g_p(\gamma (\mathbf {x}))^{\gamma (\mathbf {x})} \mathbb {E}[Y|\mathbf {X}=\mathbf {x}] }{1-\frac{1}{\gamma (\mathbf {x})} \left( \Psi \left( \gamma (\mathbf {x})^{-1}+1 \right) -\Psi \left( \gamma (\mathbf {x})^{-1}-p+1 \right) \right) } \lambda \\ \Omega _{11}(\mathbf {x}) =&{} \frac{B \left( p, \gamma (\mathbf {x})^{-1}-p+1 \right) }{\left( 1- \frac{1}{\gamma (\mathbf {x})} \left( \Psi \left( \gamma (\mathbf {x})^{-1}+1 \right) -\Psi \left( \gamma (\mathbf {x})^{-1}-p+1 \right) \right) \right) ^2} \left( \frac{B \left( 2p-1,\gamma (\mathbf {x})^{-1}-2p+2 \right) }{B \left( p,\gamma (\mathbf {x})^{-1}-p+1 \right) ^2}-\frac{1}{\gamma (\mathbf {x})} \right) \\ \Omega _{12}(\mathbf {x}) =&{} \frac{B \left( p, \gamma (\mathbf {x})^{-1}-p+1 \right) }{1-\frac{1}{\gamma (\mathbf {x})} \left( \Psi \left( \gamma (\mathbf {x})^{-1}+1 \right) -\Psi \left( \gamma (\mathbf {x})^{-1}-p+1 \right) \right) } \left( \frac{1}{\gamma (\mathbf {x})}-\frac{B \left( 2p-1,\gamma (\mathbf {x})^{-1}-2p+2 \right) }{B \left( p,\gamma (\mathbf {x})^{-1}-p+1 \right) ^2} \right) \\ \Omega _{22}(\mathbf {x}) =&{} \frac{B \left( 2p-1,\gamma (\mathbf {x})^{-1}-2p+2 \right) }{B \left( p,\gamma (\mathbf {x})^{-1}-p+1 \right) } \end{array} \right. . \end{aligned}$$
(14)

Let us remark here that although Theorem 3 can be seen as a version of Theorem 4 of Girard et al. (2021), the latter is stated under weaker regularity assumptions and applies to further examples of estimators developed specifically in the conditional expectile setup.

Note that condition \(\gamma (\mathbf {x})<1\) entails \(\mathbb {E}[Y|\mathbf {X}=\mathbf {x}]<\infty \) and leads to a simple expression of the bias term \(b_p(\mathbf {x})\). A result dropping this assumption is available in the unconditional setting in Girard et al. (2019); here, our motivation for this condition is that we shall use extreme regression \(L^p\)-quantiles as a way to estimate extreme regression expectiles, for the existence of which a natural condition is that \(\mathbb {E}[|Y||\mathbf {X}=\mathbf {x}]<\infty \). The bias term \(b_p(\mathbf {x})\) is related to \(\gamma (\mathbf {x})\), \(q^{(1)}(\alpha _n|\mathbf {x})\) and \(\mathbb {E}[Y|\mathbf {X}=\mathbf {x}]\). All these quantities may be easily estimated (the latter two by kernel regression estimators) to construct a bias-reduced conditional tail index estimator as follows:

$$ \tilde{\gamma }_{\alpha _n}^{(p)}(\mathbf {x})= \hat{\gamma }_{\alpha _n}^{(p)}(\mathbf {x}) \left( 1+ \frac{(p-1) \left( \frac{\sum \limits _{i=1}^n Y_i K\left( \frac{\mathbf {x}-\mathbf {X}_i}{h_n} \right) }{\sum \limits _{i=1}^n K\left( \frac{\mathbf {x}-\mathbf {X}_i}{h_n} \right) } \right) \hat{q}_n^{(p)}(\alpha _n|\mathbf {x})^{-1}}{1+\frac{1}{\hat{\gamma }_{\alpha _n}^{(p)}(\mathbf {x})} \left( \Psi \left( 1/\hat{\gamma }_{\alpha _n}^{(p)}(\mathbf {x})-p+1 \right) -\Psi \left( 1/\hat{\gamma }_{\alpha _n}^{(p)}(\mathbf {x})+1 \right) \right) } \right) . $$

Under the conditions of Theorem 3, it is clear that \( \sigma _n^{-1} ( \tilde{\gamma }_{\alpha _n}^{(p)}(\mathbf {x})- \gamma (\mathbf {x}) ) \overset{d}{\longrightarrow } \mathcal {N} \left( 0, \Omega _{11}(\mathbf {x}) \right) \) where \(\Omega _{11}(\mathbf {x})\) is given in Eq. (14). This bias reduction significantly improves the numerical results, and is used in the finite-sample study below.
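In code, the bias-reduced version requires one extra kernel regression; a sketch of ours, again reusing the helper functions defined in the earlier sketches:

```python
# Bias-reduced conditional tail index estimator displayed above; m_nw is
# the kernel regression (Nadaraya-Watson) estimate of E[Y|X=x], and
# digamma plays the role of Psi.
import numpy as np
from scipy.special import digamma

def gamma_p_tilde(alpha_n, x, X, Y, p, h):
    g_hat = gamma_p_hat(alpha_n, x, X, Y, p, h)
    w = epanechnikov((x - X) / h)
    m_nw = np.sum(Y * w) / np.sum(w)
    q_p = q_p_hat(alpha_n, x, X, Y, p, h)
    a = 1.0 / g_hat
    denom = 1.0 + a * (digamma(a - p + 1.0) - digamma(a + 1.0))
    return g_hat * (1.0 + (p - 1.0) * m_nw / (q_p * denom))
```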

Even though \(L^p\)-quantiles with \(1<p<2\) are more widely estimable than expectiles and take the whole tail information into account, they are neither easy to interpret nor coherent as risk measures. Recent work in Daouia et al. (2019) has shown that extreme \(L^p\)-quantiles can be used as vehicles for extreme quantile and expectile estimation; see also Gardes et al. (2020) for an analogous study of the estimation of (a compromise between) Median Shortfall and Conditional Tail Expectation at extreme levels, using tail \(L^p\)-medians. Our focus in the following finite-sample study is to analyze the potential of extreme regression \(L^p\)-quantiles for the estimation of extreme regression quantiles and expectiles.

4 Simulation Study

We consider here a one-dimensional covariate (\(d=1\)), uniformly distributed on [0, 1], and a Burr-type distribution for Y given \(X=x\):

$$ \bar{F}^{(1)}(y|x)= \left( 1+y^{-\rho (x)/\gamma (x)} \right) ^{1/\rho (x)} \text {, } \gamma (x)=\frac{4+\sin (2 \pi x)}{10} \text { and } \rho (x) \equiv -1.$$

Such a distribution fulfills Assumption \(\mathcal {C}_2(\gamma (x),\rho (x),A(\cdot |x))\) with auxiliary function \(A(y|x)=\gamma (x) y^{\rho (x)}\). We simulate \(N=500\) samples, each consisting of \(n=1{,}000\) independent replications of \((X,Y)\), and propose to estimate the conditional quantiles and expectiles of level \(\beta _n=1-1/n=0.999\) using our extreme regression \(L^p\)-quantile estimators. Note that the quantiles may be calculated explicitly:

$$ q(\alpha |x)= \left[ (1-\alpha )^{\rho (x)}-1 \right] ^{-\gamma (x)/\rho (x)} .$$

Expectiles have to be approximated numerically, since they do not have a simple closed form. In order to estimate these two quantities, we propose to compare different approaches (called either direct or indirect):

  (i) Use the conditional Weissman-type estimators based, respectively, on empirical quantiles and the estimator \(\hat{\gamma }_{\alpha _n}^{(H)}(x)\) (direct quantile estimator), and on empirical expectiles and \(\tilde{\gamma }_{\alpha _n}^{(2)}(x)\) (direct expectile estimator), i.e.

    $$ \left( \frac{1-\alpha _n}{1-\beta _n} \right) ^{\hat{\gamma }_{\alpha _n}^{(H)}(x)} \hat{q}^{(1)}_n( \alpha _n |x) \text { , } \left( \frac{1-\alpha _n}{1-\beta _n} \right) ^{\tilde{\gamma }_{\alpha _n}^{(2)}(x)} \hat{q}^{(2)}_n( \alpha _n |x). $$

  (ii) Indirect quantile estimator: first estimate the conditional \(L^p\)-quantile using estimator (4), then exploit the asymptotic relationship (9) to recover the extreme conditional quantile,

    $$ \left( \frac{1-\alpha _n}{1-\beta _n} \right) ^{\tilde{\gamma }_{\alpha _n}^{(p)}(x)} \hat{q}^{(p)}_n( \alpha _n |x) \left( \frac{\tilde{\gamma }_{\alpha _n}^{(p)}(x)}{B \left( p,\tilde{\gamma }_{\alpha _n}^{(p)}(x)^{-1}-p+1 \right) } \right) ^{\tilde{\gamma }_{\alpha _n}^{(p)}(x)}. $$

  (iii) Indirect expectile estimator: use Eq. (9) to connect the \(L^p\)-quantile to the quantile, and the quantile to the expectile, resulting in the extreme conditional expectile estimator

    $$ \left( \frac{1-\alpha _n}{1-\beta _n} \right) ^{\tilde{\gamma }_{\alpha _n}^{(p)}(x)} \hat{q}^{(p)}_n( \alpha _n |x) \left( \frac{B \left( 2, \tilde{\gamma }_{\alpha _n}^{(p)}(x)^{-1}-1 \right) }{B \left( p, \tilde{\gamma }_{\alpha _n}^{(p)}(x)^{-1}-p+1 \right) } \right) ^{\tilde{\gamma }_{\alpha _n}^{(p)}(x)}. $$

The choice of p is discussed in Girard et al. (2019), using the MSE of (the unconditional version of) \(\tilde{\gamma }_{\alpha _n}^{(p)}(x)\) as a criterion. Cross-validation choices of the bandwidth \(h_n\) and of the intermediate quantile level \(\alpha _n\), meanwhile, are discussed in Daouia et al. (2013) and Girard et al. (2021). For the sake of simplicity, we choose here the common parameters \(p=1.7\) (following the guidelines of Girard et al. 2019), \(h_n=0.15\) and \(\alpha _n=1-1/\sqrt{n} \approx 0.968\) across all replications, and K is the Epanechnikov kernel defined by \(K(t)=0.75(1-t^2) \mathbbm {1}_{\{ |t|<1 \}}\). Results are shown in Fig. 1.
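As an indication of how the pieces fit together, one replication of this design may be sketched as follows (our illustration, reusing the helper functions from the earlier sketches); the quantity computed is the indirect quantile estimator of (ii) at \(x=0.5\).

```python
# One replication of the simulation design: Burr responses generated by
# inverse-transform sampling, then the indirect extreme conditional
# quantile estimator of (ii), to be compared with the exact q(beta_n|x).
import numpy as np

rng = np.random.default_rng(1)
n = 1_000
X = rng.uniform(0.0, 1.0, n)
gamma_x, rho = (4.0 + np.sin(2.0 * np.pi * X)) / 10.0, -1.0
U = rng.uniform(0.0, 1.0, n)
Y = ((1.0 - U) ** rho - 1.0) ** (-gamma_x / rho)  # explicit Burr quantile at U

x0, p, h = 0.5, 1.7, 0.15
alpha_n, beta_n = 1.0 - 1.0 / np.sqrt(n), 1.0 - 1.0 / n

g_tilde = gamma_p_tilde(alpha_n, x0, X, Y, p, h)
q_p = q_p_hat(alpha_n, x0, X, Y, p, h)
indirect_q = ((1 - alpha_n) / (1 - beta_n)) ** g_tilde * q_p * g_p(g_tilde, p) ** g_tilde
print(indirect_q)
```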

Fig. 1

Left: Boxplots of 500 estimates of \(q^{(1)}(\beta _n|x)\) with the direct (green) and indirect (blue) quantile estimators. Right: Boxplots of 500 estimates of \(q^{(2)}(\beta _n|x)\) with the direct (green) and indirect (blue) expectile estimators. True values are in red

We can notice that an indirect estimation of extreme quantiles or expectiles with an \(L^p\)-quantile (with p between 1 and 2) leads to a trade-off between bias and variance: the indirect \(L^p\)-estimator of an extreme regression quantile is less variable than the direct estimator but slightly more biased, and the indirect \(L^p\)-estimator of an extreme regression expectile is more variable than the direct estimator but less biased. For conditional quantiles, an explanation is that using the asymptotic approximation (9) in the construction of the indirect estimator adds a source of bias, while the reduced variance stems from the use of \(p=1.7\) in the estimator \(\tilde{\gamma }_{\alpha _n}^{(p)}(x)\), which has lower variance than the simple Hill estimator in our case (see Girard et al. 2019). The case of conditional expectiles is less clear, although the increased variability observed for \(x\in [0,0.5]\) seems to originate in the use of the estimated constant \(B( 2, \tilde{\gamma }_{\alpha _n}^{(p)}(x)^{-1}-1)/B( p, \tilde{\gamma }_{\alpha _n}^{(p)}(x)^{-1}-p+1)\): when \(\tilde{\gamma }_{\alpha _n}^{(p)}(x)\) gets close to 1, which is sometimes the case in this zone where \(\gamma (x)\in [0.4,0.5]\), this estimated constant tends to explode, while the direct estimator is less affected. A similar observation, in the context of extreme Wang distortion risk measure estimation, is made by El Methni and Stupfler (2017).

5 Real Data Example

We study here a data set on motorcycle insurance collected from the former Swedish insurance provider Wasa, covering motorcycle insurance policies and claims over the period 1994–1998; it is available from www.math.su.se/GLMbook or the R packages insuranceData and CASdatasets, and is analyzed in Ohlsson and Johansson (2010). We concentrate here on the relationship between the claim severity Y (defined as the ratio of the claim cost to the number of claims for each given policyholder), in Swedish kronor (SEK), and the number of years X of exposure of a policyholder. Data for \(X>3\) are very sparse, so we restrict our attention to the case \(Y>0\) and \(X\in [0,3]\), resulting in \(n = 593\) pairs \((X_i,Y_i)\).

Our goal in this section is to estimate extreme conditional quantiles and expectiles of Y given X, at a level \(\beta _n=1-3/n\approx 0.9949\). This level is slightly less extreme than the more standard \(\beta _n=1-1/n\approx 0.9983\), but is an appropriately extreme level in this conditional context, where fewer observations are available locally for the estimation. A preliminary diagnostic using a local version of the Hill estimator (which we do not show here) suggests that the data is indeed heavy-tailed with \(\gamma (x)\in [0.25,0.6]\). Following again the guidelines in Girard et al. (2019), we choose \(p=1.7\) for our indirect extreme conditional quantile and expectile estimators. These are, respectively, compared to

  • the estimator \(\widehat{q}_n^{W}(\beta _n|x)\) of Girard et al. (2021), calculated as in Sect. 5 therein, and our direct quantile estimator presented in Sect. 4 (i),

  • the estimator \(\widehat{e}_n^{W,BR}(\beta _n|x)\) of Girard et al. (2021), calculated as in Sect. 5 therein, and our direct expectile estimator presented in Sect. 4 (i).

For the direct and indirect estimators presented in Sect. 4 (ii)–(iii), the parameters \(\alpha _n\) and \(h_n\) are chosen by a cross-validation procedure analogous to that of Girard et al. (2021). The Epanechnikov kernel is adopted. Results are given in Fig. 2. In each case, all three estimators reassuringly point to roughly the same results, with slight differences; in particular, for quantile estimation and when data is scarce, the direct estimator in Sect. 4 (i) appears to be more sensitive to the local shape of the tail than the indirect, \(L^p\)-quantile-based estimator in Sect. 4 (ii), resulting in less stable estimates.

Fig. 2

Swedish motorcycle insurance data. Left panel: extreme conditional quantile estimation, black curve: estimator \(\widehat{q}_n^{W}(\beta _n|x)\) of Girard et al. (2021), blue curve: direct quantile estimator (i) of Sect. 4, red curve: indirect quantile estimator (ii) of Sect. 4. Right panel: extreme conditional expectile estimation, black curve: estimator \(\widehat{e}_n^{W,BR}(\beta _n|x)\) of Girard et al. (2021), blue curve: direct expectile estimator (i) of Sect. 4, red curve: indirect expectile estimator (iii) of Sect. 4. In each panel, x-axis: number of years of exposure of policyholder, y-axis: claim severity

6 Appendix

6.1 Preliminary Results

Lemma 1

Assume that \((\mathcal {L})\) and \(\mathcal {C}_2 \left( \gamma (\mathbf {x}), \rho (\mathbf {x}), A(.|\mathbf {x}) \right) \) hold, and let \(y_n \rightarrow \infty \) and \(h_n \rightarrow 0\) be such that \(\omega _{h_n}^{(1)}(y_n|\mathbf {x}) \log (y_n)\rightarrow 0\) and \(\omega _{h_n}^{(2)}(y_n|\mathbf {x}) \rightarrow 0\). Then for all \(0 \le k < \gamma (\mathbf {x})^{-1}\) we have, uniformly in \(\mathbf {x}' \in B(\mathbf {x},h_n)\),

$$ m^{(k)}(y_n|\mathbf {x}') = m^{(k)}(y_n|\mathbf {x}) \left( 1+ O \left( h_n \right) + o \left( \omega _{h_n}^{(1)}(y_n|\mathbf {x}) \right) + O \left( \omega _{h_n}^{(2)}(y_n|\mathbf {x}) \right) \right) . $$

In particular \(m^{(k)}(y_n|\mathbf {x}') = y_n^k g(\mathbf {x}) \left( 1+ o(1) \right) \) uniformly in \(\mathbf {x}' \in B(\mathbf {x},h_n)\).

Proof

Let us first write

$$ m^{(k)}(y_n|\mathbf {x}) = \mathbb {E}\left[ (Y-y_n)^k \mathbbm {1}_{\{ Y>y_n \}} | \mathbf {X}=\mathbf {x} \right] g(\mathbf {x}) + \mathbb {E}\left[ (y_n-Y)^k \mathbbm {1}_{\{ Y\le y_n \}} | \mathbf {X}=\mathbf {x} \right] g(\mathbf {x}). $$

By the arguments of the proof of Lemma 3 in Girard et al. (2021),

$$ \frac{\mathbb {E}\left[ (Y-y_n)^k \mathbbm {1}_{\{ Y>y_n \}} | \mathbf {X}=\mathbf {x}' \right] g(\mathbf {x}')}{\mathbb {E}\left[ (Y-y_n)^k \mathbbm {1}_{\{ Y>y_n \}} | \mathbf {X}=\mathbf {x} \right] g(\mathbf {x})} = 1+ O \left( h_n \right) + O \left( \omega _{h_n}^{(1)}(y_n|\mathbf {x}) \log (y_n) \right) . $$

Besides, an integration by parts yields

$$ \mathbb {E}\left[ (y_n-Y)^k \mathbbm {1}_{\{ Y\le y_n \}} | \mathbf {X}=\mathbf {x} \right] = \int _0^{y_n} k t^{k-1} F^{(1)}(y_n-t|\mathbf {x}) \, dt. $$

It clearly follows that

$$ \left| \mathbb {E}\left[ (y_n-Y)^k \mathbbm {1}_{\{ Y\le y_n \}} | \mathbf {X}=\mathbf {x}' \right] - \mathbb {E}\left[ (y_n-Y)^k \mathbbm {1}_{\{ Y\le y_n \}} | \mathbf {X}=\mathbf {x} \right] \right| \le y_n^k \omega _{h_n}^{(2)}(y_n|\mathbf {x}). $$

Now

$$ \mathbb {E}\left[ (y_n-Y)^k \mathbbm {1}_{\{ Y\le y_n \}} | \mathbf {X}=\mathbf {x} \right] = y_n^k \, \mathbb {E}\left[ \left( 1 - \frac{Y}{y_n} \right) ^k \mathbbm {1}_{\{ Y\le y_n \}} | \mathbf {X}=\mathbf {x} \right] = y_n^k(1+o(1)) $$

by the dominated convergence theorem, and

$$\begin{aligned} \mathbb {E}\left[ (Y-y_n)^k \mathbbm {1}_{\{ Y>y_n \}} | \mathbf {X}=\mathbf {x} \right] = \frac{ g(\mathbf {x}) B\left( k+1,\gamma (\mathbf {x})^{-1}-k \right) }{\gamma (\mathbf {x})} y_n^k \bar{F}^{(1)}(y_n|\mathbf {x})(1+o(1)), \end{aligned}$$
(15)

see for instance Lemma 1(i) in Daouia et al. (2019). The result follows from direct calculations.

Lemma 2

Assume that \((\mathcal {K})\), \((\mathcal {L})\) and \(\mathcal {C}_2 \left( \gamma (\mathbf {x}), \rho (\mathbf {x}), A(.|\mathbf {x}) \right) \) hold, and let \(y_n \rightarrow \infty \) and \(h_n \rightarrow 0\) be such that \(nh_n^d \rightarrow \infty \), \(\omega _{h_n}^{(1)}(y_n|\mathbf {x}) \log (y_n)\rightarrow 0\) and \(\omega _{h_n}^{(2)}(y_n|\mathbf {x}) \rightarrow 0\). Then for all \(0 \le k < \gamma (\mathbf {x})^{-1}/2\),

$$\begin{aligned} \mathbb {E} \left[ \hat{m}_n^{(k)}(y_n|\mathbf {x}) \right]= & {} m^{(k)}(y_n|\mathbf {x}) \left( 1+ O \left( h_n \right) + o \left( \omega _{h_n}^{(1)}(y_n|\mathbf {x}) \right) + O \left( \omega _{h_n}^{(2)}(y_n|\mathbf {x}) \right) \right) \\ \text{ and } \ \mathbb {V}\text {ar} \left[ \hat{m}_n^{(k)}(y_n|\mathbf {x}) \right]= & {} \frac{|| K ||_2^2}{n h_n^d} g(\mathbf {x}) y_n^{2k} (1+o(1)). \end{aligned}$$

Proof

Note that \( \mathbb {E} \left[ \hat{m}_n^{(k)}(y_n|\mathbf {x}) \right] = \int _S m^{(k)}(y_n|\mathbf {x}-\mathbf {u} h_n) K(\mathbf {u}) d\mathbf {u} \) by Assumption \((\mathcal {K})\) and a change of variables, and use Lemma 1 to get the first result. The second result is obtained through similar calculations.   \(\square \)

Lemma 3

Assume that \((\mathcal {K})\), \((\mathcal {L})\) and \(\mathcal {C}_2 \left( \gamma (\mathbf {x}), \rho (\mathbf {x}), A(.|\mathbf {x}) \right) \) hold. Let \(y_n \rightarrow \infty \), \(h_n \rightarrow 0\) be such that \(nh_n^d \rightarrow \infty \) and \(\omega _{h_n}^{(1)}(y_n|\mathbf {x}) \log (y_n) \rightarrow 0\). Then for all \(0 \le k < \gamma (\mathbf {x})^{-1}/2\),

$$ \left\{ \begin{array}{cc} \mathbb {E} \left[ \hat{\varphi }_n^{(k)}(y_{n}|\mathbf {x}) \right] &{}= \varphi ^{(k)}(y_n|\mathbf {x}) \left( 1+ O(h_n)+O \left( \omega _{h_n}^{(1)}(y_n|\mathbf {x}) \log (y_n) \right) \right) , \\ \mathbb {V}\text {ar} \left[ \hat{\varphi }_n^{(k)}(y_{n}|\mathbf {x}) \right] &{}= || K ||_2^2 g(\mathbf {x}) \frac{B\left( 2k+1, \gamma (\mathbf {x})^{-1}-2k \right) }{\gamma (\mathbf {x})} \frac{y_n^{2k} \bar{F}^{(1)}(y_n|\mathbf {x})}{n h_n^d} (1+o(1)). \end{array} \right. $$

Proof

See Lemma 5 of Girard et al. (2021).

Lemma 4

Assume that \(\mathcal {C}_2 \left( \gamma (\mathbf {x}), \rho (\mathbf {x}), A(.|\mathbf {x}) \right) \) holds. Let \(\lambda \ge 1\), \(y_n \rightarrow \infty \), \(y_n'=\lambda y_n(1+o(1))\) and \(0< k < \gamma (\mathbf {x})^{-1}\).

(i) Then the following asymptotic relationship holds:

$$\begin{aligned}&\mathbb {E} \left[ |Y-y_n|^k \mathbbm {1}_{\{ Y>y_n' \}} | \mathbf {X}=\mathbf {x} \right] \\= & {} y_n^k \bar{F}^{(1)}(y_n|\mathbf {x}) \left[ k IB \left( \lambda ^{-1}, \gamma (\mathbf {x})^{-1}-k,k \right) +(\lambda -1)^k \lambda ^{-1/\gamma (\mathbf {x})} \right] (1+o(1)). \end{aligned}$$


(ii) Assume further that \(\omega _{h_n}^{(1)}(y_n \wedge y_n' | \mathbf {x}) \log (y_n) \rightarrow 0\) and \(\omega _{h_n}^{(3)}(y_n | \mathbf {x}) \rightarrow 0\). Then, uniformly in \(\mathbf {x}' \in B(\mathbf {x},h_n)\),

$$ \mathbb {E} \left[ |Y-y_n|^k \mathbbm {1}_{\{ Y>y_n' \}} | \mathbf {X}=\mathbf {x}' \right] =\mathbb {E} \left[ |Y-y_n|^k \mathbbm {1}_{\{ Y>y_n' \}} | \mathbf {X}=\mathbf {x} \right] (1+o(1)). $$

Proof

(i) Straightforward calculations entail

$$\begin{aligned}&\mathbb {E} \left[ |Y-y_n|^k \mathbbm {1}_{\{ Y>y_n' \}} | \mathbf {X}=\mathbf {x} \right] \\= & {} y_n^k \mathbb {E} \left[ \left\{ \left( \frac{Y}{y_n}-1 \right) ^k-(\lambda -1)^k \right\} \mathbbm {1}_{\{ Y>\lambda y_n \}} | \mathbf {X}=\mathbf {x} \right] (1+o(1)) \\+ & {} y_n^k (\lambda -1)^k \bar{F}^{(1)}(\lambda y_n | \mathbf {x})(1+o(1)) , \end{aligned}$$

with \(y_n'=\lambda y_n (1+o(1))\). The result then comes directly from the regular variation property of \(\bar{F}^{(1)}(\cdot | \mathbf {x})\) and Lemma 1 in Daouia et al. (2019) with \(H(t)=(t-1)^k\) and \(b= \lambda \).

(ii) Note first that for n large enough

$$\begin{aligned}&\left| \mathbb {E} \left[ |Y-y_n|^k \mathbbm {1}_{\{ Y>y'_n \}} | \mathbf {X}=\mathbf {x}' \right] - \mathbb {E} \left[ |Y-y_n|^k \mathbbm {1}_{\{ Y>\lambda y_n \}} | \mathbf {X}=\mathbf {x}' \right] \right| \\\le & {} \left[ |y'_n-y_n|^k + (\lambda -1)^k y_n^k \right] \left[ \bar{F}^{(1)} (y'_n\wedge \lambda y_n | \mathbf {x}') - \bar{F}^{(1)} (y'_n\vee \lambda y_n | \mathbf {x}') \right] \\\le & {} 3(\lambda -1)^k y_n^k \times \bar{F}^{(1)}(y_n'|\mathbf {x}') \times \omega _{h_n}^{(3)}(y_n | \mathbf {x}). \end{aligned}$$

Write \((Y-y_n)^k = ((Y-y_n)^k - (\lambda -1)^k y_n^k) + (\lambda -1)^k y_n^k\). It then follows from the assumption \(\omega _{h_n}^{(3)}(y_n | \mathbf {x}) \rightarrow 0\) that, uniformly in \(\mathbf {x}' \in B(\mathbf {x},h_n)\),

$$\begin{aligned} \mathbb {E} \left[ |Y-y_n|^k \mathbbm {1}_{\{ Y>y'_n \}} | \mathbf {X}=\mathbf {x}' \right]= & {} (\lambda -1)^k y_n^k \bar{F}^{(1)}(y_n'|\mathbf {x}') (1+o(1)) \\+ & {} k \int _{\lambda y_n}^{\infty }(z-y_n)^{k-1} \bar{F}^{(1)}(z|\mathbf {x}')dz (1+o(1)). \end{aligned}$$

Note now that \( \bar{F}^{(1)} \left( y_n'|\mathbf {x} \right) (y_n')^{-\omega _{h_n}^{(1)}(y_n'|\mathbf {x})} \le \bar{F}^{(1)} \left( y_n'|\mathbf {x}' \right) \le \bar{F}^{(1)} \left( y_n'|\mathbf {x} \right) (y_n')^{\omega _{h_n}^{(1)}(y_n'|\mathbf {x})}. \) Then condition \(\omega _{h_n}^{(1)}(y_n'|\mathbf {x}) \log (y_n) \rightarrow 0\) entails, uniformly in \(\mathbf {x}' \in B(\mathbf {x},h_n)\), \(\bar{F}^{(1)} \left( y_n'|\mathbf {x}' \right) = \bar{F}^{(1)} \left( y_n'|\mathbf {x} \right) (1+o(1)) = \bar{F}^{(1)} \left( \lambda y_n|\mathbf {x} \right) (1+o(1))\). Besides, for any \(z\ge \lambda y_n\ge y_n\), \( \bar{F}^{(1)} \left( z|\mathbf {x} \right) z^{-\omega _{h_n}^{(1)}(y_n|\mathbf {x})} \le \bar{F}^{(1)} \left( z|\mathbf {x}' \right) \le \bar{F}^{(1)} \left( z|\mathbf {x} \right) z^{\omega _{h_n}^{(1)}(y_n|\mathbf {x})}. \) Following the proof of Lemma 3 in Girard et al. (2021), we get, uniformly in \(\mathbf {x}' \in B(\mathbf {x},h_n)\),

$$ \left| \frac{\int _{\lambda y_n}^{\infty }(z-y_n)^{k-1} \bar{F}^{(1)}(z|\mathbf {x}')dz}{\int _{\lambda y_n}^{\infty }(z-y_n)^{k-1} \bar{F}^{(1)}(z|\mathbf {x})dz} - 1 \right| = O( \omega _{h_n}^{(1)}(y_n | \mathbf {x}) \log (y_n) ) \rightarrow 0. $$

Since \(\int _{\lambda y_n}^{\infty }(z-y_n)^{k-1} \bar{F}^{(1)}(z|\mathbf {x})dz\) is of order \(y_n^k \bar{F}^{(1)}(y_n|\mathbf {x})\) (by regular variation of \(\bar{F}^{(1)}(\cdot |\mathbf {x})\)), the conclusion follows.

Lemma 5

Assume that \(\mathcal {C}_2 \left( \gamma (\mathbf {x}), \rho (\mathbf {x}), A(.|\mathbf {x}) \right) \) holds. For all \(1 \le p < \gamma (\mathbf {x})^{-1}+1\),

$$ \frac{\bar{F}^{(p)}(y|\mathbf {x})}{\bar{F}^{(1)}(y|\mathbf {x})}= \frac{B\left( p,\gamma (\mathbf {x})^{-1}-p+1 \right) }{\gamma (\mathbf {x})} \left[ 1+r(y|\mathbf {x}) \right] $$

where there are constants \(C_1(\mathbf {x})\), \(C_2(\mathbf {x})\), \(C_3(\mathbf {x})\) such that

$$\begin{aligned} r(y|\mathbf {x})= & {} C_1(\mathbf {x})\frac{\mathbb {E}(Y \mathbbm {1}_{\{ 0<Y<y\}} | \mathbf {X}=\mathbf {x})}{y}(1+o(1)) + C_2(\mathbf {x}) \bar{F}^{(1)}(y|\mathbf {x}) (1+o(1)) \\+ & {} C_3(\mathbf {x}) A(1/\bar{F}^{(1)}(y|\mathbf {x})|\mathbf {x}) (1+o(1)) \ \text{ as } y\rightarrow \infty . \end{aligned}$$

Similarly

$$ \frac{q^{(p)}(\alpha |\mathbf {x})}{q^{(1)}(\alpha |\mathbf {x})} = \left( \frac{\gamma (\mathbf {x})}{B \left( p,\gamma (\mathbf {x})^{-1}-p+1 \right) } \right) ^{-\gamma (\mathbf {x})} \left[ 1+R(\alpha |\mathbf {x}) \right] $$

where there are constants \(D_1(\mathbf {x})\), \(D_2(\mathbf {x})\), \(D_3(\mathbf {x})\) such that

$$\begin{aligned} R(\alpha |\mathbf {x})= & {} D_1(\mathbf {x})\frac{\mathbb {E}(Y \mathbbm {1}_{\{ 0<Y<q^{(1)}(\alpha |\mathbf {x})\}} | \mathbf {X}=\mathbf {x})}{q^{(1)}(\alpha |\mathbf {x})}(1+o(1)) + D_2(\mathbf {x}) (1-\alpha ) (1+o(1)) \\+ & {} D_3(\mathbf {x}) A((1-\alpha )^{-1}|\mathbf {x}) (1+o(1)) \ \text{ as } \alpha \rightarrow 1. \end{aligned}$$


Proof

We start by focusing on the ratio \(\bar{F}^{(p)}(y|\mathbf {x})/\bar{F}^{(1)}(y|\mathbf {x})\). By Lemma 1 in Girard et al. (2019), the function \(\bar{F}^{(p)}(\cdot |\mathbf {x})\) is continuous and strictly decreasing on the support of Y given \(\mathbf {X}=\mathbf {x}\). It is therefore enough to show the announced formula for \(y=q^{(p)}(\alpha |\mathbf {x})\) with \(\alpha \rightarrow 1\); this, in turn, is a simple corollary of Proposition 2 in Daouia et al. (2019). To show the analogous formula on \(q^{(p)}(\alpha |\mathbf {x})/q^{(1)}(\alpha |\mathbf {x})\), we define \(U^{(1)}(t|\mathbf {x}) = q^{(1)}(1-t^{-1}|\mathbf {x})\); \(U^{(1)}(\cdot |\mathbf {x})\) also satisfies a (local uniform) second-order regular variation condition, see Theorem 2.3.9 p.48 in de Haan and Ferreira (2006). Consequently, we note that the asymptotic expansion on \(\bar{F}^{(p)}(y|\mathbf {x})/\bar{F}^{(1)}(y|\mathbf {x})\) entails a similar expansion on

$$ \frac{U^{(1)}(1/\bar{F}^{(1)}(y|\mathbf {x})|\mathbf {x})}{U^{(1)}(1/\bar{F}^{(p)}(y|\mathbf {x})|\mathbf {x})} = \frac{y}{q^{(1)}(F^{(p)}(y|\mathbf {x})|\mathbf {x})} (1+o(A(1/\bar{F}^{(1)}(y|\mathbf {x})|\mathbf {x}))) $$

as \(y\rightarrow \infty \), with different constants (here Lemma 1 in Daouia et al. (2020b) was used). Setting \(y=q^{(p)}(\alpha |\mathbf {x})\), with \(\alpha \rightarrow 1\), gives the announced result.

Lemma 6

Assume that \((\mathcal {K})\), \((\mathcal {L})\) and \(\mathcal {C}_2 \left( \gamma (\mathbf {x}), \rho (\mathbf {x}), A(.|\mathbf {x}) \right) \) hold. Let \(y_n \rightarrow \infty \), \(h_n \rightarrow 0\) and \(z_n=\theta y_n (1+o(1))\), where \(\theta >0\). Assume further that \(\epsilon _n^{-2}=n h_n^d \bar{F}^{(1)}(y_n|\mathbf {x}) \rightarrow \infty \), \(n h_n^{d+2} \bar{F}^{(1)}(y_n|\mathbf {x}) \rightarrow 0\), there exists \(\delta \in (0,1)\) such that \( \epsilon _n^{-1} \omega _{h_n}^{(1)}((1-\delta ) (\theta \wedge 1) y_n|\mathbf {x}) \log (y_n) \rightarrow 0\), and \(\omega _{h_n}^{(3)}(z_n|\mathbf {x}) \rightarrow 0\). Letting, for all \(j \in \{ 1,...,J \}\), \(y_{n,j}=\tau _{j}^{-\gamma (\mathbf {x})} y_n (1+o(1))\) with \(0<\tau _1<\tau _2<\, ... \,<\tau _J \le 1\), and \(p \in (1,\gamma (\mathbf {x})^{-1}/2+1)\), one has

$$ \epsilon _n^{-1} \left\{ \left( \frac{\hat{\varphi }_n^{(0)}(y_{n,j}|\mathbf {x})}{\varphi ^{(0)}(y_{n,j}|\mathbf {x})}-1 \right) _{1 \le j \le J}, \left( \frac{\hat{\varphi }_n^{(p-1)}(z_n|\mathbf {x})}{\varphi ^{(p-1)}(z_n|\mathbf {x})}-1 \right) \right\} \overset{d}{ \rightarrow } \mathcal {N} \left( \mathbf {0}_{J+1}, \frac{|| K ||_2^2}{g(\mathbf {x})} {\boldsymbol{\Lambda }}(\mathbf {x}) \right) , $$

where \({\boldsymbol{\Lambda }}(\mathbf {x})\) is a symmetric matrix having entries:

$$\begin{aligned} \left\{ \begin{array}{cc} \Lambda _{j,\ell }(\mathbf {x})=&{} \left( \tau _j \vee \tau _{\ell } \right) ^{-1} \\ \Lambda _{j,J+1}(\mathbf {x})=&{} \gamma (\mathbf {x}) \frac{(p-1) IB\left( \left( 1 \vee \frac{ \tau _j^{-\gamma (\mathbf {x})}}{\theta } \right) ^{-1},\gamma (\mathbf {x})^{-1}-p+1,p-1 \right) +\left( 1 \vee \frac{\tau _j^{-\gamma (\mathbf {x})}}{\theta } -1 \right) ^{p-1} \left( 1 \vee \frac{\tau _j^{-\gamma (\mathbf {x})}}{\theta } \right) ^{-1/\gamma (\mathbf {x})}}{\tau _j B \left( p,\gamma (\mathbf {x})^{-1}-p+1 \right) } \\ \Lambda _{J+1,J+1}(\mathbf {x})=&{} \gamma (\mathbf {x}) \frac{B \left( 2p-1,\gamma (\mathbf {x})^{-1}-2p+2 \right) }{B \left( p,\gamma (\mathbf {x})^{-1}-p+1 \right) ^2} \theta ^{1/\gamma (\mathbf {x})} \\ \end{array} \right. . \end{aligned}$$
(16)

Proof

Let \(\mathbf {\beta }=\left( \beta _1, ... ,\beta _J,\beta _{J+1} \right) \in \mathbb {R}^{J+1}\). Set

$$ \mathcal {Z}_n = \epsilon _n^{-1} \sum \limits _{j=1}^J \beta _j \left( \frac{\hat{\varphi }_n^{(0)}(y_{n,j}|\mathbf {x})}{\varphi ^{(0)}(y_{n,j}|\mathbf {x})}-1 \right) + \epsilon _n^{-1} \beta _{J+1} \left( \frac{\hat{\varphi }_n^{(p-1)}(z_n|\mathbf {x})}{\varphi ^{(p-1)}(z_n|\mathbf {x})}-1 \right) . $$

Clearly \(\omega _{h_n}^{(1)}(y_{n,j}|\mathbf {x}) \le \omega _{h_n}^{(1)}((1-\delta )y_{n}|\mathbf {x}) \) and \(\omega _{h_n}^{(1)}(z_n|\mathbf {x}) \le \omega _{h_n}^{(1)}((1-\delta )\theta y_n|\mathbf {x})\) for n large enough. Lemma 3 then provides \(\mathbb {E}(\mathcal {Z}_n) = o(1)\). It thus remains to focus on the asymptotic distribution of the centered variable \(Z_n=\mathcal {Z}_n-\mathbb {E}(\mathcal {Z}_n)\). Note that \(\mathbb {V}\text {ar}[Z_n]=\epsilon _n^{-2} \mathbf {\beta }^{\top } {\boldsymbol{B}}^{(n)} \mathbf {\beta }\), where \({\boldsymbol{B}}^{(n)}\) is the symmetric matrix having entries:

$$ \left\{ \begin{array}{ll} B_{j,\ell }^{(n)}(\mathbf {x}) =&{} \frac{\text {cov} \left( \hat{\varphi }_n^{(0)}(y_{n,j}|\mathbf {x}), \hat{\varphi }_n^{(0)}(y_{n,\ell }|\mathbf {x}) \right) }{\varphi ^{(0)}(y_{n,j}|\mathbf {x}) \varphi ^{(0)}(y_{n,\ell }|\mathbf {x})}, \ j, \ell \in \{ 1, ... ,J \}, \ j \le \ell , \\ B_{j,J+1}^{(n)}(\mathbf {x}) =&{} \frac{\text {cov} \left( \hat{\varphi }_n^{(0)}(y_{n,j}|\mathbf {x}), \hat{\varphi }_n^{(p-1)}(z_n|\mathbf {x}) \right) }{\varphi ^{(0)}(y_{n,j}|\mathbf {x}) \varphi ^{(p-1)}(z_n|\mathbf {x})}, \ j \in \{ 1, ... ,J \}, \\ B_{J+1,J+1}^{(n)}(\mathbf {x}) =&{} \frac{\mathbb {V}\text {ar} \left[ \hat{\varphi }_n^{(p-1)}(z_n|\mathbf {x}) \right] }{\varphi ^{(p-1)}(z_n|\mathbf {x})^2}. \end{array} \right. $$

We recall \(z_n=\theta y_n (1+o(1))\), hence \(\bar{F}^{(1)}(z_n|\mathbf {x})= \theta ^{-1/\gamma (\mathbf {x})} \bar{F}^{(1)}(y_n|\mathbf {x})(1+o(1))\) and Lemma 3 combined with Eq. (15) immediately gives

$$B_{J+1,J+1}^{(n)}(\mathbf {x})=\frac{|| K ||_2^2}{g(\mathbf {x})} \gamma (\mathbf {x}) \frac{B \left( 2p-1,\gamma (\mathbf {x})^{-1}-2p+2 \right) }{B \left( p,\gamma (\mathbf {x})^{-1}-p+1 \right) ^2} \theta ^{1/\gamma (\mathbf {x})} \epsilon _n^{2}(1+o(1)) .$$

The calculation of \(B_{j,\ell }^{(n)}(\mathbf {x})\) gives, through straightforward calculations and the use of Lemma 3 and Eq. (15),

$$ B_{j,\ell }^{(n)}(\mathbf {x})= \frac{|| K ||_2^2}{n h_n^d} \frac{\bar{F}^{(1)} \left( y_{n,j} \vee y_{n,\ell } |\mathbf {x} \right) }{g(\mathbf {x}) \bar{F}^{(1)} \left( y_{n,j} |\mathbf {x} \right) \bar{F}^{(1)} \left( y_{n,\ell } |\mathbf {x} \right) }(1+o(1)). $$

The regular variation property of \(\bar{F}^{(1)}\) gives \( B_{j,\ell }^{(n)}(\mathbf {x})= \frac{|| K ||_2^2}{g(\mathbf {x})} (\tau _j \vee \tau _{\ell })^{-1} \epsilon _n^2 (1+o(1)) . \) It remains to calculate \(B_{j,J+1}^{(n)}(\mathbf {x})\). Using Eq. (15), with \(Q(\cdot )=K(\cdot )^2/|| K ||_2^2\) a kernel satisfying \((\mathcal {K})\), this term equals

$$\begin{aligned}&\frac{ \frac{1}{n h_n^{2d}} || K ||_2^2 \mathbb {E} \left[ \left| Y-z_n \right| ^{p-1} Q \left( \frac{\mathbf {x}-\mathbf {X}}{h_n} \right) \mathbbm {1}_{\{ Y>z_n \vee y_{n,j} \}} \right] }{g(\mathbf {x})^2 B\left( p,\gamma (\mathbf {x})^{-1}-p+1 \right) z_n^{p-1} \bar{F}^{(1)}(y_{n,j}|\mathbf {x}) \bar{F}^{(1)}(z_n|\mathbf {x})/\gamma (\mathbf {x}) (1+o(1))} \\- & {} \frac{\frac{1}{n} \mathbb {E} \left[ \frac{1}{h_n^d} K \left( \frac{\mathbf {x}-\mathbf {X}}{h_n} \right) \mathbbm {1}_{\{ Y> y_{n,j} \}} \right] \mathbb {E} \left[ \left| Y-z_n \right| ^{p-1} \frac{1}{h_n^d} K \left( \frac{\mathbf {x}-\mathbf {X}}{h_n} \right) \mathbbm {1}_{\{ Y >z_n \}} \right] }{g(\mathbf {x})^2 B\left( p,\gamma (\mathbf {x})^{-1}-p+1 \right) z_n^{p-1} \bar{F}^{(1)}(y_{n,j}|\mathbf {x}) \bar{F}^{(1)}(z_n|\mathbf {x})/\gamma (\mathbf {x}) (1+o(1))} . \end{aligned}$$

As a direct consequence of Lemma 3, the first term dominates. Note that \(z_n \vee y_{n,j} = (1 \vee \tau _j^{-\gamma (\mathbf {x})}/\theta )z_n (1+o(1))\); combining Assumption \((\mathcal {K})\), the results of Lemma 4 (with \(\lambda =1 \vee \tau _j^{-\gamma (\mathbf {x})}/\theta \)), and the regular variation property of \(\varphi ^{(k)}(\cdot )\) (see Eq. (15)) shows that the numerator of this first term is asymptotically equivalent to

$$\begin{aligned} \frac{|| K ||_2^2}{n h_n^d} g(\mathbf {x}) z_n^{p-1} \bar{F}^{(1)}(z_n|\mathbf {x})&\left[ (p-1)\, IB \left( (1 \vee \tau _j^{-\gamma (\mathbf {x})}/\theta )^{-1},\gamma (\mathbf {x})^{-1}-p+1,p-1 \right) \right. \\&\quad \left. +\left( (1 \vee \tau _j^{-\gamma (\mathbf {x})}/\theta )-1\right) ^{p-1} \left( 1 \vee \tau _j^{-\gamma (\mathbf {x})}/\theta \right) ^{-1/\gamma (\mathbf {x})} \right] . \end{aligned}$$

Finally, \(B_{j,J+1}^{(n)}(\mathbf {x})\) is asymptotically equivalent to

$$\begin{aligned} \frac{\tau _j^{-1} \gamma (\mathbf {x}) \frac{|| K ||_2^2}{g(\mathbf {x})} \epsilon _n^2}{B \left( p,\gamma (\mathbf {x})^{-1}-p+1 \right) }&\left[ (p-1)\, IB \left( (1 \vee \tau _j^{-\gamma (\mathbf {x})}/\theta )^{-1},\gamma (\mathbf {x})^{-1}-p+1,p-1 \right) \right. \\&\quad \left. +\left( (1 \vee \tau _j^{-\gamma (\mathbf {x})}/\theta )-1\right) ^{p-1} \left( 1 \vee \tau _j^{-\gamma (\mathbf {x})}/\theta \right) ^{-1/\gamma (\mathbf {x})} \right] . \end{aligned}$$

Therefore, \(\mathbb {V}\text {ar}[Z_n]=\frac{|| K ||_2^2}{g(\mathbf {x})} \mathbf {\beta }^{\top } {\boldsymbol{\Lambda }}(\mathbf {x}) \mathbf {\beta }\, (1+o(1))\), where \({\boldsymbol{\Lambda }}(\mathbf {x})\) is given in Eq. (16). It remains to prove the asymptotic normality of \(Z_n\). To this end, write \(Z_n=\sum _{i=1}^n Z_{i,n}\), where

$$\begin{aligned} Z_{i,n}={}& \frac{\epsilon _n^{-1}}{n h_n^d} \sum \limits _{j=1}^J \beta _j \frac{K \left( \frac{\mathbf {x}-\mathbf {X}_i}{h_n} \right) \mathbbm {1}_{\{ Y_i> y_{n,j} \}}-\mathbb {E} \left[ K \left( \frac{\mathbf {x}-\mathbf {X}_i}{h_n} \right) \mathbbm {1}_{\{ Y_i> y_{n,j} \}} \right] }{\varphi ^{(0)}(y_{n,j}|\mathbf {x})} \\&+ \frac{\epsilon _n^{-1}}{n h_n^d} \beta _{J+1} \frac{ \left| Y_i-z_n \right| ^{p-1} K \left( \frac{\mathbf {x}-\mathbf {X}_i}{h_n} \right) \mathbbm {1}_{\{ Y_i> z_n \}}-\mathbb {E}\left[ \left| Y_i-z_n \right| ^{p-1} K \left( \frac{\mathbf {x}-\mathbf {X}_i}{h_n} \right) \mathbbm {1}_{\{ Y_i > z_n \}} \right] }{\varphi ^{(p-1)}(z_{n}|\mathbf {x})}. \end{aligned}$$

Taking \(\delta >0\) sufficiently small and arguing as in the closing stages of the proof of Lemma 6 in Girard et al. (2021), we find that \(n \mathbb {E}\left[ |Z_{1,n}|^{2+\delta } \right] =O \left( \epsilon _n^{\delta } \right) =o(1)\). Since \(\mathbb {V}\text {ar}[Z_n]\) converges to \(|| K ||_2^2 \mathbf {\beta }^{\top } {\boldsymbol{\Lambda }}(\mathbf {x}) \mathbf {\beta }/g(\mathbf {x})\), which is positive for \(\mathbf {\beta } \ne \mathbf {0}_{J+1}\), the Lyapunov ratio \( \sum _{i=1}^n \mathbb {E}\left[ |Z_{i,n}|^{2+\delta } \right] / \mathbb {V}\text {ar}[Z_n]^{1+\delta /2} = O \left( \epsilon _n^{\delta } \right) \) converges to 0, and applying the classical Lyapunov central limit theorem concludes the proof.   \(\square \)

Proposition 1

Assume that \((\mathcal {K})\), \((\mathcal {L})\) and \(\mathcal {C}_2 \left( \gamma (\mathbf {x}), \rho (\mathbf {x}), A(.|\mathbf {x}) \right) \) hold. Let \(y_n \rightarrow \infty \), \(h_n \rightarrow 0\) and \(z_n=\theta y_n (1+o(1))\), where \(\theta >0\). Assume further that \(\epsilon _n^{-2}=n h_n^d \bar{F}^{(1)}(y_n|\mathbf {x}) \rightarrow \infty \), \(n h_n^{d+2} \bar{F}^{(1)}(y_n|\mathbf {x}) \rightarrow 0\), \( \omega _{h_n}^{(3)}(y_n|\mathbf {x}) \rightarrow 0\), and that there exists \(\delta \in (0,1)\) such that \( \epsilon _n^{-1} \omega _{h_n}^{(1)}((1-\delta ) (\theta \wedge 1) y_n|\mathbf {x}) \log (y_n) \rightarrow 0\). If the levels \(y_{n,j}=\tau _{j}^{-\gamma (\mathbf {x})} y_n (1+o(1))\), \(j \in \{ 1,\ldots ,J \}\), with \(0<\tau _1<\tau _2< \cdots <\tau _J \le 1\), are such that \( \epsilon _n^{-1} \omega _{h_n}^{(2)}((1+\delta ) (\theta \vee \tau _1^{-\gamma (\mathbf {x})}) y_n|\mathbf {x}) \rightarrow 0 \), then, for all \(p \in (1,\gamma (\mathbf {x})^{-1}/2+1)\), one has

$$ \epsilon _n^{-1} \left\{ \left( \frac{\hat{\bar{F}}_n^{(1)}(y_{n,j}|\mathbf {x})}{\bar{F}^{(1)}(y_{n,j}|\mathbf {x})}-1 \right) _{1 \le j \le J}, \left( \frac{\hat{\bar{F}}_n^{(p)}(z_n|\mathbf {x})}{\bar{F}^{(p)}(z_n|\mathbf {x})}-1 \right) \right\} \overset{d}{ \rightarrow } \mathcal {N} \left( \mathbf {0}_{J+1}, \frac{|| K ||_2^2}{g(\mathbf {x})} {\boldsymbol{\Lambda }}(\mathbf {x}) \right) , $$

where \({\boldsymbol{\Lambda }}(\mathbf {x})\) is given in Eq. (16).

Proof

Notice that

$$ \frac{\hat{\bar{F}}_n^{(p)}(u_n|\mathbf {x})}{\bar{F}^{(p)}(u_n|\mathbf {x})}-1 = \left( \frac{\hat{\varphi }_n^{(p-1)}(u_n|\mathbf {x})}{\varphi ^{(p-1)}(u_n|\mathbf {x})}-1 \right) \frac{m^{(p-1)}(u_n|\mathbf {x})}{\hat{m}_n^{(p-1)}(u_n|\mathbf {x})}+\left( \frac{m^{(p-1)}(u_n|\mathbf {x})}{\hat{m}_n^{(p-1)}(u_n|\mathbf {x})}-1 \right) . $$

Lemma 2 and the Chebyshev inequality ensure that for all \(p \in (1,\gamma (\mathbf {x})^{-1}/2+1)\) and \(u_n\in \{ y_{n,1},\ldots ,y_{n,J},z_n \}\), \(\hat{m}_n^{(p-1)}(u_n|\mathbf {x})/m^{(p-1)}(u_n|\mathbf {x}) - 1 =O_{\mathbb {P}}(1/\sqrt{nh_n^d})\), so that

$$ \epsilon _n^{-1} \left( \frac{\hat{\bar{F}}_n^{(p)}(u_n|\mathbf {x})}{\bar{F}^{(p)}(u_n|\mathbf {x})}-1 \right) = \epsilon _n^{-1} \left( \frac{\hat{\varphi }_n^{(p-1)}(u_n|\mathbf {x})}{\varphi ^{(p-1)}(u_n|\mathbf {x})}-1 \right) +o_{\mathbb {P}}(1). $$
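Indeed, since \(\epsilon _n^{-2}=n h_n^d \bar{F}^{(1)}(y_n|\mathbf {x})\) by assumption, the second term of the decomposition satisfies

$$ \epsilon _n^{-1} \, O_{\mathbb {P}}\left( \frac{1}{\sqrt{n h_n^d}} \right) = O_{\mathbb {P}}\left( \sqrt{\bar{F}^{(1)}(y_n|\mathbf {x})} \right) = o_{\mathbb {P}}(1) $$

because \(y_n \rightarrow \infty \), while the factor \(m^{(p-1)}(u_n|\mathbf {x})/\hat{m}_n^{(p-1)}(u_n|\mathbf {x})=1+o_{\mathbb {P}}(1)\) multiplying the first term is absorbed into it, \(\epsilon _n^{-1} \left( \hat{\varphi }_n^{(p-1)}(u_n|\mathbf {x})/\varphi ^{(p-1)}(u_n|\mathbf {x})-1 \right) \) being \(O_{\mathbb {P}}(1)\) by Lemma 6.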

Applying Lemma 6 concludes the proof.   \(\square \)

6.2 Proofs of Main Results

Proof of Theorem 1. Let \(\mathbf {t}=(t_1,\ldots ,t_J,t_{J+1})\) and consider the probability

$$ \Phi _n(\mathbf {t})= \mathbb {P} \left( \bigcap \limits _{j=1}^J \left\{ \sigma _n^{-1} \left( \frac{\hat{q}_n^{(1)}(\alpha _{n,j}|\mathbf {x})}{q^{(1)}(\alpha _{n,j}|\mathbf {x})} -1 \right) \le t_j \right\} \bigcap \left\{ \sigma _n^{-1} \left( \frac{\hat{q}_n^{(p)}(a_{n}|\mathbf {x})}{q^{(p)}(a_{n}|\mathbf {x})} -1 \right) \le t_{J+1} \right\} \right) . $$

Set \(y_{n}=q^{(1)}(\alpha _{n}|\mathbf {x})\), \(y_{n,j}=q^{(1)}(\alpha _{n,j}|\mathbf {x}) \left( 1+ \sigma _n t_{j} \right) \) and \(z_n=q^{(p)}(a_{n}|\mathbf {x}) \left( 1+ \sigma _n t_{J+1} \right) \). The proof technique of Proposition 1 in Girard et al. (2019) yields

$$\begin{aligned} \Phi _n(\mathbf {t})={}& \mathbb {P} \left( \bigcap \limits _{j=1}^J \left\{ \sigma _n^{-1} \left( \frac{\hat{\bar{F}}_n^{(1)} \left( y_{n,j}|\mathbf {x} \right) }{\bar{F}^{(1)} \left( y_{n,j}|\mathbf {x} \right) }-1 \right) \le \sigma _n^{-1} \left( \frac{\bar{F}^{(1)}\left( q^{(1)}(\alpha _{n,j}|\mathbf {x})|\mathbf {x} \right) }{\bar{F}^{(1)} \left( y_{n,j}|\mathbf {x} \right) } -1 \right) \right\} \right. \\&\left. \qquad \bigcap \left\{ \sigma _n^{-1} \left( \frac{\hat{\bar{F}}_n^{(p)} \left( z_n|\mathbf {x} \right) }{\bar{F}^{(p)} \left( z_n|\mathbf {x} \right) } -1 \right) \le \sigma _n^{-1} \left( \frac{\bar{F}^{(p)}\left( q^{(p)}(a_{n}|\mathbf {x})|\mathbf {x} \right) }{\bar{F}^{(p)} \left( z_n|\mathbf {x} \right) } -1 \right) \right\} \right) . \end{aligned}$$
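The key step of that technique is an inversion of events, sketched here up to the monotonicity caveats handled in the cited proof: since \(\hat{q}_n^{(1)}(\alpha |\mathbf {x})\) is a generalized inverse of the non-increasing map \(t \mapsto \hat{\bar{F}}_n^{(1)}(t|\mathbf {x})\), and \(\bar{F}^{(1)}\left( q^{(1)}(\alpha _{n,j}|\mathbf {x})|\mathbf {x} \right) =1-\alpha _{n,j}\) in the continuous case,

$$ \left\{ \hat{q}_n^{(1)}(\alpha _{n,j}|\mathbf {x}) \le y_{n,j} \right\} = \left\{ \hat{\bar{F}}_n^{(1)}(y_{n,j}|\mathbf {x}) \le 1-\alpha _{n,j} \right\} = \left\{ \hat{\bar{F}}_n^{(1)}(y_{n,j}|\mathbf {x}) \le \bar{F}^{(1)}\left( q^{(1)}(\alpha _{n,j}|\mathbf {x})|\mathbf {x} \right) \right\} ; $$

dividing both sides by \(\bar{F}^{(1)}(y_{n,j}|\mathbf {x})\), subtracting 1 and rescaling by \(\sigma _n^{-1}\) yields the \(j\)th event above, and the last event is obtained in the same way from \(\hat{\bar{F}}_n^{(p)}\).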

Second-order regular variation arguments similar to those of the proof of Proposition 1 in Girard et al. (2019) give, for all \(j \in \{ 1, ... ,J \}\),

$$ \sigma _n^{-1} \left( \frac{\bar{F}^{(1)}\left( q^{(1)}(\alpha _{n,j}|\mathbf {x})|\mathbf {x} \right) }{\bar{F}^{(1)} \left( y_{n,j}|\mathbf {x} \right) } -1 \right) = \frac{ t_{j}}{\gamma (\mathbf {x})}(1+o(1)) $$

and similarly

$$ \sigma _n^{-1} \left( \frac{\bar{F}^{(p)}\left( q^{(p)}(a_{n}|\mathbf {x})|\mathbf {x} \right) }{\bar{F}^{(p)} \left( z_n|\mathbf {x} \right) } -1 \right) = \frac{ t_{J+1}}{\gamma (\mathbf {x})}(1+o(1)). $$
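For intuition, here is the first-order computation behind both limits (the rigorous argument requires the second-order condition \(\mathcal {C}_2\) to control the remainder terms): since \(\bar{F}^{(1)}(\cdot |\mathbf {x})\) is regularly varying with index \(-1/\gamma (\mathbf {x})\) and \(y_{n,j}=q^{(1)}(\alpha _{n,j}|\mathbf {x})(1+\sigma _n t_j)\),

$$ \frac{\bar{F}^{(1)}\left( q^{(1)}(\alpha _{n,j}|\mathbf {x})|\mathbf {x} \right) }{\bar{F}^{(1)}\left( y_{n,j}|\mathbf {x} \right) } \approx (1+\sigma _n t_j)^{1/\gamma (\mathbf {x})} = 1+\frac{\sigma _n t_j}{\gamma (\mathbf {x})}+O(\sigma _n^2), $$

and rescaling by \(\sigma _n^{-1}\) gives the limit \(t_j/\gamma (\mathbf {x})\); the same computation applies to \(\bar{F}^{(p)}(\cdot |\mathbf {x})\), which is regularly varying with the same index.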

Finally, notice that \(y_{n,j}=\tau _j^{-\gamma (\mathbf {x})} y_n (1+o(1))\) and \(z_n=\theta y_n(1+o(1))\) (see (9)). Moreover, for n large enough, \(\omega _{h_n}^{(1)}(y_{n,j}|\mathbf {x}) \le \omega _{h_n}^{(1)}\left( (1-\delta )q^{(1)}(\alpha _n|\mathbf {x}) |\mathbf {x} \right) \) and \(\omega _{h_n}^{(1)}(z_n|\mathbf {x}) \le \omega _{h_n}^{(1)}\left( (1-\delta ) \theta q^{(1)}(\alpha _n|\mathbf {x}) |\mathbf {x} \right) \). Similarly, \( \omega _{h_n}^{(2)}(y_{n,j}|\mathbf {x}) \le \omega _{h_n}^{(2)}\left( (1+\delta ) \tau _1^{-\gamma (\mathbf {x})} q^{(1)}(\alpha _n|\mathbf {x}) |\mathbf {x} \right) \) and \(\omega _{h_n}^{(2)}(z_n|\mathbf {x}) \le \omega _{h_n}^{(2)}\left( (1+\delta ) \theta q^{(1)}(\alpha _n|\mathbf {x}) |\mathbf {x} \right) \). Conclude using Proposition 1. \(\square \)

Proof of Theorem 2. Recall that \(\sigma _n^{-2}=n h_n^d (1-\alpha _n)\). Write

$$\begin{aligned} \frac{\sigma _n^{-1}}{\log \left( \frac{1-\alpha _n}{1-\beta _n} \right) } \log \left( \frac{\tilde{q}^{(p)}_{n,\alpha _n}( \beta _n |\mathbf {x} )}{q^{(p)} ( \beta _n |\mathbf {x} )} \right) ={}& \sigma _n^{-1} (\hat{\gamma }_{\alpha _n}(\mathbf {x})-\gamma (\mathbf {x})) + \frac{\sigma _n^{-1}}{\log \left( \frac{1-\alpha _n}{1-\beta _n} \right) } \log \left( \frac{\hat{q}^{(p)}_{n}( \alpha _n |\mathbf {x} )}{q^{(p)} ( \alpha _n |\mathbf {x} )} \right) \\&+ \frac{\sigma _n^{-1}}{\log \left( \frac{1-\alpha _n}{1-\beta _n} \right) } \log \left( \left( \frac{1-\alpha _n}{1-\beta _n} \right) ^{\gamma (\mathbf {x})} \frac{q^{(p)} ( \alpha _n |\mathbf {x} )}{q^{(p)} ( \beta _n |\mathbf {x} )} \right) . \end{aligned}$$
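This decomposition is an exact identity once one recalls the Weissman-type form of the extrapolated estimator, namely \(\log \tilde{q}^{(p)}_{n,\alpha _n}(\beta _n|\mathbf {x})=\log \hat{q}^{(p)}_{n}(\alpha _n|\mathbf {x})+\hat{\gamma }_{\alpha _n}(\mathbf {x}) \log \left( \frac{1-\alpha _n}{1-\beta _n} \right) \): dividing this relation by \(\log \left( \frac{1-\alpha _n}{1-\beta _n} \right) \), then adding and subtracting \(\gamma (\mathbf {x})\) and \(\log q^{(p)}(\alpha _n|\mathbf {x})\), makes the three right-hand terms telescope back to the left-hand side.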

The first term converges in distribution to \(\Gamma \). The second one converges to 0 in probability: by Theorem 1, \(\sigma _n^{-1} \log \left( \hat{q}^{(p)}_{n}( \alpha _n |\mathbf {x} )/q^{(p)} ( \alpha _n |\mathbf {x} ) \right) =O_{\mathbb {P}}(1)\), while \(\log \left( \frac{1-\alpha _n}{1-\beta _n} \right) \rightarrow \infty \). To control the third one, write

$$ \left( \frac{1-\alpha _n}{1-\beta _n} \right) ^{\gamma (\mathbf {x})} \frac{q^{(p)}(\alpha _n|\mathbf {x})}{q^{(p)}(\beta _n|\mathbf {x})}= \left( \frac{1-\alpha _n}{1-\beta _n} \right) ^{\gamma (\mathbf {x})} \frac{q^{(1)}(\alpha _n|\mathbf {x})}{q^{(1)}(\beta _n|\mathbf {x})} \frac{q^{(p)}(\alpha _n|\mathbf {x})}{q^{(1)}(\alpha _n|\mathbf {x})} \frac{q^{(1)}(\beta _n|\mathbf {x})}{q^{(p)}(\beta _n|\mathbf {x})} . $$
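The point of this factorization is that the first ratio on the right-hand side is pure quantile extrapolation, handled by classical Weissman-type arguments (next display), while the two remaining ratios compare \(L^p\)- and \(L^1\)-quantiles at a common level and are controlled by Lemma 5.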

In view of Theorem 4.3.8 in de Haan and Ferreira (2006) and its proof, \( \left( (1-\alpha _n)/(1-\beta _n) \right) ^{\gamma (\mathbf {x})} q^{(1)}(\alpha _n|\mathbf {x}) = q^{(1)}(\beta _n|\mathbf {x}) \left( 1+O\left( A\left( (1-\alpha _n)^{-1} |\mathbf {x} \right) \right) \right) = q^{(1)}(\beta _n|\mathbf {x}) \left( 1+O(\sigma _n) \right) .\) Combining this with Lemma 5 then yields

$$ \left( \frac{1-\alpha _n}{1-\beta _n} \right) ^{\gamma (\mathbf {x})} \frac{q^{(p)}(\alpha _n|\mathbf {x})}{q^{(p)}(\beta _n|\mathbf {x})}=1+ O(\sigma _n). $$

The third term is therefore \(O\left( 1/\log \left( \frac{1-\alpha _n}{1-\beta _n} \right) \right) \) and thus converges to 0. Conclude using Slutsky’s lemma and the delta-method. \(\square \)
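For the reader’s convenience, we sketch this final step. Combining the three terms with Slutsky’s lemma gives \(\frac{\sigma _n^{-1}}{\log \left( \frac{1-\alpha _n}{1-\beta _n} \right) } \log \left( \tilde{q}^{(p)}_{n,\alpha _n}( \beta _n |\mathbf {x} )/q^{(p)} ( \beta _n |\mathbf {x} ) \right) \overset{d}{\rightarrow } \Gamma \). Since this entails \(\tilde{q}^{(p)}_{n,\alpha _n}( \beta _n |\mathbf {x} )/q^{(p)} ( \beta _n |\mathbf {x} ) \overset{\mathbb {P}}{\rightarrow } 1\) under the growth conditions of the theorem, the delta-method (through \(u-1=\log (u)(1+o(1))\) as \(u \rightarrow 1\)) allows one to replace the log-ratio by the relative error:

$$ \frac{\sigma _n^{-1}}{\log \left( \frac{1-\alpha _n}{1-\beta _n} \right) } \left( \frac{\tilde{q}^{(p)}_{n,\alpha _n}( \beta _n |\mathbf {x} )}{q^{(p)} ( \beta _n |\mathbf {x} )}-1 \right) \overset{d}{ \rightarrow } \Gamma . $$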

Proof of Theorem 3. The proof is similar to those of Theorem 4 in Girard et al. (2021) (which covers the case \(p=2\)) and Theorem 1 in Girard et al. (2019) (an unconditional version) and is thus left to the reader. \(\square \)