1 Introduction and Scope of the Article

Let \((X_1, \dots , X_n)\) denote a random sample of size n from a population with unknown cumulative distribution function (CDF) \(F(x)=\mathbb {P}(X\le x)\), and consider the associated sample of ascending order statistics (OSs) \( (X_{1:n}:=\mathop {\min }\limits _{1\le i\le n} X_i\le \cdots \le X_{n:n}:= \mathop {\max }\limits _{1\le i\le n} X_i) \). Further assume that, for large values of x, F(x) is a Pareto-type model, i.e., a model with a regularly varying right-tail with a negative index of regular variation equal to \(-1/\xi \) \((\xi >0)\). Consequently,

$$\begin{aligned} \overline{F}(x):=1-F(x)=\mathbb {P}(X>x)= x^{-1/\xi }L(x), \quad \text{ as }\ x\rightarrow \infty , \end{aligned}$$
(1)

with \(L(\cdot )\) a slowly varying function, i.e.

$$\begin{aligned} \lim _{t\rightarrow \infty } \frac{L(tx)}{L(t)} =1,\quad \forall \ x>0. \end{aligned}$$

Models satisfying condition (1) are in the domain of attraction for maxima of a non-degenerate distribution. This means that there exist normalizing constants \(a_n>0\) and \(b_n\in \mathbb {R}\) such that

$$\begin{aligned} \lim _{n\rightarrow \infty } \mathbb {P}\left( \frac{X_{n:n}-b_n}{a_n}\le x\right) = \lim _{n\rightarrow \infty } F^n\left( a_n x+b_n\right) = G(x), \end{aligned}$$
(2)

with \(G(\cdot )\) a non-degenerate CDF. With the appropriate choice of the normalizing constants in (2), and under a general framework, G is the general extreme value (EV) distribution,

$$\begin{aligned} G(x)\equiv \textrm{EV}_\xi (x):=\left\{ \begin{array}{lll} \exp \left( -(1+\xi x)^{-1/\xi }\right) ,\ 1+\xi x>0, &{} \text{ if } &{} \xi \not =0,\\ \exp (-\exp (-x)),\ x\in \mathbb {R}, &{} \text{ if } &{} \xi =0, \end{array} \right. \end{aligned}$$
(3)

given here in the von Mises-Jenkinson form (see [1, 2]). Whenever such a non-degenerate limit exists, we write \(F\in \mathcal{D_M}(EV_\xi )\), and the real parameter \(\xi \) is the extreme value index (EVI).

As already mentioned, we shall deal with Pareto right-tails, i.e. heavy right-tails or, equivalently, models with a positive EVI. The right-tail function is then of regular variation with an index of regular variation equal to \(-1/\xi \), i.e.

$$\begin{aligned} F\in \mathcal{D_M}(EV_\xi )_{\xi >0}\quad \Longleftrightarrow \quad \overline{F} :=1-F \in RV_{-1/\xi }, \end{aligned}$$
(4)

where the notation \(RV_\alpha \) stands for the class of regularly varying functions at infinity with an index of regular variation equal to \(\alpha \), i.e. positive measurable functions g such that \(\lim \limits _{t\rightarrow \infty }g(tx)/g(t)=x^\alpha \), for all \(x>0\). With the notation

$$\begin{aligned} U(t) :=F^\leftarrow (1-1/t),\ t\ge 1, \quad F^\leftarrow (y):=\inf \left\{ x:F(x)\ge y\right\} , \end{aligned}$$
(5)

condition (4) is equivalent to \(U \in RV_\xi \). Pareto-type models are extremely important in practice, due to the frequency and magnitude of extreme values. Inference on extreme and large events is usually performed on the basis of the \(k+1\) largest order statistics in the sample, as sketched in Fig. 1.

Fig. 1 A Pareto right-tail probability density function

1.1 EVI-Estimators Under Consideration

One of the first classes of semi-parametric estimators of a positive EVI was the class of Hill (H) estimators introduced in [3] and given by

$$\begin{aligned} \hat{\xi }_{k,n}^\textrm{H}:= \frac{1}{k}\sum _{i=1}^k \left\{ \ln X_{n-i+1:n}-\ln X_{n-k:n}\right\} , \ k=1, 2, \dots , n-1. \end{aligned}$$
(6)

This estimator can be highly sensitive to the choice of k, especially in the presence of a substantial bias. As an alternative, we shall also consider the Pareto probability weighted moments (PPWM) EVI-estimators, introduced in [4]. They are consistent for \(0<\xi <1\), compare favourably with the Hill estimator, and are given by

$$\begin{aligned} \hat{\xi }_{k,n}^\textrm{PPWM}:=1- \frac{\hat{a}_1(k)}{\hat{a}_0(k)-\hat{a}_1(k)}, \end{aligned}$$
(7)

with

$$ \hat{a}_0(k):=\frac{1}{k}\sum _{i=1}^k X_{n-i+1:n} \quad \text{ and }\quad \hat{a}_1(k):=\frac{1}{k} \sum _{i=1}^k \frac{i}{k}\ X_{n-i+1:n}. $$
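For concreteness, a minimal R sketch of the estimators in (6) and (7) follows; the function names hill_evi and ppwm_evi are ours, and x denotes a generic numeric sample.

```r
# Minimal R sketch of the Hill (6) and PPWM (7) EVI-estimators;
# x is the sample, k the number of top order statistics used (k < n).
hill_evi <- function(x, k) {
  xs <- sort(x, decreasing = TRUE)        # xs[i] = X_{n-i+1:n}
  mean(log(xs[1:k])) - log(xs[k + 1])     # mean log-excess over X_{n-k:n}
}

ppwm_evi <- function(x, k) {
  xs <- sort(x, decreasing = TRUE)
  a0 <- mean(xs[1:k])                     # \hat a_0(k)
  a1 <- mean((1:k) / k * xs[1:k])         # \hat a_1(k)
  1 - a1 / (a0 - a1)                      # Eq. (7)
}
```

Evaluating hill_evi over \(k=1,\dots ,n-1\), e.g. via sapply(1:(length(x) - 1), hill_evi, x = x), yields sample paths of the type plotted later in Fig. 3.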

For other alternative estimators of the EVI, see [5,6,7], among others. Consistency of the EVI-estimators in (6) and (7) is achieved if \(X_{n-k:n}\) is an intermediate OS, i.e. if

$$ k=k_n\rightarrow \infty \quad \text {and}\quad k/n\rightarrow 0,\quad \text {as}\quad n\rightarrow \infty . $$

To derive the asymptotic normality of these EVI-estimators, the validity of a second-order condition is usually assumed, such as

$$\begin{aligned} \lim _{t\rightarrow \infty }\frac{\ln U(tx)-\ln U(t)-\xi \ln x}{A(t)}= \left\{ \begin{array}{lll} \frac{x^\rho -1}{\rho }, &{} \text{ if } &{} \rho <0,\\ \ln x, &{} \text{ if } &{} \rho =0, \end{array} \right. \end{aligned}$$
(8)

where \(U(\cdot )\) is defined in (5) and \(|A|\in RV_\rho \), \(\rho \le 0\). Under such a second-order framework, if \(\sqrt{k}A(n/k)\rightarrow \lambda _{_A}\), finite, as \(n\rightarrow \infty \), these EVI-estimators are asymptotically normal. Denoting by \(\hat{\xi }_{k,n}^\bullet \) any of the estimators above, we have, with \(Z_k^\bullet \) an asymptotically standard normal random variable and adequate \((b_\bullet ,\ \sigma _\bullet )\in (\mathbb {R},\ \mathbb {R}^+)\),

$$\begin{aligned} \hat{\xi }_{k,n}^\bullet \ {\mathop {=}\limits ^{d}}\ \xi + \frac{\sigma _\bullet Z_k^\bullet }{\sqrt{k}}+ b_\bullet \ A(n/k)(1+o_{_{\mathbb {P}}}(1)),\quad \text{ as }\quad n\rightarrow \infty , \end{aligned}$$
(9)

with \(b_\bullet \) the asymptotic bias ruler and \(\sigma _\bullet ^2\) the asymptotic variance of the approximation, both given in Table 1.

Table 1 Asymptotic variances and bias rulers of the H and PPWM EVI-estimators

Under the above second-order framework, in (8), but with \(\rho <0\), let us use the parametrization

$$ A(t)=\xi \beta t^\rho , \quad \text{ with } \beta \ne 0\ \text{ and } \rho <0, $$

where \(\beta \) and \(\rho \) are generalized scale and shape second-order parameters, which need to be adequately estimated on the basis of the available sample. Let us denote the optimal level by

$$ \tilde{k}_0^{\bullet }(n):=\arg \min _k \textrm{MSE}(\hat{\xi }^\bullet _{k,n}), $$

with MSE standing for mean squared error. With \(\mathbb {E}\) denoting the mean value operator and AMSE standing for asymptotic MSE, a possible substitute for \(\textrm{MSE}(\hat{\xi }^\bullet _{k,n})\) is

$$ \textrm{AMSE}(\hat{\xi }^\bullet _{k,n}):= \mathbb {E}\left( \frac{\sigma _\bullet }{\sqrt{k}} Z_k^\bullet + b_\bullet A(n/k)\right) ^2= \frac{\sigma _\bullet ^2}{k}+b_\bullet ^2\, \xi ^2\, \beta ^2\, \left( \frac{n}{k}\right) ^{2\rho }, $$

cf. Eq. (9). Then, with the notation \(k_{0}^\bullet (n) := \arg \min _k \textrm{AMSE}\big (\hat{\xi }^\bullet _{k,n}\big )\), we get

$$\begin{aligned} k_{0}^\bullet (n) = \left( \frac{\sigma _\bullet ^2\, n^{-2\rho }}{(-2\rho )\, b_\bullet ^2\, \xi ^2 \beta ^2\, }\right) ^{1/(1-2\rho )}\ = \tilde{k}_{0}^{\bullet }(n)(1+o(1)). \end{aligned}$$
(10)

For the Hill estimator in (6), and as can be seen in Table 1, we have \((b_{_\textrm{H}},\sigma _{_\textrm{H}})=(1/(1-\rho ),\xi )\). Consequently, with \((\hat{\beta }, \hat{\rho })\) a consistent estimator of \((\beta , \rho )\) and [x] denoting the integer part of x, we have an asymptotic justification for the estimator

$$ \hat{k}_0^\textrm{H}:= \left[ \left( \frac{(1-\hat{\rho })^2n^{-2\hat{\rho }}}{(-2\hat{\rho }\hat{\beta }^2)}\right) ^{1/(1-2\hat{\rho })}\right] +1. $$
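In R, this plug-in level can be computed as follows (a sketch assuming estimates beta_hat and rho_hat of \(\beta \) and \(\rho \) are available; the function name k0_hill is ours):

```r
# Plug-in estimate of the AMSE-optimal level for the Hill estimator,
# given estimates (beta_hat, rho_hat) of (beta, rho), with rho_hat < 0.
k0_hill <- function(n, beta_hat, rho_hat) {
  floor((((1 - rho_hat)^2 * n^(-2 * rho_hat)) /
           (-2 * rho_hat * beta_hat^2))^(1 / (1 - 2 * rho_hat))) + 1
}
# e.g. k0_hill(n = 1000, beta_hat = 1, rho_hat = -0.75)
```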

The same does not happen with the PPWM EVI-estimators, because \(\sigma _{_\textrm{PPWM}}\), \(b_{_\textrm{PPWM}}\), and consequently \(k_0^\textrm{PPWM}\), depend on the value of \(\xi \) (see Table 1, again). It is thus sensible to use the bootstrap methodology for the adaptive choice of the threshold associated with the PPWM EVI-estimation.

1.2 Scope of the Article

The main goal of this article is the adaptive estimation of the EVI. For that purpose, the choice of the threshold is crucial, and we study computationally a recent bootstrap algorithm. After a review, in Sect. 2, of the role of the bootstrap methodology in the estimation of optimal sample fractions, we provide an algorithm for the adaptive estimation through the Hill and the PPWM EVI-estimators. In Sect. 3 we provide results from a Monte Carlo simulation study. In Sect. 4, as an illustration, we apply this methodology to a data set in the field of insurance. Section 5 concludes the paper.

2 Adaptive EVI-Estimation and the Bootstrap Methodology

Similarly to what has been done in Gomes and Oliveira [8] for the H estimator, and in Gomes et al. [9] for adaptive reduced-bias estimation, we can use the algorithm in Caeiro et al. [10] (see also [4]), considering the auxiliary statistic

$$\begin{aligned} T_{k,n}^\bullet :=\hat{\xi }^\bullet _{[k/2], n}-\hat{\xi }^\bullet _{k,n},\quad k=2,\dots , n-1, \end{aligned}$$
(11)

which converges in probability to the known value zero, and double-bootstrap it adequately in order to estimate \(k_{0}^\bullet (n)\) through a bootstrap estimate \(\hat{k}_{0}^{\bullet , *}\). Indeed, again under the second-order framework in (8), we get, for the auxiliary statistic \(T_{k,n}^\bullet \) in (11), the asymptotic distributional representation

$$ T_{k,n}^\bullet \ {\mathop {=}\limits ^{d}}\, \ \frac{\sigma _\bullet \, Q_k^\bullet }{\sqrt{k}} + b_\bullet (2^{\rho }-1)\ A(n/k)+o_{_{\mathbb {P}}}(A(n/k)),$$

with \(Q_k^\bullet \) asymptotically standard normal, and \((b_\bullet , \sigma _\bullet )\) given in Table 1. The AMSE of \(T_{k,n}^\bullet \) is thus minimal at a level \(k_{0|T}^\bullet (n)\) such that \(\sqrt{k}\ A(n/k)\rightarrow \lambda '_{_A}\not =0\), i.e. a level of the type of the one in (10), with \(b_\bullet \) replaced by \(b_\bullet (2^{\rho }-1)\), and we consequently have

$$ k_{0}^\bullet (n) = k_{0|T}^\bullet (n)\left( 1-2^{\rho }\right) ^{\frac{2}{1-2\rho }}(1+o(1)). $$

2.1 The Bootstrap Methodology in Action

Given the sample \(\underline{X}_n=(X_1,\dots ,X_n)\) from an unknown model F, consider, for any \(n_1=O(n^{1-\epsilon })\), with \(0<\epsilon <1\), the bootstrap sample \(\underline{X}_{n_1}^{*}=(X_1^{*},\dots ,X_{n_1}^{*})\) from the model \(F_n^{*}(x)=\frac{1}{n} \sum _{i=1}^{n}I_{\{X_i \le x\}}\), the empirical CDF associated with the original sample \(\underline{X}_n\). We choose the resample size \(n_1\) smaller than the original sample size to avoid underestimation of the bias (see Hall [11]). Next, associate with that bootstrap sample the corresponding bootstrap auxiliary statistic, denoted \(T_{k_1,n_1}^{\bullet ,*}\), \(1< k_1 < n_1\). Then, with the notation

$$k_{{0|T}}^{\bullet ,*}(n_1) =\arg \min _{k_1} \textrm{AMSE}\big (T_{k_1,n_1}^{\bullet ,*}\big ), $$

we have that

$$ \frac{k_{{0|T}}^{\bullet ,*}(n_1)}{k_{{0|T}}^{\bullet ,*}(n)} =\left( {n_1}/{n}\right) ^{-\frac{2\rho }{1-2\rho }} (1+o(1)). $$

Consequently, for another sample size \(n_2=n_1^2/n\),

$$ \frac{\big (k_{{0|T}}^{\bullet ,*}(n_1)\big )^2}{k_{{0|T}}^{\bullet ,*}(n_2)}=k_{{0|T}}^\bullet (n)(1+o(1)), \quad \text{ as } \ n\rightarrow \infty . $$

We are now able to estimate \(k_{0}^{\bullet }(n)\) on the basis of any estimate \(\hat{\rho }\) of \(\rho \). With \(\hat{k}_{{0|T}}^{\bullet ,*}\) denoting the sample counterpart of \(k_{{0|T}}^{\bullet ,*}\), \(\hat{\rho }\) the \(\rho \)-estimate, and taking into account (10) and the relation between \(k_{0}^\bullet (n)\) and \(k_{0|T}^\bullet (n)\) above, we can build the \(k_0\)-estimate,

$$\begin{aligned} \hat{k}_{0}^{\bullet ,*}\equiv \hat{k}_{0}^{\bullet , *}(n;n_1) := \min \bigg (n-1,\ \bigg [\frac{{\left( 1-2^{\hat{\rho }} \right) ^{\frac{2}{1-2\hat{\rho }}}} \big (\hat{k}_{{0|T}}^{\bullet ,*}(n_1)\big )^2}{\hat{k}_{{0|T}}^{\bullet ,*}([n_1^2/n]+1)}\bigg ]+1\bigg ), \end{aligned}$$
(12)

and the \(\xi \)-estimate

$$\begin{aligned} \hat{\xi }^{\bullet ,*}\equiv \hat{\xi }^{\bullet ,*}({n;n_1}) := \hat{\xi }^\bullet _{\hat{k}_{0}^{\bullet ,*}(n;n_1),\, n}. \end{aligned}$$
(13)

A few questions, some of them with answers outside the scope of this paper, may be raised: How does the bootstrap method work for small or moderate sample sizes? Is the method strongly dependent on the choice of \(n_1\)? What is the type of the sample path of the EVI-estimator, as a function of \(n_1\)? What is the sensitivity of the bootstrap method with respect to the choice of the \(\rho \)-estimate? Although aware of the theoretical need to have \(n_1=o(n)\), what happens if we choose \(n_1=n\)?

2.2 An Algorithm for the Adaptive EVI-Estimation

The estimates \((\hat{\beta }, \hat{\rho })\) of the vector \((\beta , \rho )\) of second-order parameters are the ones already used in previous papers, computed through the following steps (a hedged R transcription of Steps 1–3 is given after the list):

1. Given a sample \((x_1, \dots , x_n)\), consider the observed values of the \(\rho \)-estimators \(\hat{\rho }_\tau (k)\), introduced and studied in Fraga Alves et al. [12], for the tuning parameters \(\tau =0\) and \(\tau =1\).

2. Select \(\left\{ \hat{\rho }_\tau (k)\right\} _{k\in {\mathcal K}}\), with \(\mathcal{K}=([n^{0.995}],\ [n^{0.999}])\), and compute their median, denoted \(\eta _\tau \), \(\tau =0,1\).

3. Next compute \(I_\tau :=\sum _{k\in \mathcal{K}}\left( \hat{\rho }_\tau (k)-\eta _\tau \right) ^2\), \(\tau =0, 1\), and choose the tuning parameter \(\tau ^*=0\) if \(I_0 \le I_1\); otherwise, choose \(\tau ^*=1\).

4. Work with \(\hat{\rho }\equiv \hat{\rho }_{{\tau ^*}}=\hat{\rho }_{{\tau ^*}}(k_1)\) and \(\hat{\beta }\equiv \hat{\beta }_{{\tau ^*}}:=\hat{\beta }_{ \hat{\rho }_{{\tau ^*}}}(k_1)\), with \(k_1=[n^{0.999}]\) and \(\hat{\beta }_{\hat{\rho }}(k)\) given in Gomes and Martins [13].
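A hedged R transcription of Steps 1–3 follows; rho_tau reflects our reading of the estimators introduced in [12], based on the first three moment statistics of the log-excesses, and should be checked against that paper, while the \(\beta \)-estimator of [13] is not reproduced here.

```r
# Hedged transcription of the rho-estimators of Fraga Alves et al. [12]
# and of the data-driven choice of tau (Steps 2-3); check against [12].
rho_tau <- function(x, k, tau) {
  xs <- sort(x, decreasing = TRUE)
  e  <- log(xs[1:k]) - log(xs[k + 1])              # log-excesses
  m1 <- mean(e); m2 <- mean(e^2); m3 <- mean(e^3)  # moment statistics
  if (tau == 0) {
    Tn <- (log(m1) - 0.5 * log(m2 / 2)) /
          (0.5 * log(m2 / 2) - log(m3 / 6) / 3)
  } else {
    Tn <- (m1^tau - (m2 / 2)^(tau / 2)) /
          ((m2 / 2)^(tau / 2) - (m3 / 6)^(tau / 3))
  }
  -abs(3 * (Tn - 1) / (Tn - 3))
}

choose_tau <- function(x) {
  n <- length(x)
  K <- floor(n^0.995):floor(n^0.999)               # the range K of Step 2
  I <- sapply(c(0, 1), function(tau) {
    r <- sapply(K, function(k) rho_tau(x, k, tau))
    sum((r - median(r))^2)                         # I_tau of Step 3
  })
  if (I[1] <= I[2]) 0 else 1                       # tau*
}
```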

Now, with \(\hat{\xi }_{k,n}^\textrm{H}\) and \(\hat{\xi }_{k,n}^\textrm{PPWM}\) respectively defined in (6) and (7), the algorithm goes on with the following steps (an R sketch of Steps 5–10 is given after the list):

5. Compute \(\hat{\xi }_{k,n}^{\bullet }\), \(k=1,\dots ,n-1\), \(\bullet =\textrm{H}\ \text{ and/or }\ \textrm{PPWM}\).

6. Next, consider a sub-sample size \(n_1=o(n)\), and \(n_2=[n_1^2/n]+1\).

7. For \(l\) from 1 to \(B\), independently generate bootstrap samples \((x_1^{*},\dots ,x_{n_2}^{*})\) and \((x_1^{*}, \dots ,x_{n_2}^{*},x_{n_2+1}^{*},\dots ,x_{n_1}^{*})\), of sizes \(n_2\) and \(n_1\), respectively, from the empirical CDF, \(F_n^{*}(x)=\frac{1}{n}\sum _{i=1}^{n} I_{\{X_i\le x\}}\), associated with the observed sample \((x_1, \dots , x_n)\).

8. Denoting by \(T_{k,n}^{\bullet ,*}\) the bootstrap counterpart of \(T_{k,n}^{\bullet }\), defined in (11), obtain \((t_{k,n_1,l}^{\bullet , *},\ t_{k,n_2,l}^{\bullet , *})\), \(1\le l \le B\), the observed values of the statistics \(T_{k,n_i}^{\bullet , *},\ i=1,2\). For \(k=2,\dots ,n_i-1\), compute

    $$ \textrm{MSE}^{\bullet , *}(n_i,k)=\frac{1}{B}\sum \limits _{l=1}^B\big (t_{k,n_i,l}^{\bullet , *}\big )^2,\ $$

    and obtain

    $$ \hat{k}_{{0|T}}^{\bullet , *}(n_i) := \arg \min _{1< k < n_i} \textrm{MSE}^{\bullet , *} (n_i,k),\quad i=1,2. $$
9. Compute the threshold estimate \(\hat{k}_{0}^{\bullet , *}\), given in (12).

10. Finally, obtain

$$\hat{\xi }^{\bullet ,*}\equiv \hat{\xi }^{\bullet ,*}({n;n_1}) = \hat{\xi }^\bullet _{\hat{k}_{0}^{\bullet ,*}(n;n_1),\, n},$$

already provided in (13).
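A minimal R sketch of Steps 5–10 follows, assuming the estimators hill_evi and ppwm_evi sketched in Sect. 1.1 and an estimate rho_hat of \(\rho \); the function name double_boot is ours. For simplicity, the two resamples are drawn independently for each size, rather than nested as in Step 7.

```r
# Hedged R sketch of Steps 5-10; 'evi' is one of the estimators sketched
# in Sect. 1.1 (hill_evi or ppwm_evi), 'rho_hat' an estimate of rho.
double_boot <- function(x, evi, rho_hat, n1, B = 250) {
  n  <- length(x)
  n2 <- floor(n1^2 / n) + 1
  # Simulated MSE of the bootstrap auxiliary statistic T*_{k,m} (Step 8)
  k_hat <- function(m) {
    mse <- numeric(m - 1)                      # positions 2..m-1 are used
    for (l in 1:B) {
      xb <- sample(x, m, replace = TRUE)       # resampling from F_n^* (Step 7)
      tk <- sapply(2:(m - 1),
                   function(k) evi(xb, floor(k / 2)) - evi(xb, k))
      mse[2:(m - 1)] <- mse[2:(m - 1)] + tk^2 / B
    }
    which.min(mse[2:(m - 1)]) + 1              # \hat k*_{0|T}(m)
  }
  k1 <- k_hat(n1)
  k2 <- k_hat(n2)
  cf <- (1 - 2^rho_hat)^(2 / (1 - 2 * rho_hat))
  k0 <- min(n - 1, floor(cf * k1^2 / k2) + 1)  # Eq. (12), Step 9
  list(k0 = k0, xi = evi(x, k0))               # Eq. (13), Step 10
}
```

This transparent version re-sorts the resample for every k and is therefore slow for large \(n_1\); in practice, each estimator's full sample path would be computed from a single sort of the resample.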

Such an algorithm needs to be computationally validated, a topic we deal with in the next section. Further, note that bootstrap confidence intervals (CIs) can easily be associated with the presented estimates, through the replication of this algorithm r times.

3 A Small-Scale Simulation Study

In this section, we have implemented a multi-sample Monte Carlo simulation experiment of size 1000, to obtain the distributional behaviour of the adaptive bootstrap EVI-estimates \(\hat{\xi }^{\mathrm{H},*}\) and \(\hat{\xi }^{\mathrm{PPWM},*}\), in (13), based on the Hill and PPWM estimators in (6) and (7), respectively. We have considered a re-sample of size \(n_1=[n^{0.955}]\) for samples of size \(n=100\), 200, 500, 750, 1000, 2000 and 5000 from the following models (an R sketch for sampling from these models follows the list):

  • the Fréchet model, with CDF

    $$F(x)=\exp (-x^{-1/\xi }),\quad x>0,\quad \xi >0,$$

    with \(\xi =0.25\) (\(\rho =-1\));

  • the Burr model, with CDF

    $$F(x)=1-(1+x^{-\rho /\xi })^{1/\rho },\quad x>0,$$

    with \((\xi ,\rho )=(0.25,-0.75)\);

  • the Half-\(t_4\) model, i.e., the absolute value of a Student’s t with \(\nu =4\) degrees of freedom (\(\xi =0.25\), \(\rho =-0.5\)).
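All three models can be simulated by inversion of the corresponding CDFs (or, for the Half-\(t_4\), by taking absolute values of Student's t variates); a short R sketch, with function names of our own choosing:

```r
# Sampling from the three simulation models (inversion of the CDFs).
r_frechet <- function(n, xi) (-log(runif(n)))^(-xi)      # F(x) = exp(-x^{-1/xi})
r_burr    <- function(n, xi, rho) (runif(n)^rho - 1)^(-xi / rho)  # rho < 0
r_half_t  <- function(n, nu = 4) abs(rt(n, df = nu))     # Half-t_nu
```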

In Table 2 we present, for the aforementioned models, the multi-sample simulated (mean) double-bootstrap optimal sample fraction (OSF), the mean (E) and median (med) of the EVI-estimates, and the simulated RMSE for both EVI-estimators, as functions of the sample size n. The least biased EVI-estimate and the smallest RMSE are presented in bold. Although both estimators over-estimate the EVI, the PPWM EVI-estimator leads to a less biased EVI-estimate, as expected. The PPWM estimation can also lead to a smaller RMSE, for models with \(|\rho |<1\).

Table 2 Simulated OSF, EVI-estimates (provided by the mean and median) and RMSE estimates obtained through the Hill and PPWM estimators and the double bootstrap methodology

4 A Case Study

Here, the performance of the adaptive double-bootstrap procedure is illustrated through the analysis of a real dataset. The analysis was performed in the R software, with the computer code developed in Caeiro and Gomes [14]. We used the dataset AutoClaims, from a motor insurance portfolio, available in the R package insuranceData [15]. The variable of interest is the amount paid on a closed claim, in dollars. There are \(n = 6773\) claims available. Since large claims are a topic of great concern in the insurance industry, accurate modelling of the right tail of the underlying distribution is extremely important. The histogram and the Pareto quantile-quantile (QQ) plot in Fig. 2 are compatible with a Pareto-type underlying distribution; a sketch of how such plots can be produced is given below.
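The following sketch assumes the paid amounts are stored in the column PAID of AutoClaims; it loads the data and draws the two diagnostic plots of Fig. 2.

```r
# Loading the claims data and drawing the diagnostic plots of Fig. 2;
# assumes the column PAID of AutoClaims holds the amounts paid.
library(insuranceData)
data(AutoClaims)
x <- AutoClaims$PAID
n <- length(x)                                  # n = 6773

hist(x, breaks = 50, xlab = "amount paid", main = "")
# Pareto QQ plot: log of the ascending OSs against standard exponential
# quantiles; upper-tail linearity with slope xi indicates a Pareto-type tail.
plot(-log(1 - (1:n) / (n + 1)), log(sort(x)),
     xlab = "exponential quantiles", ylab = "log(amount paid)")
```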

Fig. 2 Histogram and Pareto QQ plot for the AutoClaims dataset

In Fig. 3, we present the EVI-estimates provided by the Hill and the PPWM EVI-estimators in (6) and (7), respectively. Both estimators are upward biased for large k.

Fig. 3 Estimates of the EVI for the AutoClaims dataset

In Fig. 4, as a function of the sub-sample size \(n_1\), ranging from \(n_1=3990\) to \(n_1=6700\), we picture, on the left, the estimates \(\hat{k}_{0}^{\bullet ,*}(n_1)/n\) of the optimal sample fraction, \(k_0^\bullet /n\), for the adaptive double-bootstrap estimation of \(\xi \) through the H and the PPWM estimators. The associated bootstrap EVI-estimates are pictured on the right. Contrary to the bootstrap Hill estimates, the bootstrap PPWM EVI-estimates are quite stable as a function of the sub-sample size \(n_1\) (see Fig. 4, right).

Fig. 4 Estimates of the OSFs \(\hat{k}_{0}^{\bullet ,*}/n\) (left) and the bootstrap adaptive extreme value index estimates \(\hat{\xi }^{\bullet ,*}\) (right), as functions of the sub-sample size \(n_1\), for the amount paid on a closed claim

For a re-sample size \(n_1=[n^{0.955}] = 4554\) and \(B=250\) bootstrap generations, we were led to \(\hat{k}_{0}^{\mathrm{H},*} = 67\) and \(\hat{\xi }^{\mathrm{H},*} = 0.3463\). The same algorithm, applied to the PPWM estimator, provides the bootstrap estimates \(\hat{k}_{0}^{\mathrm{PPWM},*} = 88\) and \(\hat{\xi }^{\mathrm{PPWM},*}=0.3301\). A hedged usage sketch combining the functions given earlier follows.
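This run is illustrative only: the exact outputs depend on the \(\rho \)-estimate and on the bootstrap seed, and the transparent sketches above are slow at this sample size.

```r
# Illustrative run with the case-study settings (sketch; slow as written).
set.seed(123)                              # arbitrary bootstrap seed
n1  <- floor(n^0.955)                      # re-sample size: 4554 for n = 6773
tau <- choose_tau(x)                       # Steps 1-3
rho <- rho_tau(x, floor(n^0.999), tau)
res_h <- double_boot(x, hill_evi, rho, n1, B = 250)  # k0 and EVI, Hill
res_p <- double_boot(x, ppwm_evi, rho, n1, B = 250)  # k0 and EVI, PPWM
```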

5 Conclusions

In this paper, we addressed the adaptive estimation of the EVI through the double-bootstrap methodology associated with the Hill and the PPWM estimators. The simulation study presented shows that the adaptive PPWM EVI-estimator is usually less biased and provides a similar or smaller RMSE than the adaptive Hill EVI-estimator. Moreover, the efficiency of the adaptive PPWM estimator relative to the adaptive Hill estimator seems to improve as the asymptotic bias of the estimators increases (i.e., as \(\rho \) increases towards zero). Further research concerning the sensitivity of the method to the choice of \(n_1\) will be addressed in the future.