1 Introduction

Motivation

Evaluating as accurately as possible the distribution of a sum of random variables, vectors, or processes with unknown distributions has always been a classical problem in the probabilistic and statistical literature, with various answers depending on the given framework and on the specific application in view. On the one hand, (uni- or multivariate) Central Limit Theorems (CLT), or functional ones, prove, under finite variance of the sum components and/or additional conditions, the asymptotic Gaussian behavior of the sum with some rate of convergence, focusing on the 'body' of the distribution. When considering heavy-tailed marginal distributions, generalized CLTs, with convergence to stable distributions, handle the case of infinite variance (see e.g. Samorodnitsky and Taqqu (1994), Petrov (1995), and references therein), while, in the case of finite variance, an alternative is to consider trimmed sums, removing extremes from the sample to improve the rate of convergence; see e.g. Mori (1984), Hahn (1991) and references therein. When interested in tail distributions, CLTs may give poor results, especially for heavy tails. That is why different approaches have been developed, among which large deviation theorems (see e.g. Petrov (1975), Borovkov (2020) for light tails and Mikosch and Nagaev (1998), Foss et al. (2013), Lehtomaa (2017) for heavy tails, and references therein), extreme value theorems (EVT) focusing on the tail only (see e.g. Embrechts et al. (1997), de Haan and Ferreira (2006), Resnick (2007)), and hybrid distributions combining (asymptotic) distributions for both the main and extreme behaviors when considering independent random variables (see e.g. Csörgö et al. (1988), Zaliapin et al. (2005), Kratz (2014), Müller (2019)); we use the name given in Kratz (2014) for this type of hybrid distribution/method/approach, namely the Normex distribution/method/approach.
Recall briefly the idea of the Normex (for 'Norm(al)-Ex(tremes)') method. It consists of rewriting the sum of random variables as the sum of their order statistics and splitting it into two main parts: a trimmed sum, removing the extremes, and the extremes themselves. Using the fact that the trimmed sum of the first \(n-k-1\) order statistics is conditionally independent of the k largest order statistics, given the \((n-k)\)-th order statistic, we can express the distribution of the sum by integrating w.r.t. the \((n-k)\)-th order statistic, using a CLT for the conditional trimmed sum and an EVT result for the k largest order statistics. Note that a benefit of the Normex approach is that it does not require any condition on the existence of moments, as the CLT applies to truncated random variables.

The following example (as developed in Kratz (2014)), simulating independent and identically distributed (iid) Pareto(\(\alpha\)) random variables (rv) (such that \(\overline{F}(x) = x^{-\alpha }\) for \(x>1\)), with \(\alpha =2.3\) (finite variance, but no third moment), perfectly illustrates the added value of combining main and extreme behaviors.

Indeed, on the QQ-plots given in Fig. 1, we can compare the fit of the empirical sample distribution with the following three distributions: the Gaussian one (CLT approach), the Fréchet one (EVT approach), and the hybrid one (i.e. Normex, combining the CLT and the distribution of the maximum). For the hybrid Normex distribution, we may consider either the exact distribution of the maximum or its asymptotic approximation, the Fréchet distribution. Since both provide the same plot, we display it in the third plot on the right. Note that we choose a rather small number of components in the sum, \(n=52\) (corresponding to the aggregation of data over 1 year (or 52 weeks), as in financial applications), also to illustrate the speed of convergence when using asymptotic theorems. We observe that the CLT approach does not provide a sharp evaluation, even in the body of the distribution, due to this choice of n, which cannot yet compensate for the fact that the distribution of the rv is skewed; increasing n will of course improve the fit in the body of the distribution. Given that the Pareto distribution belongs to the Fréchet maximum domain of attraction, using the Fréchet distribution for the distribution of the sum of Pareto rv's gives a very sharp approximation in the tail (from the 93% quantile), but not for the average behavior, as expected. Finally, a perfect match between empirical quantiles and Normex ones is observed over the whole distribution in the right plot, even for a small number of summands.
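A minimal simulation sketch of this comparison (with a smaller sample size than the paper's \(10^7\), and only the CLT side, since the Fréchet and Normex quantiles require the constructions of the following sections) illustrates how the Gaussian approximation underestimates the upper quantiles of the sum:

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)
alpha, n, m = 2.3, 52, 200_000   # tail index, number of summands, replications

# iid Pareto(alpha) on (1, inf): bar F(x) = x^{-alpha}
x = 1.0 + rng.pareto(alpha, size=(m, n))
s = x.sum(axis=1)

# CLT approximation: Gaussian with the exact Pareto(alpha) mean and variance
mu = alpha / (alpha - 1.0)
var = alpha / (alpha - 2.0) - mu**2
clt = NormalDist(mu=n * mu, sigma=(n * var) ** 0.5)

for p in (0.5, 0.9, 0.99, 0.999):
    print(f"p={p}: empirical quantile {np.quantile(s, p):7.1f}   CLT quantile {clt.inv_cdf(p):7.1f}")
```

The discrepancy grows with the quantile level, as in the left panel of Fig. 1.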

Fig. 1

QQ-plots for the empirical distribution (sample size \(= 10^7\)) of the sum of 52 iid Pareto(\(\alpha = 2.3\)) rv and three different approximations of the sum distribution using: CLT, EVT and Normex, respectively. The red circles and numbers denote the extreme quantiles and their levels

Goal of the study

It is natural to extend the Normex approach to a multivariate framework. With the goal of proposing a multi-normex method and distribution, we consider iid random vectors \(\mathbf {X}_1, \dots , \mathbf {X}_n\), with parent random vector \(\mathbf {X}\) having a heavy-tailed d-dimensional distribution \(F_{\mathbf {X}}\) and density \(f_{\mathbf {X}}\) (when existing). Note that there are different ways to define multivariate extremes (see e.g. Chap. 8 in Beirlant et al. (2004)). The way chosen in this paper is w.r.t. the norm \(\Vert \cdot \Vert\) in \(\mathbb {R}^d\), meaning that the ordered (w.r.t. the norm) vector of \((\mathbf {X}_1, \dots , \mathbf {X}_n)\), denoted by \((\mathbf {X}_{(1)}, \dots , \mathbf {X}_{(n)})\), satisfies

$$\begin{aligned} \Vert \mathbf {X}_{(1)}\Vert \leqslant \Vert \mathbf {X}_{(2)}\Vert \leqslant \dots \leqslant \Vert \mathbf {X}_{(n)} \Vert . \end{aligned}$$
(1.1)

So, assuming \(F_{\mathbf {X}}\) heavy-tailed means that \(\Vert \mathbf {X}\Vert\) is a regularly varying rv with index \(\alpha >0\), denoted by \(\Vert \mathbf {X}\Vert \in \mathcal{RV}\mathcal{}_{-\alpha }\), i.e. such that \(\displaystyle \lim _{t\rightarrow \infty } {\mathbb {P}} \left( \, \Vert \mathbf {X}\Vert>tx \, \right) /{\mathbb {P}} \left( \, \Vert \mathbf {X}\Vert >t \, \right) = x^{-\alpha }\), for \(x>0\).
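To make this defining limit concrete, here is a small deterministic sketch (with a hypothetical Lomax tail \(\overline F(t)=(1+t)^{-\alpha }\) and illustrative values \(\alpha =2.5\), \(x=2\)) checking numerically that the ratio approaches \(x^{-\alpha }\):

```python
alpha, x = 2.5, 2.0                       # illustrative index and scaling factor
sf = lambda t: (1.0 + t) ** (-alpha)      # Lomax(alpha) survival: a simple RV_{-alpha} tail

for t in (10.0, 1e2, 1e4):
    print(f"t={t:g}: ratio sf(t*x)/sf(t) = {sf(t * x) / sf(t):.6f}")
print(f"limit x^(-alpha) = {x ** (-alpha):.6f}")
```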

We propose two versions of multi-normex. The first one, named d-Normex, is a natural extension to any dimension d of the univariate (\(d=1\)) normex method as developed in Kratz (2014): We approximate the distribution of the trimmed sum via the CLT and consider the distribution of the maximum \(\mathbf {X}_{(n)}\). This latter distribution is approximated via the Extreme Value (EV) theorem in the second multi-normex version, named MRV-Normex.

Aiming at proving the benefit of using a multi-normex distribution for a better fit of the whole (unknown) heavy-tailed distribution \(F_{\mathbf {X}}\), we focus analytically on the case \(\alpha \in (2,3]\) (when \(\Vert \mathbf {X}\Vert\) has a finite second moment, but no third moment) to compare the rates of convergence when using the CLT and the multi-normex approach, respectively. Note our focus on heavy-tailed distributions (i.e. distributions belonging to the maximum domain of attraction of the Fréchet distribution), where the impact of using a Normex distribution is much stronger than in the light-tailed case (because of the one-big-jump principle), in particular for risk analysis and management. We prove that the Normex approach leads, as expected, to a better speed of convergence than the CLT for evaluating the distribution of the sum of such heavy-tailed random vectors. When varying the fatness of the tail, measured by \(\alpha >0\), we draw this comparison numerically, using geometrical multivariate quantiles (see e.g. Chaudhuri (1996), Dhar et al. (2014), or a brief description in Kratz and Prokopenko (2021)).

Structure of the paper

In Section 2, besides general notation, we recall the Normex approach and the generalized Berry-Esseen inequality. Then we give a specific result on conditional distributions of order statistics, which will be needed for the construction of multi-normex distributions. The next two sections develop the two multi-normex versions, d-Normex in Section 3 and MRV-Normex in Section 4. These sections have the same structure: we first define the multi-normex distribution, then we study analytically its rate of convergence, before ending with some examples. In Section 5, we consider those examples to study numerically the two versions of the multi-normex distribution, comparing them with the empirical distribution of the sum (obtained via simulation) as well as, when relevant, with the Gaussian approximation obtained from the CLT. Geometrical multivariate quantiles are computed to this aim and represented on QQ-plots. Section 6 concludes. The proofs of all analytical results are developed in the Appendix. Further discussion of the existing literature in relation to our new results, with a survey and additional examples or illustrations, can be found in Kratz and Prokopenko (2021).

2 Framework

2.1 Context and notation

Normex idea

The Normex method clearly adapts to a multivariate framework. Using this approach, we split the maximum from the rest of the sum:

$$\begin{aligned} \mathbf {S}_n= \sum _{k=1}^{n-1} \mathbf {X}_{(k)}+\mathbf {X}_{(n)}, \end{aligned}$$
(2.1)

taking into account the 'principle of one big jump', namely that the asymptotic tail behavior of the sum of heavy-tailed random vectors is driven by that of the maximum. Indeed, this principle extends to dimension \(d>1\) by borrowing the multivariate subexponential distribution definition of Samorodnitsky and Sun (2016) (see Section 4 therein):

$${\mathbb {P}} \left( \, \mathbf {S}_n>t\,\mathbf {A} \, \right) \; \underset{t\rightarrow \infty }{\sim } \; {\mathbb {P}} \left( \, \mathbf {X}_{(n)}>t \,\mathbf {A} \, \right) ,$$

where \(\mathbf {A} \subset \mathbb {R}^d\) is open and increasing, such that \(\mathbf {A}^c\) is convex and \(\mathbf {0}\notin \bar{\mathbf {A}}\). Note also earlier works on this notion by Cline and Resnick (1992) and Omey (2006).
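A quick univariate sanity check of this principle (a sketch with illustrative Pareto(\(\alpha =1.5\)) samples standing in for the norms \(\Vert \mathbf {X}_i\Vert\)): far enough in the tail, the exceedance probabilities of the sum and of the maximum become equivalent:

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, n, m = 1.5, 10, 1_000_000   # illustrative tail index, summands, replications

x = 1.0 + rng.pareto(alpha, size=(m, n))   # iid Pareto(alpha) rv's
s, mx = x.sum(axis=1), x.max(axis=1)

for t in (50.0, 200.0):
    ps, pm = (s > t).mean(), (mx > t).mean()
    print(f"t={t}: P(S_n > t) = {ps:.5f}   P(X_(n) > t) = {pm:.5f}   ratio = {ps / pm:.2f}")
```

The ratio decreases towards 1 as t grows, as the one-big-jump principle predicts.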

Combining the CLT for the trimmed sum given the maximum, and the distribution of the maximum or its asymptotic distribution, leads to a multivariate version of Normex distribution. We name this multi-normex distribution as d-Normex, when using the distribution of the maximum, and as MRV-Normex when considering its asymptotic distribution (when rescaled).

General notation

Before defining explicitly both versions of multi-normex distribution, let us introduce some general notation.

Let \(f_{(i)}\) denote the d-dimensional density, when existing, of the ordered vectors \(\mathbf {X}_{(i)}\), for \(i=1,\cdots ,n\). The cumulative distribution function (cdf) of the norm \(\Vert \mathbf {X}\Vert\) is denoted by \(F_{\Vert \mathbf {X} \Vert }(\cdot ),\) which we assume throughout the paper to be absolutely continuous, and its probability density function (pdf) by \(f_{\Vert \mathbf {X} \Vert }(\cdot )\). Nevertheless, some of our results will be stated under the slightly stronger condition that \(F_{\mathbf {X}}\) is absolutely continuous.

As we will work with truncated multidimensional distributions or vectors, let us introduce the following notions.

For any \(y > 0\), we define the truncated (via the norm) multidimensional distribution \(F_{\mathbf {X} \,\vert \, \Vert \mathbf {X}\Vert }(\cdot \,\vert \,y)\) of \(\mathbf {X}\) on \(\mathbb {R}^d\) as

$$\begin{aligned} F_{\mathbf {X}\,\vert \, \Vert \mathbf {X}\Vert }(\mathbf {B}\,\vert \, y) := {\mathbb {P}} \left( \, \mathbf {X} \in \mathbf {B} \,\vert \, \Vert \mathbf {X}\Vert \leqslant y \, \right) \end{aligned}$$
(2.2)

for any event \(\mathbf {B}\) of the Borel sigma-field \(\mathcal{B}(\mathbb {R}^d)\). We denote by \(f_{\mathbf {X} \,\vert \,\Vert \mathbf {X}\Vert }(\cdot )\) its pdf, when existing:

$$\begin{aligned} f_{\mathbf {X}\,\vert \,\Vert \mathbf {X}\Vert }(\mathbf {x}\,\vert \,y)= \frac{f_{\mathbf {X}}(\mathbf {x})\,\mathbb {1}_{\left( \Vert \mathbf {x} \Vert \leqslant y\right) }}{ F_{\Vert \mathbf {X} \Vert }(y)}. \end{aligned}$$
(2.3)

Let \(\overset{\circ }{\mathbf {X}}_y \in \mathbb {R}^d\) denote the random vector with distribution \(F_{\overset{\circ }{\mathbf {X}}_y }\) on the \((d-1)\)-sphere \(\mathcal {S}_y = \left\{ \mathbf {x} \in \mathbb {R}^d\,:\, \Vert \mathbf {x}\Vert = y \right\}\) for \(y > 0\), defined by

$$\begin{aligned} F_{\overset{\circ }{\mathbf {X}}_y }(\mathbf {B}_y)= {\mathbb {P}} \left( \, \overset{\circ }{\mathbf {X}}_y \in \mathbf {B}_y \, \right) := {\mathbb {P}} \left( \, \mathbf {X} \in \mathbf {B}_y \,\vert \, \Vert \mathbf {X}\Vert = y \, \right) , \end{aligned}$$
(2.4)

where \(\mathbf {B}_y\) belongs to the trace Borel \(\sigma -\)algebra on \(\mathcal {S}_y\). With this definition, for any \(\mathbf {A} \subseteq \mathbb {R}^d\), we can write

$${\mathbb {P}} \left( \, \mathbf {X} \in \mathbf {A} \, \right) = {\mathbb {E}} \left[ {\mathbb {E}} [\mathbb {1}_{\mathbf {X} \in \mathbf {A}} \,\vert \, \Vert \mathbf {X}\Vert ] \right] = \int _0^{\infty } {\mathbb {E}} [\mathbb {1}_{\mathbf {X} \in \mathbf {A}}\,\vert \, \Vert \mathbf {X}\Vert =y]\, f_{\Vert \mathbf {X}\Vert }(y)dy,$$

thus

$$\begin{aligned} {\mathbb {P}} \left( \, \mathbf {X} \in \mathbf {A} \, \right) = \int \limits _{\mathbb {R}_+} {\mathbb {P}} \left( \, \overset{\circ }{\mathbf {X}}_y \in \mathbf {A}_y \, \right) f_{\Vert \mathbf {X}\Vert }(y) \mathrm {d}y,\quad \text{ where }\quad \mathbf {A}_y:= \mathbf {A}\cap \mathcal {S}_y. \end{aligned}$$
(2.5)

Generalized Berry-Esseen inequality

As we are going to compare the rates of convergence when using, respectively, the CLT and the Normex method, let us recall the rate of convergence in the CLT provided by the Generalized Berry-Esseen inequality (see e.g. Corollary 18.3 of Bhattacharya and Rao (2010)), when assuming a ’moderate’ heavy tail.

Proposition 2.1

  (Generalized Berry-Esseen inequality)

Let \(\mathbf {X}_{1}, \ldots , \mathbf {X}_{n}\) be i.i.d. centered random vectors with parent random vector \(\mathbf {X}\) with values in \(\mathbb {R}^{d}\), with positive-definite covariance matrix \(\Sigma\). If

$$\begin{aligned} {\mathbb {E}} \left\| \mathbf {X}\right\| ^{\alpha }<\infty , \quad \text {for some}\;\;\alpha \in [2,3], \end{aligned}$$

then

$$\begin{aligned} \sup _{\mathbf {B} \in \mathcal {C}}\left| {\mathbb {P}} \left( \, \mathbf {S}_n \in \mathbf {B} \, \right) -\Phi _{\mathbf {0}, \Sigma }(\mathbf {B})\right| \leqslant c \, {\mathbb {E}} \left( \left\| \Sigma ^{-1/2} \mathbf {X}\right\| ^{\alpha }\right) \,n^{-(\alpha -2) / 2}, \end{aligned}$$
(2.6)

where \(\mathcal {C}\) is the class of all Borel-measurable convex subsets of \(\mathbb {R}^{d}\), c is a positive universal constant, and \(\Phi _{\mathbf {0}, \Sigma }\) is the cdf of the centered normal multivariate distribution with covariance matrix \(\Sigma\).

Note that the (non-generalized) Berry-Esseen inequality, which holds for any \(\alpha \geqslant 3\), corresponds to (2.6) when taking \(\alpha = 3\).

2.2 A preliminary result on order statistics

We present a simple but elegant result on order statistics, needed for the proofs of the theorems, and also of interest in itself, completing the vast literature on order statistics. Its proof is given in Appendix 1.

Lemma 2.1

The distribution of the order statistics \(\mathbf {X}_{(1)}, \dots , \mathbf {X}_{(n - 1)}, \mathbf {X}_{(n )}\), conditionally on the event \(\Vert \mathbf {X}_{(n)}\Vert = y\), is the joint distribution of the \(n-1\) order statistics from the truncated distribution \(F_{\mathbf {X}\,\vert \,\Vert \mathbf {X}\Vert }(\cdot \,\vert \, y)\) and of an independent random vector \(\overset{\circ }{\mathbf {X}}_y\) defined on the \((d-1)\)-sphere \(\mathcal {S}_y\), for \(y > 0\):

$$\begin{aligned} \mathcal {L}\left( \mathbf {X}_{(1)}, \dots , \mathbf {X}_{(n -1)}, \mathbf {X}_{(n)} \,\big \vert \, \Vert \mathbf {X}_{(n)}\Vert = y \right) = \mathcal {L}\left( \mathbf {Y}_{(1)}, \dots , \mathbf {Y}_{(n - 1)}\right) \times \mathcal {L} \left( \overset{\circ }{\mathbf {X}}_y \right) , \end{aligned}$$
(2.7)

where \(\mathbf {Y}_1, \dots , \mathbf {Y}_{n- 1}\) are i.i.d. random vectors with multidimensional truncated distribution \(F_{\mathbf {X}\,\vert \,\Vert \mathbf {X}\Vert }(\cdot \,\vert \, y)\) defined in (2.2), and the random vector \(\overset{\circ }{\mathbf {X}}_y\) has the distribution \(F_{\overset{\circ }{\mathbf {X}}_y}(\cdot )\) defined in (2.4).

In particular, we have

$$\begin{aligned} \mathcal {L}\left( \mathbf {X}_{(1)}, \dots , \mathbf {X}_{(n -1)} \,\big \vert \, \Vert \mathbf {X}_{(n)}\Vert = y \right) = \mathcal {L}\left( \mathbf {Y}_{(1)}, \dots , \mathbf {Y}_{(n - 1)} \right) . \end{aligned}$$
(2.8)
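Lemma 2.1 also yields a practical simulation recipe: conditionally on \(\Vert \mathbf {X}_{(n)}\Vert = y\), the remaining \(n-1\) vectors can be drawn as iid samples from the truncated law, e.g. by rejection. A sketch of an empirical check of (2.8) (with a hypothetical parent of independent Pareto components, the \(L^1\) norm, and a narrow bin \([y, y+\varepsilon )\) standing in for the zero-probability event \(\Vert \mathbf {X}_{(n)}\Vert = y\); all numerical values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, d, n, m = 2.5, 2, 5, 300_000
y_lo, y_hi = 8.0, 8.4                        # narrow bin around y

def sample(*shape):
    # hypothetical parent: independent Pareto(alpha) components
    return 1.0 + rng.pareto(alpha, size=shape + (d,))

# Direct route: condition the n-sample on ||X_(n)|| falling in the bin
x = sample(m, n)                             # shape (m, n, d)
norms = x.sum(axis=2)                        # L1 norms (components are positive)
top = norms.max(axis=1)
keep = (top >= y_lo) & (top < y_hi)
trimmed_direct = (norms[keep].sum(axis=1) - top[keep]).mean()

# Lemma 2.1 route: n-1 iid draws from F_{X | ||X|| <= y}, sampled by rejection
pool_norms = sample(1_000_000).sum(axis=1)
trimmed_lemma = (n - 1) * pool_norms[pool_norms <= y_lo].mean()

print(trimmed_direct, trimmed_lemma)   # the two estimates of E[trimmed sum of norms] agree
```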

3 A first multi-normex version: d-Normex

We start building a first multi-normex version, using the Normex approach (2.1), then approximating the distribution of the trimmed sum via the CLT and keeping the distribution of the maximum \(\mathbf {X}_{(n)}\). It is a natural extension to any dimension d of the univariate (\(d=1\)) Normex distribution as developed in Kratz (2014). Note that, when turning to data, the distribution of \(\mathbf {X}_{(n)}\) may be approximated e.g. via simulations or, as will be done in the MRV-Normex, using another asymptotic theorem, the Extreme Value (EV) one.

3.1 Definition and Rate of Convergence

Definition 3.1

The so-called \(\mathbf {d}\)-Normex distribution function is defined, for \(\mathbf {B} \subset \mathbb {R}^d\), as:

$$\begin{aligned} G_n(\mathbf {B})= {\mathbb {P}} \left( \, \mathbf {Z}+ \mathbf {X}_{(n)} \in \mathbf {B} \, \right) \end{aligned}$$

where \(\mathbf {Z}\) is, conditionally on the event \((\Vert \mathbf {X}_{(n)}\Vert = y)\), a Gaussian random vector with mean \((n-1)\mathbf {\mu }(y)\) and covariance matrix \((n-1) \Sigma (y)\); the functions \(\mathbf {\mu }(\cdot )\) and \(\Sigma (\cdot )\) are, respectively, the mean vector and covariance matrix of the truncated distribution \(F_{\mathbf {X}\,\vert \, \Vert \mathbf {X}\Vert }\) defined in (2.2).

Another way to formulate Definition 3.1 is the following:

$$\begin{aligned} G_n(\mathbf {B})= \int _{\mathbb {R}^d} f_{(n)}(\mathbf {x}) \;\Phi _{(n-1)\mathbf {\mu }(\Vert \mathbf {x}\Vert ),\, (n-1) \Sigma (\Vert \mathbf {x}\Vert ) }\left( \mathbf {B} - \mathbf {x}\right) \,\mathrm {d}\mathbf {x}, \end{aligned}$$
(3.1)

where \(\Phi _{m, \Gamma }\) denotes the cdf of the Gaussian vector with mean m and covariance matrix \(\Gamma\).
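Definition 3.1 translates directly into a Monte Carlo sampler for \(G_n\): draw an n-sample, keep the vector with maximal norm, estimate \(\mathbf {\mu }(y)\) and \(\Sigma (y)\) from a rejection sample of the truncated law, and add an independent Gaussian draw. A generic sketch (with a hypothetical parent of independent Pareto components and the \(L^1\) norm; in a concrete model such as that of Section 3.2, the truncated moments would instead be plugged in analytically):

```python
import numpy as np

rng = np.random.default_rng(3)
alpha, d, n, reps = 2.5, 2, 20, 2000

def sample(*shape):
    # hypothetical parent: independent Pareto(alpha) components
    return 1.0 + rng.pareto(alpha, size=shape + (d,))

pool = sample(100_000)                       # reusable pool for the rejection step
pool_norms = pool.sum(axis=1)                # L1 norms (positive components)

draws = np.empty((reps, d))
for i in range(reps):
    x = sample(n)
    imax = x.sum(axis=1).argmax()
    y = x[imax].sum()                        # realized ||X_(n)||
    kept = pool[pool_norms <= y]             # empirical truncated law F_{X | ||X|| <= y}
    mu, sig = kept.mean(axis=0), np.cov(kept, rowvar=False)
    z = rng.multivariate_normal((n - 1) * mu, (n - 1) * sig)
    draws[i] = z + x[imax]                   # one draw from (an estimate of) G_n

# sanity check: by Lemma 2.1, each coordinate mean under G_n equals E[S_n] coordinate-wise
print(draws.mean(axis=0), n * alpha / (alpha - 1.0))
```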

Let us turn to the evaluation of the Normex distribution as an approximation of the distribution of the sum of iid random vectors, studying analytically its rate of convergence. Although the multi-normex construction holds for any \(\alpha >0,\) we state the result under the same moment condition as in the generalized Berry-Esseen inequality, namely \(\alpha \in (2,3]\), to be able to compare the results and show explicitly the benefit of using the Normex approximation. Then we show numerically the good fit of d-Normex for general \(\alpha >0\) in an example (see Section 3.2).

The analytical result given in Theorem 3.1 shows that applying the Normex method rather than the multivariate CLT improves, as expected, the accuracy of the evaluation of the (tail) distribution of the sum of heavy-tailed vectors, with a better rate of convergence than that of the CLT whenever the shape parameter \(\alpha \in (2,3]\).

Theorem 3.1

Let \(\mathbf {X}_{1}, \ldots , \mathbf {X}_{n}\) be i.i.d. random vectors with parent random vector \(\mathbf {X}\) with values in \(\mathbb {R}^{d}\) such that:

  • (C1) For all \(y >0\), the truncated (w.r.t. the norm) distribution \(F_{\mathbf {X}\,\vert \, \Vert \mathbf {X}\Vert }(\cdot \,\vert \, y)\) defined in (2.2) is nondegenerate (i.e. for all \(y > 0\) there is no hyperplane \(\mathcal {H} \subset \mathbb {R}^d\) such that \({\mathbb {P}} \left( \, \mathbf {X} \in \mathcal {H}\,\vert \, \Vert \mathbf {X}\Vert \leqslant y \, \right) = 1\)).

  • (C2) The distribution of the rv \(\Vert \mathbf {X}\Vert\) is absolutely continuous and regularly varying at infinity: \(\Vert \mathbf {X}\Vert \in \mathcal{RV}\mathcal{}_{-\alpha }\), with \(\alpha >0\).

Then, for any \(\alpha \in (2,3]\), there exists a slowly varying function \(L(\cdot )\) such that

$$\begin{aligned} \sup _{\mathbf {B} \in \mathcal {C}} \vert {\mathbb {P}} \left( \, \mathbf {S_n} \in \mathbf {B} \, \right) - G_n(\mathbf {B}) \vert \leqslant L(n) \, n^{- \frac{1}{2} + \frac{3-\alpha }{\alpha }}, \end{aligned}$$

where \(\mathcal {C}\) is the class of all Borel-measurable convex subsets of \(\mathbb {R}^{\mathbf {d}}.\)

Let us briefly indicate how the upper bound of this main result is reached; for more details, see the proof developed in Appendix 2. First, we use the law of total probability, conditioning on \(\mathbf {X}_{(n)}\). Second, we apply the (non-generalized) Berry-Esseen inequality to the truncated rv's \(\{\mathbf {Y}_i\}_{i \leqslant n-1}\) given the event \((\Vert \mathbf {X}_{(n)}\Vert = y)\). The right-hand side of the inequality is of the order of \(\frac{1}{\sqrt{n}} {\mathbb {E}} \Vert \mathbf {Y}\Vert ^{3}\), which is equivalent to \(\frac{1}{\sqrt{n}} {\mathbb {E}} \Vert \mathbf {X}_{(n)}\Vert ^{3-\alpha }\) whenever \(\alpha \leqslant 3\). Finally, to derive the upper bound of the main result, we use that \(\Vert \mathbf {X}_{(n)}\Vert\) is of the order \(n^{1/\alpha }\) under (C2).

Remark 3.1

  1. (i)

    We consider the case \(\alpha \in (2,3]\) since it is the condition under which the generalized Berry-Esseen inequality holds. For \(\alpha >3\), the bound given in Theorem 3.1 is the same as that of the Berry-Esseen inequality, making the analytical comparison useless. Indeed, in such a case, the bound \(\frac{1}{\sqrt{n}} {\mathbb {E}} \Vert \mathbf {Y}\Vert ^{3}\) reduces simply to the order \(\frac{1}{\sqrt{n}}\) (see (8.6) in the proof), giving back the same rate as for the CLT. This means that an alternative route must be found if we want to study analytically the Normex rate of convergence. We might use Edgeworth expansions, but this involves too heavy computations (as we experienced for random variables (case \(d=1\)), conditioning on \(X_{(n)}\)). This is why we show numerically the benefit of using the Normex distribution, as illustrated in Section 5.

  2. (ii)

    The rate of convergence given in Theorem 3.1 is better than the one provided by the generalized Berry-Esseen inequality (Proposition 2.1) whenever \(\alpha \in (2,3)\) (and for any n), as \(\frac{\alpha - 2}{2} < \frac{1}{2} - \frac{3-\alpha }{\alpha }\). Note also that, in the case \(\alpha = 3\) and \({\mathbb {E}} \Vert \mathbf {X}\Vert ^3 = \infty\), the inequality in Theorem 3.1 is slightly sharper than the one that can be obtained from the Berry-Esseen theorem (replacing \(n^{-1/2 + \varepsilon }\), \(\varepsilon > 0\), by \(L(n)n^{-1/2}\)).

  3. (iii)

    One can apply the Normex method with any norm on \(\mathbb {R}^d\), for instance the \(L^1\) norm defined, for \(\mathbf {x}=(x_1,\cdots ,x_d) \in \mathbb {R}^d\), by \(\displaystyle \Vert \mathbf {x}\Vert _1:= \sum _{i=1}^d \vert x_i \vert\). In such a case, for positive random variables, Condition (C2) translates into the assumption

    $$(C2^*) \qquad S_d:= \sum _{i=1}^d X^{(i)} \in \mathcal{RV}\mathcal{}_{-\alpha },$$

    where \(X^{(i)}\), for \(i=1,\cdots , d\), denote the components of \(\varvec{X}\). We may want to relate this \(\mathcal{RV}\mathcal{}\) property of the sum to conditions on the random vector itself. This topic has already been investigated in the literature; see e.g. Basrak et al. (2002), Barbe et al. (2006), Mainik and Embrechts (2013), Cuberos et al. (2015). For instance, if \(\varvec{X}\) is multivariate regularly varying, \(\varvec{X}\in \mathcal {MRV}_{-\alpha }(b,\nu )\), then the sum satisfies \(S_d\in \mathcal{RV}\mathcal{}_{-\alpha }(b)\). We will come back to the MRV notion in Section 4.
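A quick check of the exponent comparison stated in point (ii) above: the difference of the two rate exponents factorizes as

```latex
\frac{\alpha-2}{2} - \Bigl(\frac{1}{2} - \frac{3-\alpha}{\alpha}\Bigr)
= \frac{\alpha-3}{2} + \frac{3-\alpha}{\alpha}
= (3-\alpha)\Bigl(\frac{1}{\alpha}-\frac{1}{2}\Bigr)
= \frac{(3-\alpha)(2-\alpha)}{2\alpha} \;<\; 0
\quad \text{for } \alpha\in(2,3).
```

For instance, \(\alpha =2.5\) gives a CLT exponent of \(\frac{\alpha -2}{2}=0.25\) against a d-Normex exponent of \(\frac{1}{2}-\frac{3-\alpha }{\alpha }=0.3\).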

3.2 Example of the Multivariate Pareto-Lomax distribution

Let us consider a \(d-\)dimensional random vector \(\mathbf {X} = (X^{(1)}, \dots , X^{(d)})\) having a multivariate Pareto-Lomax\((\alpha )\) distribution, with \(\alpha >0\), i.e. with survival distribution function defined, for any non-negative real numbers \(x_1, \dots , x_d\), by

$$\begin{aligned} \overline{F}_{\mathbf {X}}(x_1, \dots , x_d) := {\mathbb {P}} \left( \, X^{(i)} > x_i,\, i=1,\cdots ,d \, \right) =\left( 1 + \sum _{i=1}^d x_i\right) ^{-\alpha }, \end{aligned}$$
(3.2)

from which the marginal distributions, expectation and covariance matrix of a multivariate Pareto-Lomax vector follow via straightforward computations. We consider the case \(\alpha \in (2,3]\), as in Theorem 3.1 (even though the construction holds for any \(\alpha > 0\)). As d-Normex can be applied with any norm on \(\mathbb {R}^d\), we choose the \(L^1\) norm \(\displaystyle \Vert \mathbf {x}\Vert _1:= \sum _{i=1}^d x_i\) (for non-negative components) to simplify the computations. We also take \(d=3\) for illustration.

We can express the cdf of the rv \(\Vert \mathbf {X}\Vert\), for \(y>0\), as

$$\begin{aligned} F_{\Vert \mathbf {X}\Vert }(y)&= {\mathbb {P}} \left( \, \Vert \mathbf {X}\Vert \leqslant y \, \right) \\&= 1 - (1+y)^{-\alpha } - \alpha \,y (1 + y)^{-(\alpha +1)} -\frac{\alpha (\alpha +1)}{2}\, y^2(1+y)^{-(\alpha + 2)}. \end{aligned}$$
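This closed form is easy to check by simulation, using the classical gamma-mixture representation of the multivariate Pareto-Lomax distribution, \(\mathbf {X} = \mathbf {Z}/Y\) with \(Z_1,\dots ,Z_d\) iid standard exponential and \(Y\sim \Gamma (\alpha ,1)\) independent (a sketch, for \(d=3\)):

```python
import numpy as np

rng = np.random.default_rng(4)
alpha, d, m = 2.3, 3, 2_000_000

# gamma-mixture representation: X = Z / Y has survival (1 + sum x_i)^(-alpha)
z = rng.exponential(size=(m, d))
y = rng.gamma(alpha, size=m)
norm = z.sum(axis=1) / y                 # ||X||_1

def F_norm(t, a=alpha):
    # closed-form cdf of ||X||_1 for d = 3, as derived above
    return (1.0 - (1.0 + t) ** (-a)
            - a * t * (1.0 + t) ** (-(a + 1.0))
            - 0.5 * a * (a + 1.0) * t**2 * (1.0 + t) ** (-(a + 2.0)))

for t in (1.0, 5.0, 20.0):
    print(f"t={t}: empirical {np.mean(norm <= t):.5f}   closed form {F_norm(t):.5f}")
```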

Now, let us compute the moments of the d-dimensional truncated Pareto-Lomax random vector, denoted by \(\mathbf {Y}\), with cdf \(F_{\mathbf {X}\,\vert \, \Vert \mathbf {X}\Vert }(\cdot \,\vert \,y)\) (see (2.2)), expectation \(\mathbf {\mu }(y)\) and covariance matrix \(\Sigma (y)\). We have, for any \(j \in \left\{ 1,\dots ,d \right\}\) (the components being exchangeable), if \(\alpha \ne 1\) (which is our case),

$$\begin{aligned} \begin{aligned} \mathbf {\mu }_j(y)&:= {\mathbb {E}} [Y_j] = {\mathbb {E}} \left[ X_j\,\big \vert \, \Vert \mathbf {X}\Vert \leqslant y \right] \\&= \dfrac{\left( y+1\right) ^{-\alpha -2}\big (\left( y+1\right) ^{\alpha +2}-\left( \alpha +2\right) y\left( 1+y\left( \alpha +1\right) \left( \alpha y+3\right) /6\right) -1\big )}{\left( \alpha -1\right) F_{\Vert \mathbf {X}\Vert }(y)}, \end{aligned} \end{aligned}$$

and, if \(\alpha \ne 1,2\) (also our case),

$$\begin{aligned} \begin{aligned}&{\mathbb {E}} \left[ Y_1 Y_2\right] = {\mathbb {E}} \left[ X_1 X_2\,\big \vert \, \Vert \mathbf {X}\Vert \leqslant y \right] = \frac{1}{F_{\Vert \mathbf {X}\Vert } (y)} \dfrac{1}{\left( \alpha -2\right) \left( \alpha -1\right) }\Bigl (1 - \left( y+1\right) ^{-\alpha -2}\times \Bigr .\\&\qquad \qquad \qquad \Bigl . \big (\left( \alpha +2\right) y\left( \left( \alpha +1\right) y\left( \alpha y \left( \left( \alpha -1\right) y/24+1/6\right) +1/2\right) +1\right) -1\big )\Bigr ), \end{aligned} \end{aligned}$$
$$\begin{aligned} \begin{aligned}&{\mathbb {E}} \left[ Y_1^2\right] = {\mathbb {E}} \left[ X_1^2\,\big \vert \, \Vert \mathbf {X}\Vert \leqslant y \right] =\frac{1}{F_{\Vert \mathbf {X}\Vert } (y)} \dfrac{2}{\left( \alpha -2\right) \left( \alpha -1\right) }\Bigg ( 1 - \left( y+1\right) ^{-\alpha -2}\times \\&\qquad \qquad \qquad {\bigg [\left( \alpha +2\right) y\Big (y\left( \alpha +1\right) \big (\alpha y\left( y\left( \alpha -1\right) /24+1/6\right) +1/2\big )+1\Big )+1\bigg ]\Bigg )}, \end{aligned} \end{aligned}$$

from which the covariance matrix \(\Sigma (y)=(\Sigma _{ij}(y))_{i,j}\) can be deduced.

Therefore, the Gaussian cdf \(\displaystyle \Phi _{(n-1)\mathbf {\mu }(\Vert \mathbf {x}\Vert ),\, (n-1) \Sigma (\Vert \mathbf {x}\Vert ) }\) introduced in Definition 3.1 is explicitly determined, and so is the d-Normex distribution \(G_n\) defined in (3.1).

4 MRV-Normex

Here we investigate a more universal version of multi-normex, named MRV-Normex, using an asymptotic theorem for the maximum, namely the Extreme Value (EV) theorem. Given our focus on the sum of iid heavy-tailed (w.r.t. the norm) random vectors, we consider the standard extreme value theory (EVT) framework of multivariate regular variation (MRV), a natural extension of regular variation to a multivariate framework. In fact, to obtain the rate of convergence of this multi-normex approximation, we make a slightly stronger assumption than MRV, requiring a uniform asymptotic independence of the polar coordinates of the random vector, as made explicit in Condition \((M_{\Theta })\) of Theorem 4.1.

4.1 Rate of convergence in the EV Theorem: Discussion of its assumptions

In order to obtain the rate of convergence for the MRV-Normex approximation of the sum, we first need to discuss the rate of convergence in the Extreme Value Theorem, so as to control the difference between the distribution of the norm of the maximum \(\Vert \mathbf {X}_{(n)}\Vert\) and the limit Fréchet distribution.

Let \(\left\{ X_n, n \geqslant 1 \right\}\) be i.i.d. random variables with cdf \(F_X\). Assume that \(F_X\) belongs to the maximum domain of attraction (MDA) of an extreme-value distribution \(G_{\gamma }\), with \(\gamma \in \mathbb {R}\), i.e. that there exist normalizing constants \(a_n > 0\) and \(b_n \in \mathbb {R}\) such that

$$\begin{aligned} {\mathbb {P}} \left( \, \max _{1\leqslant i\leqslant n} X_i\leqslant a_{n} x+b_{n} \, \right) =F_X^n(a_{n} x+b_{n}) \,\underset{n\rightarrow \infty }{\longrightarrow } \, G_{\gamma }(x). \end{aligned}$$
(4.1)

Note that the limit in (4.1) remains unchanged when replacing \((a_n)\) and \((b_n)\) with \((\tilde{a}_n)\) and \((\tilde{b}_n)\), as long as

$$\begin{aligned} \lim _{n \rightarrow \infty }\frac{\tilde{a}_n}{ a_n} = 1 \quad \text {and}\quad \lim _{n \rightarrow \infty }\frac{b_n - \tilde{b}_n}{a_n} = 0. \end{aligned}$$
(4.2)

Further discussion on the choice of \((a_n)\) and \((b_n)\) can be found in Kratz and Prokopenko (2021). Let us introduce the real function g defined on \(\mathbb {R}^+\) by:

$$\begin{aligned} g:=\left( \frac{1}{-\log F_X}\right) ^{\leftarrow } \end{aligned}$$
(4.3)

(\(^{\leftarrow }\) denoting the left-continuous inverse function).

It is straightforward to show that the convergence (4.1) is equivalent to

$$\begin{aligned} \lim _{t \rightarrow \infty } \frac{g(t x)-g(t)}{a(t)}=\frac{x^{\gamma }-1}{\gamma }, \quad \forall x > 0, \end{aligned}$$
(4.4)

for some \(\gamma \in \mathbb {R}\) and auxiliary positive function a defined on \(\mathbb {R}^+\). (The function g is said to be of extended regular variation, \(g\in E\mathcal{RV}\mathcal{}_\gamma (a)\)).
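As a concrete illustration (an added example, not taken from the paper), consider the exact Pareto cdf \(F_X(x)=1-x^{-\alpha }\), \(x>1\). Then

```latex
\frac{1}{-\log F_X(x)} = \frac{1}{-\log\bigl(1-x^{-\alpha}\bigr)} = t
\;\iff\;
x = \bigl(1-e^{-1/t}\bigr)^{-1/\alpha},
\qquad\text{so}\qquad
g(t) = \bigl(1-e^{-1/t}\bigr)^{-1/\alpha} \underset{t\to\infty}{\sim} t^{1/\alpha},
```

giving \(b_n = g(n)\sim n^{1/\alpha }\) and \(a_n = n\,g'(n)\sim \alpha ^{-1} n^{1/\alpha }\), and (4.4) holds with \(\gamma = 1/\alpha\), since \(\bigl(g(tx)-g(t)\bigr)/\bigl(t\,g'(t)\bigr) \rightarrow \alpha \bigl(x^{1/\alpha }-1\bigr) = \bigl(x^{\gamma }-1\bigr)/\gamma\).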

To describe the rate of convergence in the EV theorem, we refer to two studies developed for \(F_X\) belonging to any MDA, under slightly different assumptions (discussed in Kratz and Prokopenko (2021)): Falk and Marohn (1993), with a direct condition on the derivative of the distribution \(F_X\), and de Haan and Resnick (1996), assuming a second-order von Mises condition on g defined in (4.3). Focusing here on the case \(F_X\in\) MDA(Fréchet), we look for a condition on \(F_X\) involving \(\mathcal{RV}\mathcal{}\) properties that replaces the second-order von Mises condition on g and retrieves the exact rate of convergence described in de Haan and Resnick (1996). This is presented in Proposition 4.1.

Proposition 4.1

Suppose \(\bar{F}_X\in \mathcal{RV}\mathcal{}_{-\alpha }\), \(\alpha >0\), and \(F_X\) is twice differentiable with pdf \(f_X\). The rate of convergence for the EV Theorem as given in de Haan and Resnick (1996), Theorem 4.1, holds when replacing the condition \(g\in 2\!-\!von\,Mises(-\alpha ,-\rho )\) (g being defined in (4.3)), where \(\rho >0\), by the condition

$$\begin{aligned} f_X(t) = c t^{-\alpha - 1} (1 + h(t)), \quad \text {with} \quad h\in \mathcal{RV}\mathcal{}_{-\beta }, \;\beta > 0. \end{aligned}$$
(4.5)

Namely, there exists a constant \(C>0\) (that is defined explicitly) such that

$$\begin{aligned} \lim _{n \rightarrow \infty } \frac{\sup \limits _{A \in B(\mathbb {R})}\left| {\mathbb {P}} \left( \, a_{n}^{-1}\big (\max \limits _{1\leqslant i \leqslant n} X_i - b_{n}\big ) \in A \, \right) -G_{\gamma }(A)\right| }{|A(n)|} = C, \end{aligned}$$
(4.6)

where \(a_n = n g'(n)\) and \(b_n = g(n)\).

Under these assumptions, the function \(\displaystyle A(t):=\frac{t \, g^{\prime \prime }(t)}{g^{\prime }(t)}-\gamma +1\) belongs to the class \(\mathcal{RV}_{-\rho }\) with \(\rho := \min \left\{ 1, \frac{\beta }{\alpha } \right\}\).
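For instance (an illustration of ours, not part of the proposition), a Pareto-Lomax(\(\alpha\)) density \(f(t)=\alpha(1+t)^{-\alpha-1}\) satisfies (4.5) with \(c=\alpha\) and \(h(t)=(1+1/t)^{-\alpha-1}-1\sim-(\alpha+1)/t\), so that \(h\in\mathcal{RV}_{-1}\), i.e. \(\beta=1\) in this case. A short check:

```python
# Check (ours) that the Pareto-Lomax(alpha) density
# f(t) = alpha*(1+t)**(-alpha-1) satisfies (4.5): with c = alpha,
# h(t) = (1 + 1/t)**(-alpha-1) - 1 ~ -(alpha+1)/t, hence h in RV_{-1}.

alpha = 2.5

def h(t):
    return (1.0 + 1.0 / t) ** (-alpha - 1.0) - 1.0

for t in (1e3, 1e5, 1e7):
    # second-order Taylor term is (alpha+1)(alpha+2)/(2*t**2), well below 10/t**2
    assert abs(t * h(t) + (alpha + 1.0)) < 10.0 / t
```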

Indeed, since \(F_X\in\)MDA(Fréchet), the Potter bounds can be directly established from a \(2\mathcal{RV}\) condition on \(g'\) via Proposition 4 in Hua and Joe (2011), which in turn follows from a \(2\mathcal{RV}\) condition on \(f_X\) by Lemma 4.1 (whose proof is provided in Appendix 3). This latter condition is equivalent to our assumption (4.5), as stated in Lemma 3 of Hua and Joe (2011). Once we have the Potter bounds, we can replicate exactly the proof of Theorem 4.1 given in de Haan and Resnick (1996), obtaining the same rate of convergence. Note that Hua and Joe (2011) proved their results, Lemma 3 and Proposition 4, in the case \(\alpha < 0\), as they used them for \(g(t) = \bar{F}(t)\); but their proof can be repeated line by line for an arbitrary \(\alpha \in \mathbb {R}\).

Lemma 4.1

If \(\displaystyle f_X \in 2\mathcal{RV}_{-\alpha - 1, -\beta }\), with \(\alpha >0\) and \(\beta >0\), then the derivative \(g'\) of g defined in (4.3) satisfies \(g'\in 2\mathcal{RV}_{\frac{1}{\alpha }- 1,\, \rho }\), where \(\rho :=-\min \left\{ 1, \frac{\beta }{\alpha } \right\} (<0)\).

4.2 Rate of convergence for MRV-Normex

First, let us recall the MRV definition based on the pseudo-polar representation.

Definition 4.1

The random vector \(\varvec{X}\in \mathcal {MRV}_{-\alpha }\), with \(\alpha >0\), if there exists a d-dimensional random vector \(\varvec{\Theta }\) with values in the unit sphere \(\mathcal{S}_{1}\) in \(\mathbb {R}^d\) w.r.t. the norm \(\Vert \cdot \Vert\), such that, \(\forall t >0\),

$$\begin{aligned} \frac{{\mathbb {P}} \left( \, \Vert \varvec{X}\Vert>t\,u,\;\varvec{X}/\Vert \varvec{X}\Vert \in \cdot \, \right) }{{\mathbb {P}} \left( \, \Vert \varvec{X}\Vert >u \, \right) } \, {\mathop {\rightarrow }\limits ^{v}}\, t^{-\alpha }\,{\mathbb {P}} \left( \, \varvec{\Theta }\in \cdot \, \right) \quad \text {as} \quad u\rightarrow \infty . \end{aligned}$$
(4.7)

Using this definition of MRV, in particular the random vector \(\varvec{\Theta }\), we can define the MRV-Normex distribution as follows.

Definition 4.2

The so-called MRV-Normex distribution function is defined, for \(\mathbf {B} \subset \mathbb {R}^d\), by:

$$\begin{aligned} GM_n(\mathbf {B}):= {\mathbb {P}} \left( \, H_{\alpha ,n}\, \varvec{\Theta }+ Z \in \mathbf {B} \, \right) , \end{aligned}$$
(4.8)

with \(H_{\alpha ,n}:=a_n H_\alpha + b_n\), where the random variable \(H_\alpha\) (with \(\alpha >0\)) is Fréchet distributed (i.e. \(\displaystyle {\mathbb {P}} \left( \, H_\alpha \leqslant x \, \right) =e^{-x^{-\alpha }}\), for \(x>0\)) and independent of the random vector \(\varvec{\Theta }\) introduced in (4.7); the normalizing sequences satisfy the standard conditions of the EV theorem, namely \(a_n=c\,n^{1/\alpha }\) with \(c^{\,\alpha } := \lim \limits _{y \rightarrow \infty } y^{\alpha }\bar{F}_{\Vert X\Vert }(y)\), and \(b_n=0\). The d-dimensional random vector Z, also assumed to be independent of \(\varvec{\Theta }\), is, conditionally on the event \((H_{\alpha ,n} = y)\), with \(y>0\), normally \(\mathcal {N}_{(n-1)\mathbf {\mu }(y),\, (n-1) \Sigma (y) }\)-distributed, the mean vector and covariance matrix being those of the truncated distribution \(F_{\mathbf {X}\,\vert \, \Vert \mathbf {X}\Vert }(\cdot \vert y)\) defined in (2.2).

The MRV-Normex cdf can be rewritten as

$$\begin{aligned} GM_n(\mathbf {B}) = \int \limits _{0}^{\infty } f_{ H_{\alpha ,n}}(y) \,{\mathbb {P}} \left( \, y \, \varvec{\Theta }+ Z_y \in \mathbf {B} \, \right) \mathrm {d}y, \end{aligned}$$
(4.9)

where \(Z_y\) is \(\mathcal {N}_{(n-1)\mathbf {\mu }(y),\, (n-1) \Sigma (y)}\)-distributed and independent of \(\mathbf {\Theta }\).
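Definition 4.2 translates directly into a sampling scheme. Here is a minimal sketch (ours), with toy ingredients that are NOT from the paper: d = 2, \(\varvec{\Theta }\) uniform on the \(L^2\) unit circle, \(c=1\) in \(a_n=c\,n^{1/\alpha}\), and placeholder truncated moments; in applications, \(\mu(y)\) and \(\Sigma(y)\) must be the truncated moments of \(F_{\mathbf {X}\,\vert \, \Vert \mathbf {X}\Vert }(\cdot\vert y)\) and \(\varvec{\Theta }\) the spectral vector of (4.7).

```python
import numpy as np

# Minimal sampling sketch (ours) of GM_n in (4.8). Toy assumptions: d = 2,
# Theta uniform on the L2 unit circle, c = 1 in a_n = c*n**(1/alpha), and
# placeholder truncated moments mu(y), Sigma(y).

rng = np.random.default_rng(0)
d, n, alpha, m = 2, 52, 2.5, 2_000
a_n, b_n = n ** (1.0 / alpha), 0.0

def mu(y):                      # placeholder truncated mean vector
    return np.full(d, min(y, 1.0) / d)

def Sigma(y):                   # placeholder truncated covariance matrix
    return np.eye(d) * min(y, 1.0)

# H_alpha Frechet(alpha) via inversion of P(H_alpha <= x) = exp(-x**(-alpha))
H = a_n * (-np.log(rng.uniform(size=m))) ** (-1.0 / alpha) + b_n

phi = rng.uniform(0.0, 2.0 * np.pi, size=m)   # toy Theta: uniform direction
Theta = np.column_stack([np.cos(phi), np.sin(phi)])

# conditionally on H_{alpha,n} = y, Z ~ N((n-1)*mu(y), (n-1)*Sigma(y))
Z = np.array([rng.multivariate_normal((n - 1) * mu(y), (n - 1) * Sigma(y))
              for y in H])

GM_sample = H[:, None] * Theta + Z            # sample of size m from GM_n
assert GM_sample.shape == (m, d)
```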

Rate of convergence for the MRV-Normex approximation

We have the following result, whose proof can be found in Appendix 3.

Theorem 4.1

Let \(\mathbf {X}_{1}, \ldots , \mathbf {X}_{n}\) be i.i.d. random vectors with parent random vector \(\mathbf {X}\) having values in \(\mathbb {R}^{d}\). Assume the following conditions:

  • (C1) given in Theorem 3.1 (namely, \(F_{\mathbf {X}\,\vert \, \Vert \mathbf {X}\Vert }(\cdot \,\vert \,y)\) non degenerate \(\forall y > 0\));

  • \((M_{\Vert \cdot \Vert })\) The distribution of the rv \(\Vert \mathbf {X}\Vert\) is absolutely continuous and its pdf satisfies \(f_{\Vert \mathbf {X}\Vert }\in 2\mathcal{RV}_{-\alpha -1,-\beta }\) with \(\alpha > 0\), \(\beta > 0\);

  • \((M_{\Theta })\) There exists a function A such that \(A(t) \rightarrow 0\), \(A \in \mathcal{RV}_{-\rho }\) with \(\rho > 0\), and

    $$\begin{aligned} \sup \limits _{\mathbf {B} \in \mathcal {S}_1} \left| {\mathbb {P}} \left( \, \frac{\varvec{X}}{\Vert \varvec{X}\Vert }\in \mathbf {B}\,\big \vert \, \Vert \varvec{X}\Vert >t \, \right) - {\mathbb {P}} \left( \, \varvec{\Theta }\in \mathbf {B} \, \right) \right| \,\underset{t\rightarrow \infty }{\sim } \, A(t), \end{aligned}$$

    where the supremum is taken over all measurable subsets of \(\mathcal {S}_1.\)

Then, for any \(\alpha \in (2,3]\), \(\beta >0\) and \(\rho > 0\), there exists a slowly varying function \(L(\cdot )\) such that

$$\begin{aligned} \sup _{\mathbf {B} \in \mathcal {C}} \vert {\mathbb {P}} \left( \, \mathbf {S_n} \in \mathbf {B} \, \right) - GM_n(\mathbf {B}) \vert \, \leqslant \,\left( n^{- \frac{1}{2} + \frac{3-\alpha }{\alpha } } + n^{ - \frac{\rho }{\alpha }} + n^{-\min (1,\frac{\beta }{\alpha })}\right) L(n), \end{aligned}$$
(4.10)

where \(\mathcal {C}\) is the class of all Borel-measurable convex subsets of \(\mathbb {R}^{d}\) and \(GM_n\) is defined in (4.9).
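To see which of the three terms drives the bound (4.10), one can compare their polynomial exponents numerically; the following helper (ours, with assumed parameter values) does so, the slowest-decaying term, i.e. the largest exponent, giving the overall rate:

```python
# Helper (ours) comparing the three polynomial error exponents in (4.10):
# n**(-1/2 + (3-alpha)/alpha), n**(-rho/alpha), n**(-min(1, beta/alpha)).

def normex_exponents(alpha, beta, rho):
    e1 = -0.5 + (3.0 - alpha) / alpha       # CLT-type term
    e2 = -rho / alpha                       # spectral-measure term
    e3 = -min(1.0, beta / alpha)            # second-order RV term
    return e1, e2, e3

# assumed values: alpha = 2.5 with rho > alpha and beta > alpha, so the
# first term dominates and the d-Normex rate of Theorem 3.1 is recovered
e1, e2, e3 = normex_exponents(alpha=2.5, beta=3.0, rho=3.0)
assert max(e1, e2, e3) == e1                # e1 = -0.3, e2 = -1.2, e3 = -1.0
```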

Remark 4.1

  1. Compared with d-Normex (Theorem 3.1), there are two additional error terms in (4.10), the price to pay for using the approximation for the maximum \(\Vert \mathbf {X}_{(n)}\Vert\). Nevertheless, if \(\rho > \alpha\) and \(\beta > \alpha\), then (4.10) reduces to

    $$\sup _{\mathbf {B} \in \mathcal {C}} \vert {\mathbb {P}} \left( \, \mathbf {S_n} \in \mathbf {B} \, \right) - GM_n(\mathbf {B}) \vert \, \leqslant \,n^{- \frac{1}{2} + \frac{3-\alpha }{\alpha }} L(n),$$

    providing the same rate of convergence given in Theorem 3.1.

  2. If the supremum considered in \((M_{\Theta })\) converges faster than any regularly varying rate, then we can set \(\rho = \infty\) and drop the term \(n^{ - \frac{\rho }{\alpha }}\) from inequality (4.10).

  3. Discussion on Condition \((M_{\Theta })\) (see Kratz and Prokopenko (2021) for further details and proofs of the following statements):

    • Assuming \(\Vert \mathbf {X}\Vert \in \mathcal{RV},\) Condition \((M_{\Theta })\), which requires uniform convergence, is closely related to the MRV definition (4.7). Replacing this technical condition \((M_{\Theta })\) with (4.7) might be investigated further.

    • If the norm \(\Vert \mathbf {X}\Vert\) and the direction \(\mathbf {X}/ \Vert \mathbf {X}\Vert\) of \(\mathbf {X}\) are independent, then \((M_{\Theta })\) is satisfied.

    • Assuming that the density \(f_{\mathbf {X}}(\mathbf {x})\) depends only on the norm \(\Vert \mathbf {x}\Vert\) does not guarantee that the distribution of \(\mathbf {X}/ \Vert \mathbf {X}\Vert\) is uniform on the unit sphere \(\mathcal {S}_1\). It is uniform on \(\mathcal {S}_1\) for \(L^p\)-norms (or their weighted versions) if and only if \(p = 1,2,\infty\) (as a measure on the unit sphere is not proportional to a measure on the unit ball for \(p \ne 1,2,\infty\)).

4.3 Examples

Let us develop two examples with Pareto-Lomax marginal distributions, as in Example 3.2, to allow comparison with d-Normex. We consider two cases for the parent random vector \(\mathbf {X}\): its components are assumed to be, on one hand, independent, on the other hand, related through a survival Clayton copula. These are standard examples in the actuarial and risk literature (see e.g. Das and Kratz (2020)), in particular in the reinsurance context for the Clayton copula (see e.g. Dacorogna et al. (2018) and references therein). The second example itself includes two cases, when the polar coordinates of the vector \(\mathbf {X}\) are dependent (but asymptotically independent), and when they are independent; this latter case corresponds to Example 3.2.

We check that the conditions of Theorem 4.1 are satisfied whenever \(\alpha \in (2,3]\) (recall that this constraint on \(\alpha\) appears only for the analytical comparison with the generalized Berry-Esseen inequality), but apply the MRV-Normex distribution for any positive \(\alpha\), as the construction via asymptotic theorems remains valid whatever the value of this parameter.

4.3.1 Independent Pareto-Lomax marginals

Assume the components of the random vector \(\mathbf {X}\) to be iid with Pareto-Lomax(\(\alpha\)) distribution ((3.2) with \(d=1\)). Then its (non truncated) moments remain the same as in Example 3.2 and its covariance matrix is diagonal. As we developed Example 3.2 with the \(L_1\)-norm, let us switch here to the \(L^\infty\)-norm \(\Vert \cdot \Vert _{\infty }\), which is more convenient for computations in this framework.

The cdf of the norm of the vector \(\mathbf {X}\) being, for \(\alpha >0\),

$$\begin{aligned} F_{\Vert \mathbf {X}\Vert _\infty }(y) = {\mathbb {P}} \left( \, \max \left\{ X^{(1)}, \cdots , X^{(d)} \right\} \leqslant y \, \right) = (1 - (1+y)^{-\alpha })^d, \quad \text {for} \; y > 0, \end{aligned}$$

straightforward calculations give the following expressions for the truncated moments:

$$\begin{aligned} \begin{aligned} \mu ^{(1)}(y)&= {\mathbb {E}} \left( X^{(1)}\,\vert \, \Vert \mathbf {X}\Vert _\infty \leqslant y\right) = {\mathbb {E}} \left( X^{(1)}\,\vert \, X^{(1)} \leqslant y\right) \\&= \frac{1}{1 - (1+y)^{-\alpha }} \left( \frac{1}{\alpha -1}\left( 1 - (1+y)^{-\alpha + 1} \right) - y(1+y)^{-\alpha }\right) , \end{aligned} \end{aligned}$$
$$\begin{aligned} {\mathbb {E}} \left( \left( X^{(1)}\right) ^2\,\big \vert \, \Vert \mathbf {X}\Vert _\infty \leqslant y\right) = \dfrac{2-(1+y)^{-\alpha }\left( \alpha y\left( \left( \alpha -1\right) y+2\right) +2\right) }{\left( \alpha -2\right) \left( \alpha -1\right) \left( 1 - (1+y)^{-\alpha }\right) }, \end{aligned}$$

and 0 for the truncated covariances. When looking for the distribution of \(\mathbf {\Theta }\), notice that, for any \(i\ne j\), for any \(\varepsilon _i > 0\) and \(\varepsilon _j > 0\), we have

$$\lim _{n \rightarrow \infty } {\mathbb {P}} \left( \, X^{(i)}> \varepsilon _i \,t, \,X^{(j)}> \varepsilon _j\, t\;\big \vert \, \Vert \mathbf {X}\Vert _\infty > t \, \right) = 0.$$

Therefore, the distribution of the random vector \(\mathbf {\Theta }\) is discrete on the unit sphere \(\mathcal {S}_1\), with values given by the basis vectors \(\mathbf {e}_i = (0,\cdots ,1,\cdots ,0)\) (where 1 stands for the i-th component). It is straightforward to verify that \(F_{\Vert \mathbf {X}\Vert }\in 2\mathcal{RV}_{-\alpha ,-\alpha }\), so that \((M_{\Vert \cdot \Vert })\) is satisfied, and that Condition \((M_{\Theta })\) holds with auxiliary function \(A(\cdot ) \in \mathcal{RV}_{-\alpha }\). Finally, one may choose the normalizing sequences as \(a_n = (d \,n)^{1/\alpha }\) and \(b_n = -1\) (see Kratz and Prokopenko (2021) for further details). The numerical implementation of this MRV-Normex approximation is developed in Section 5, along with that of the d-Normex one, for any positive \(\alpha\), both multi-normex methods being compared to the Gaussian approximation whenever \(\alpha \geqslant 2\). The QQ-plots are drawn in Fig. 5.
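As a sanity check (ours, with assumed values \(\alpha=2.5\) and \(y=3\)), the closed-form truncated mean \(\mu^{(1)}(y)\) above can be compared with a Monte Carlo estimate; by independence, conditioning on \(\Vert\mathbf{X}\Vert_\infty\leqslant y\) reduces, for the first component, to \(X^{(1)}\leqslant y\):

```python
import numpy as np

# Monte Carlo check (ours) of the closed-form truncated mean mu^(1)(y)
# for iid Pareto-Lomax(alpha) components under the L-infinity norm.

rng = np.random.default_rng(1)
alpha, y, m = 2.5, 3.0, 400_000

def mu1(y):
    F = 1.0 - (1.0 + y) ** (-alpha)
    num = (1.0 - (1.0 + y) ** (1.0 - alpha)) / (alpha - 1.0) \
        - y * (1.0 + y) ** (-alpha)
    return num / F

# Pareto-Lomax(alpha) sample by inversion: P(X > x) = (1+x)**(-alpha)
X = rng.uniform(size=m) ** (-1.0 / alpha) - 1.0
mc = X[X <= y].mean()
assert abs(mc - mu1(y)) < 0.02
```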

4.3.2 Pareto-Lomax marginal distribution with survival Clayton copula

We introduce in this example some dependence among the components of \(\mathbf {X}\), choosing a survival Clayton copula (so, with upper tail dependence). To lighten the expressions of the computed moments, we choose \(d=2\). We consider \(\mathbf {X}=\left( X_{1}, X_{2}\right)\) with identical Pareto-Lomax \((\alpha , 1)\) marginal distributions, \(\alpha >1\), i.e. \(\bar{F}_{1}(x)=\bar{F}_{2}(x)=(1+x)^{-\alpha }, \ \forall x>0,\) and survival Clayton copula on \([0,1]^{2},\) with parameter \(\theta >0\), defined by

$$\mathbb {P}\left( X_{1}>x_{1}, X_{2}>x_{2}\right) =\left[ \left( 1+x_{1}\right) ^{\alpha \theta }+\left( 1+x_{2}\right) ^{\alpha \theta }-1\right] ^{-1 / \theta },$$

with pdf

$$f\left( x_{1}, x_{2}\right) = \frac{\alpha ^{2}(1+\theta )\left( 1+x_{1}\right) ^{\alpha \theta -1}\left( 1+x_{2}\right) ^{\alpha \theta -1}}{\left( \left( 1+x_{1}\right) ^{\alpha \theta } +\left( 1+x_{2}\right) ^{\alpha \theta }-1\right) ^{\frac{1}{\theta }+2}}.$$

Considering the \(L^\infty\)-norm, \(\Vert \cdot \Vert _{\infty }\), the survival cdf of the norm of the vector is:

$$\begin{aligned} \overline{F}_{\Vert \mathbf {X}\Vert _{\infty }}(t) = {\mathbb {P}} \left( \, \max \left\{ X_1, X_2 \right\} > t \, \right) = 2(1+t)^{-\alpha } -(2(1+t)^{\alpha \theta } - 1)^{-1/\theta }. \end{aligned}$$
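This survival function can be checked by simulation (a check of ours, not from the paper, with assumed parameters \(\alpha=2.5\), \(\theta=0.4\)), sampling the Clayton copula via the standard Marshall-Olkin (gamma frailty) construction:

```python
import numpy as np

# Monte Carlo check (ours) of P(||X||_inf > t) above: Pareto-Lomax(alpha)
# margins coupled by a survival Clayton(theta) copula, sampled through the
# Marshall-Olkin (gamma frailty) construction of the Clayton copula.

rng = np.random.default_rng(2)
alpha, theta, m = 2.5, 0.4, 200_000          # assumed: alpha*theta = 1

V = rng.gamma(1.0 / theta, size=m)           # gamma frailty
E = rng.exponential(size=(m, 2))
U = (1.0 + E / V[:, None]) ** (-1.0 / theta) # Clayton(theta) copula sample
X = U ** (-1.0 / alpha) - 1.0                # margins: P(X_i > x) = (1+x)**(-alpha)

def sf_norm(t):                              # formula for P(max(X1, X2) > t)
    return 2.0 * (1.0 + t) ** (-alpha) \
        - (2.0 * (1.0 + t) ** (alpha * theta) - 1.0) ** (-1.0 / theta)

emp = (X.max(axis=1) > 1.0).mean()
assert abs(emp - sf_norm(1.0)) < 0.01
```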

We computed the truncated moments (of order 1 and 2), using an integral calculator (based on Maxima, a computer algebra system developed by W. Schelter, MIT), providing explicit but long expressions (displayed in Kratz and Prokopenko (2021)). For positive \(u_1\) and \(u_2\) such that \(\max \left\{ u_1,u_2 \right\} \geqslant 1\), we can write

$$\frac{ f_{\mathbf {X}}(t \mathbf {u})\, t^d }{ \overline{F}_{\Vert \mathbf {X}\Vert _\infty }(t) } = \frac{\alpha ^2(1+ \theta ) \left( u_1 + \frac{1}{t} \right) ^{\alpha \theta - 1} \left( u_2 + \frac{1}{t} \right) ^{\alpha \theta - 1}}{\left( 2 - 2^{-1/\theta }\right) (1 + O(\frac{1}{t}))\left( \left( u_1 + \frac{1}{t} \right) ^{\alpha \theta } + \left( u_2 + \frac{1}{t} \right) ^{\alpha \theta }\right) ^{1/\theta +2 } }$$

from which we deduce the limit as \(t\rightarrow \infty\), namely

$$\lim \limits _{t \rightarrow \infty } \frac{ f_{\mathbf {X}}(t \mathbf {u})\, t^d }{ \overline{F}_{\Vert \mathbf {X}\Vert _\infty }(t) } = \frac{\alpha ^2(1+ \theta ) u_1^{\alpha \theta - 1} u_2^{\alpha \theta - 1}}{\left( 2 - 2^{-1/\theta }\right) \left( u_1^{\alpha \theta } + u_2^{\alpha \theta }\right) ^{1/\theta +2 } } =: f_{\Theta }(\mathbf {u}).$$

Note that, for \(\Vert \cdot \Vert = \Vert \cdot \Vert _\infty\), the function \(\frac{ f_{\mathbf {X}}(t \mathbf {u}) t^d }{ \overline{F}_{\Vert \mathbf {X}\Vert }(t) }\) depends on t; therefore the rv \(\Vert \mathbf {X}\Vert\) and the random vector \(\mathbf {X}/ \Vert \mathbf {X}\Vert\) are not independent. They become independent when replacing the \(L^\infty\)-norm with the \(L^1\)-norm (\(\Vert \cdot \Vert = \Vert \cdot \Vert _1\)) and choosing \(\alpha \theta = 1\); this corresponds to the Pareto-Lomax Example 3.2. Turning to the conditions of Theorem 4.1, it is straightforward to check that \(F_{\Vert \mathbf {X}\Vert }\in 2\mathcal{RV}_{-\alpha , -\min (\alpha \theta , 1)}\), so that \((M_{\Vert \cdot \Vert })\) is satisfied. Some computations are required for Condition \((M_{\Theta })\). We keep the maximum norm, i.e. \(\Vert \cdot \Vert = \Vert \cdot \Vert _\infty\), thereby exhibiting an example with dependence between the polar coordinates (but with asymptotic independence), but consider the case \(\alpha \theta = 1\) to simplify the computations. We obtain:

$$\begin{aligned} \begin{aligned}&\sup _B \left| {\mathbb {P}} \left( \, \frac{\mathbf {X}}{\Vert \mathbf {X}\Vert } \in B \,\vert \, \Vert \mathbf {X}\Vert \geqslant t \, \right) - {\mathbb {P}} \left( \, \Theta \in B \, \right) \right| = \frac{1}{2} \int \limits _{\Vert \mathbf {u}\Vert \geqslant 1} \left| \frac{ f_{\mathbf {X}}(t \mathbf {u}) t^d }{\overline{F}_{\Vert \mathbf {X}\Vert } (t)} - f_{\Theta }(\mathbf {u}) \right| \mathrm {d}\mathbf {u} \\&=\frac{\alpha (\alpha +1)}{2t} \int \limits _{\Vert \mathbf {u}\Vert \geqslant 1} t\,\left| \frac{ c_{\alpha }\vert \mathbf {u}\vert ^{\alpha +2} - \overline{F}(t) t^{\alpha } ( \vert \mathbf {u}\vert + \frac{1}{t})^{\alpha + 2} }{ \overline{F}(t) \, t^{\alpha }( \vert \mathbf {u}\vert + \frac{1}{t})^{\alpha + 2} \,c_{\alpha } \vert \mathbf {u}\vert ^{\alpha +2}}\right| \mathrm {d}\mathbf {u}, \end{aligned} \end{aligned}$$
(4.11)

where \(c_{\alpha } = (2 - 2^{-\alpha })\) and \(\vert \mathbf {u}\vert = u_1 + u_2\).

We can easily find an upper bound of type \(c/\vert \mathbf {u}\vert ^{\alpha +2}\) for the integrand (4.11), and, noticing that this integrand converges, as \(t\rightarrow \infty\), to \(\frac{ \left| \hat{c}_{\alpha }\vert \mathbf {u}\vert + c_{\alpha }(\alpha +2) \right| }{ c_{\alpha }^2 \vert \mathbf {u}\vert ^{\alpha +3}}\) with \(\hat{c}_{\alpha } := 2^{-\alpha - 1} - 2\), we can conclude, via the dominated convergence theorem, that the last integral in (4.11) converges to

$$\begin{aligned} \int \limits _{\Vert \mathbf {u}\Vert \geqslant 1} \frac{ \left| \hat{c}_{\alpha }\vert \mathbf {u}\vert + c_{\alpha }(\alpha +2) \right| }{ c_{\alpha }^2 \vert \mathbf {u}\vert ^{\alpha +3}} \mathrm {d}\mathbf {u} < \infty . \end{aligned}$$
(4.12)

Combining (4.11) and (4.12) provides that Condition \((M_{\Theta })\) holds with \(A(t) = {C_{\alpha }}/{t}\) for some constant \(C_{\alpha } \in (0,\infty )\). As in the previous example (case of independent components), one may choose the normalizing sequences \(a_n = (c_\alpha n)^{1/\alpha }\) and \(b_n = -1\). We refer to the next section for the numerical implementation of this example; see Fig. 6a and b for the QQ-plots (see Kratz and Prokopenko (2021) for additional illustrations).

5 QQ-plots of the various examples, illustrating both versions of the multi-normex method

5.1 Construction of the QQ-plots

Considering the examples given so far, we illustrate the benefit of the multi-normex method on \(d-\)dimensional QQ-plots based on geometrical quantiles. We refer mainly to Dhar et al. (2014) for definitions and detailed explanations; for a brief overview, see Kratz and Prokopenko (2021). We first recall a few key ideas about these objects, to help interpret the plots displayed in this section. The geometrical quantile, as given in Definition 5.1, generalizes to higher dimension the 1-dimensional quantile, which can be defined as the solution of an optimization problem. One can then formulate the same optimization problem in the \(d-\)dimensional case, whose solution is called the geometrical quantile.

Definition 5.1

(Geometrical quantile, see Chaudhuri (1996)) For a random vector \(\mathbf {X}\) with a probability distribution F on \(\mathbb {R}^{d}\), the d-dimensional spatial quantile or geometrical quantile \(Q_{F}(\mathbf {u})=\left( Q_{F, 1}(\mathbf {u}), \ldots , Q_{F, d}(\mathbf {u})\right)\) is defined as

$$\begin{aligned} Q_{F}(\mathbf {u})=\arg \min _{\mathbf {Q} \in \mathbb {R}^{d}} {\mathbb {E}} \{ \Vert \mathbf {X}-\mathbf {Q}\Vert - \Vert \mathbf {X}\Vert - \langle \mathbf {u},\mathbf {Q} \rangle \}, \end{aligned}$$
(5.1)

with \(\mathbf {u} \in B^{d}:=\left\{ \mathbf {v} \in \mathbb {R}^{d},\Vert \mathbf {v}\Vert <1\right\}\) and \(\langle \cdot ,\cdot \rangle\) denoting the inner product.

In this way, the geometrical quantile is a point of \(\mathbb {R}^d\). While 1-dimensional quantiles are parameterized by the interval (0, 1), multidimensional ones are parameterized by the \(d-\)dimensional unit ball \(B^{d}\). In this context, we refer to vectors of the unit ball as levels. Although geometrical quantiles reflect the structure of a \(d-\)dimensional distribution, they are abstract objects and do not admit as nice an interpretation as 1-dimensional quantiles do. Only the median has a geometrical meaning: given a random vector, the median is the point of \(\mathbb {R}^d\) minimizing the overall sum of the distances from this point to all values of the random vector (each distance being weighted by the probability that the vector takes the corresponding value). Moreover, if the vector has a finite second moment, then its extreme quantiles share the same speed of convergence towards infinity as those of any vector having the same covariance matrix (see Girard and Stupfler (2015, 2017)). Nevertheless, we can construct QQ-plots with these geometrical quantiles, with the aim of comparing \(d-\)dimensional distributions by comparing the plots with each other. The construction of QQ-plots is similar to that in the 1-dimensional case. Considering two distributions on \(\mathbb {R}^d\), we can solve the optimization problem for a fixed number of levels, say N, and obtain two sets of N geometrical quantiles: \(\left\{ \mathbf {q}_1, \dots , \mathbf {q}_N \right\}\) for the first distribution, and \(\left\{ \mathbf {q}'_1, \dots , \mathbf {q}'_N \right\}\) for the second. Then we draw a 2-dimensional QQ-plot for each component of the \(\mathbb {R}^d\)-geometrical quantiles, obtaining d QQ-plots: \(\left\{ \left( q_{i,j}, q'_{i,j}\right) ; j=1,\cdots ,N\right\}\), for \(i=1,\cdots ,d\). In Dhar et al. (2014), Theorem 2.2, the authors showed that, for all \(i = 1,\cdots ,d\), the points (pairs of quantiles) in the i-th 2-dimensional plot lie on the straight line with slope 1 and intercept 0 if and only if the two distributions are equal. We apply this result in our case, to check how well the Normex distribution approximates the distribution of the sum.

Turning to our previous examples, we consider multidimensional distributions with Pareto-Lomax(\(\alpha\)) marginals, varying their heaviness through the parameter \(\alpha \in \left\{ 1.5, 2.3, 3.5 \right\}\), with the different structures of dependence given in Examples 3.2 and 4.3. We choose \(d\in \{2,3\}\) and \(n = 52\) (a rather small number, to better illustrate the fast convergence of Normex). For each case, we evaluate both multi-normex distributions and the normal distribution (except for \(\alpha =1.5\)) obtained by the CLT; those distributions are evaluated empirically, via simulations, each sample being of size \(10^7\). Then we proceed to their comparison via the QQ-plots. We construct the QQ-plots in 4 main steps:

  1. Simulate all the distributions to be compared: the distribution of the sum \(\mathbf {S}_{n}\), the normal distribution from the CLT, the d-Normex distribution \(G_n\), and the MRV-Normex distribution \(GM_n\). Namely,

    (a) To obtain a simulated sample of size \(10^7\) for the sum \(\mathbf {S}_{n}\), we simulate \(n\times 10^7\) random vectors \(\mathbf {X}\) from the considered distribution.

    (b) To obtain a simulated sample from the d-Normex distribution defined in (3.1): First, we build \(n = 52\) samples (of size \(10^7\)) from the d-dimensional multivariate Pareto distribution, from which we deduce a sample (of size \(10^7\)) for the (d-dimensional) maximum \(\mathbf {X}_{(n)}\) (see (1.1)). Second, for each element of the latter sample, we calculate its norm \(y = \Vert \mathbf {X}_{(n)}\Vert\) and simulate a normal vector with mean \((n-1)\mu (y)\) and covariance matrix \((n-1)\Sigma (y)\) (described in Definition 3.1), collecting then \(10^7\) Gaussian vectors. Finally, we sum maximum and normal vectors to produce a sample (of size \(10^7\)) from the d-Normex distribution.

    (c) To simulate the MRV-Normex distribution defined in (4.8), we start by simulating a sample (of size \(10^7\)) for the vector \(\mathbf {\Theta }\) (representing the direction) and an independent sample for the Fréchet distributed rv \(H_{\alpha ,n}\) introduced in (4.9). Next, we collect the \(10^7\) normal vectors with mean \((n-1)\mu (y)\) and covariance matrix \((n-1)\Sigma (y)\), where, now, \(y =H_{\alpha ,n}\). Finally, we aggregate all the constructed samples according to (4.8).

  2. Fix the set of levels \(\mathcal {L} \subset \left\{ \mathbf {v} \in \mathbb {R}^{d},\Vert \mathbf {v}\Vert <1\right\}\) with different lengths and directions. For \(d=3\), we choose 10 lengths, \(||\mathbf {v}|| \in \left\{ 0 , 0.2 , 0.4 , 0.6 , 0.8 , 0.9 , 0.9225, 0.945 , 0.9675, 0.99 \right\}\) (half for the body of the distribution and half for its tail), and all the directions with angle step \(\frac{\pi }{4}\) (to cover possible directions uniformly), representing a total of 235 vectors. For \(d=2\), we choose 19 lengths and 8 directions with angle step \(\frac{\pi }{4}\), i.e. 145 vectors.

  3. Calculate the geometrical quantiles for the simulated samples of all considered distributions. This means solving numerically the optimization problem (5.1) for each empirical distribution and for all levels \(\mathbf {v}\) in the set \(\mathcal {L}\).

  4. Draw a QQ-plot for the three pairs (or two, when the CLT cannot be applied): (sum, CLT), (sum, d-Normex) and (sum, MRV-Normex).
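The level set of step 2 can be sketched as follows for d = 2 (ours; the text only gives the counts, so the exact grid of lengths below is an assumption):

```python
import numpy as np

# Sketch (ours) of the level set for d = 2: the origin plus 18 positive
# lengths (assumed grid, split between body and tail) times 8 directions
# at angle multiples of pi/4, i.e. 18*8 + 1 = 145 levels inside B^2.

lengths = np.concatenate([[0.0],
                          np.linspace(0.1, 0.9, 9),     # body (assumed grid)
                          np.linspace(0.91, 0.99, 9)])  # tail (assumed grid)
angles = np.arange(8) * np.pi / 4.0
directions = np.column_stack([np.cos(angles), np.sin(angles)])

levels = np.vstack([np.zeros((1, 2))]
                   + [r * directions for r in lengths[1:]])

assert levels.shape == (145, 2)
assert np.linalg.norm(levels, axis=1).max() < 1.0      # levels lie in B^2
```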

Note that the numerical implementation has been performed in Python (SciPy library). The scipy.optimize.minimize function, based on the quasi-Newton method of Broyden, Fletcher, Goldfarb, and Shanno (see p.136 in Nocedal and Wright (2006)), was used to solve the numerical optimization problem (5.1) for the empirical distributions. The gradient of the objective function was computed analytically. The computation time for one geometrical quantile was, on average, 53 seconds (on an i7 2 GHz machine with 16 GB RAM).
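A minimal numpy alternative (ours; the authors use scipy's BFGS) solves the same first-order condition of (5.1), namely \(\frac{1}{m}\sum_i (\mathbf{x}_i-\mathbf{Q})/\Vert \mathbf{x}_i-\mathbf{Q}\Vert = -\mathbf{u}\) for the sample objective, via a Weiszfeld-type fixed-point iteration:

```python
import numpy as np

# Weiszfeld-type fixed point (ours, not the authors' code) for the sample
# geometric quantile: solving sum_i (x_i - Q)/||x_i - Q|| = -m*u for Q.

def geometric_quantile(X, u, iters=200):
    Q = X.mean(axis=0) + 1e-6                # start near the centroid
    for _ in range(iters):
        r = np.maximum(np.linalg.norm(X - Q, axis=1), 1e-12)  # avoid /0
        w = 1.0 / r
        Q = (X.T @ w + len(X) * np.asarray(u)) / w.sum()
    return Q

# symmetric test cloud: the geometric median (u = 0) must be the origin
X = np.array([(i, j) for i in (-2, -1, 1, 2) for j in (-2, -1, 1, 2)], float)
med = geometric_quantile(X, np.zeros(2))
assert np.linalg.norm(med) < 1e-6

# a level u pointing in the +x direction shifts the quantile the same way
q = geometric_quantile(X, np.array([0.3, 0.0]))
assert q[0] > 0.0
```

For large samples, the per-iteration cost is linear in the sample size, which is why the paper's BFGS with an analytic gradient is a natural choice at scale.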

5.2 Multivariate Pareto-Lomax distribution - Example 3.2

We first show the QQ-plots for the multivariate Pareto distribution, developed in Section 3.2, considering the dimension \(d= 3\), the number of summands \(n=52\), and varying \(\alpha\). We choose a small number of summands to highlight the performance of the multi-normex methods, even in this case. The QQ-plots are given on the same row for each of the three components, for each given approximation method (CLT, d-Normex and MRV-Normex). As expected, the plots are similar componentwise. A zoom of the center of the graph is given in the upper left corner of each plot, since this is the region where the quantiles are most concentrated, making it more difficult to judge whether they form a straight line. It should be noted that the center corresponds not only to levels with length less than or equal to 0.9 (marked in blue), but also to extreme levels (in red) with length greater than 0.9.

Case \(\alpha \in (2,3]\)

Figure 3 exhibits the QQ-plots in the case \(\alpha = 2.3,\) i.e. in the framework of the multi-normex theorems, when using the Gaussian approximation (via the CLT) and both multi-normex approximations. We observe that all the points (pairs of quantiles) for d-Normex and MRV-Normex lie much closer to the line with slope 1 and intercept 0 than for the Gaussian distribution. Thus, we can see numerically that both multi-normex approximations better describe the distribution of the vectorial sum \(\mathbf {S}_n\), as proved analytically in Theorems 3.1 and 4.1. While the QQ-plots look quite similar for the two versions of the multi-normex approximations, when zooming, the fit is slightly better when using the d-Normex distribution than the MRV-Normex one, which uses the Fréchet approximation for the distribution of the rescaled maximum.

Arbitrary \(\alpha > 0\)

Here we consider other examples of heavy tails than the case \(\alpha \in (2,3]\), to illustrate the benefit of using Normex distributions rather than applying the generalized CLT with a Gaussian distribution (for finite variance) or a stable one (when the variance is infinite). We do it numerically, as the upper bound of the Berry-Esseen inequality does not allow us to compare analytically the rates of convergence in terms of the number of summands n. Nevertheless, the constant, given in terms of \(\alpha\) and the number k of largest order statistics removed when trimming the sum, may make a difference, as noticed in the 1-dimensional case (see Kratz (2014)). We give two examples, \(\alpha =1.5\) (Fig. 2), a case where the summands have infinite variance, and \(\alpha =3.5\) (Fig. 4), i.e. beyond the scope of the generalized Berry-Esseen inequality. In Fig. 2, we only have two rows, as the CLT does not apply and we did not build the QQ-plot for the stable distribution. The fit looks very good for both multi-normex methods, with barely any difference in fit between the two, as shown in the zoomed part. When \(\alpha =3.5\), we again clearly observe in Fig. 4 an overall fit that gives the advantage to the multi-normex distributions, but with more difference, when zooming, between d-Normex and MRV-Normex than in the previous cases \(\alpha =1.5\) and \(\alpha =2.3\).

5.3 Pareto-Lomax marginal distributions with various dependence structures - Examples 4.3

Here, we consider the examples developed in Section 4.3, with Pareto-Lomax(\(\alpha\)) marginal distributions, various dependence structures and the norm \(\Vert \cdot \Vert _{\infty }\) instead of \(\Vert \cdot \Vert _1\) as in the previous subsection. Figure 5 displays QQ-plots for independent components of the vector \(\mathbf {X}\), taking \(\alpha =2.3\) to be in the half-closed interval (2, 3] considered in our theorems. Here also, the good fit of the multi-normex distributions appears clearly, in particular when comparing with the Gaussian approximation. Nevertheless, the difference between the two multi-normex distributions is more pronounced in the center, as can be observed in the zoomed part.

When the dependence structure is given via a survival Clayton copula with parameter \(\theta\), we provide the QQ-plots in Fig. 6a assuming \(\alpha \theta = 0.5\). The previous observations also hold here, with an increasing difference in the center between the two multi-normex distributions. The case \(\alpha \theta = 1\) is illustrated in Fig. 6b. Note that this choice of \(\theta =1/\alpha\) is a standard example in the literature, as it makes analytical computations much more tractable. It would also correspond to Example 3.2 if choosing the \(L_1\)-norm \(\Vert \cdot \Vert _1\), allowing then a comparison of the results obtained respectively with the two norms. Recall that the analytical results (Theorems 3.1 and 4.1) are independent of the norm; this can also be observed numerically on (ranked) scatter plots (see Kratz and Prokopenko (2021)).

Fig. 2

3-dimensional QQ-plots for the empirical distribution (sample size \(= 10^7\)) of the sum of 52 iid trivariate Pareto (\(\alpha = 1.5\)) random vectors, with two different approximations of the sum distribution: d-Normex (first row) and MRV-Normex (second row). Each column corresponds to a component (from the 1st to the 3rd). The red points on the plots correspond to extreme geometric quantiles (when the norm of the parameterized vectors is greater than 0.9)

Fig. 3

3-dimensional QQ-plots for the empirical distribution (sample size \(= 10^7\)) of the sum of 52 iid trivariate Pareto (\(\alpha = 2.3\)) random vectors, with three different approximations of the sum distribution: CLT (first row), d-Normex (second row) and MRV-Normex (third row). Each column corresponds to a component (from the 1st to the 3rd). The red points on the plots correspond to extreme geometric quantiles (when the norm of the parameterized vectors is greater than 0.9)

Fig. 4

3-dimensional QQ-plots for the empirical distribution (sample size \(= 10^7\)) of the sum of 52 iid trivariate Pareto (\(\alpha = 3.5\)) random vectors, with three different approximations of the sum distribution: CLT (first row), d-Normex (second row) and MRV-Normex (third row). Each column corresponds to a component (from the 1st to the 3rd). The red points on the plots correspond to extreme geometric quantiles (when the norm of the parameterized vectors is greater than 0.9)

Fig. 5

3-dimensional QQ-plots for the empirical distribution (sample size \(= 10^7\)) of the sum of 52 iid random vectors with independent Pareto-Lomax components (\(\alpha = 2.3\)), with three different approximations of the sum distribution: CLT (first row), d-Normex (second row) and MRV-Normex (third row). Each column corresponds to a component (from the 1st to the 3rd). The red points on the plots correspond to extreme geometric quantiles (when the norm of the parameterized vectors is greater than 0.9)

Fig. 6

2-dimensional QQ-plots for the empirical distribution (sample size \(= 10^7\)) of the sum of 52 iid random vectors with Pareto-Lomax marginal distributions (\(\alpha = 2.3\)) joined by a Clayton copula with parameter \(\theta\) such that \(\alpha \theta =0.5\) in case (a), and \(\alpha \theta =1\) in case (b). The three rows correspond to the approximations of the sum distribution: CLT (first row), d-Normex (second row) and MRV-Normex (third row). For each case (a) and (b), the two columns correspond to the 1st and the 2nd components. The red points on the plots correspond to extreme geometric quantiles (when the norm of the parameterized vectors is greater than 0.9)

6 Conclusion

The purpose of this study was to build a sharp approximation of the whole distribution of the sum of iid random vectors in the presence of heavy tails. This was achieved by extending the Normex approach from a univariate to a multivariate framework, combining mean and extreme behaviors. We proposed two possible multi-normex distributions, named d-Normex and MRV-Normex. Both rely on the Gaussian distribution for describing the mean behavior, via the CLT, while the difference between the two versions comes from using either the EV theorem or the exact distribution for the maximum. The main theorems establish the rate of convergence of each version of the multi-normex distributions towards the distribution of the sum. This is done analytically whenever the shape parameter \(\alpha\) of the tail of the marginal distribution belongs to the interval (2, 3], making the comparison with the generalized Berry-Esseen inequality relevant. For the MRV-Normex, second order regular variation conditions are needed to obtain the main theorem. Numerical comparisons are developed for any value of \(\alpha\), for both multi-normex distributions, considering examples with different dependence structures for the random vectors. Illustrations are made through multidimensional QQ-plots based on geometric quantiles.
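The multidimensional QQ-plots mentioned above rely on geometric quantiles in the sense of Chaudhuri (1996). As a rough illustration of the underlying object (a minimal sketch, not the authors' implementation; the starting point and iteration count are our own choices), a sample geometric quantile can be computed by a Weiszfeld-type fixed-point iteration:

```python
import numpy as np

def geometric_quantile(x, u, n_iter=200, eps=1e-9):
    """Sample geometric quantile q(u): the minimizer over q of
    sum_i ( ||x_i - q|| + <u, x_i - q> ), for u in the open unit ball.
    u = 0 yields the spatial median; ||u|| close to 1 pushes q(u) into
    the tail in the direction of u. Solved via a Weiszfeld-type iteration
    derived from the first-order condition sum_i (q - x_i)/||q - x_i|| = n u."""
    x = np.asarray(x, dtype=float)
    u = np.asarray(u, dtype=float)
    n = len(x)
    q = x.mean(axis=0)                       # illustrative starting point
    for _ in range(n_iter):
        r = np.maximum(np.linalg.norm(x - q, axis=1), eps)  # guard data points
        w = 1.0 / r
        q = (x.T @ w + n * u) / w.sum()      # fixed-point update
    return q

rng = np.random.default_rng(2)
sample = rng.standard_normal((2000, 2))
center = geometric_quantile(sample, np.zeros(2))             # spatial median
extreme = geometric_quantile(sample, np.array([0.9, 0.0]))   # ||u|| = 0.9, the threshold used for the red points
```

For a centered symmetric sample, `center` lies near the origin, while `extreme` is pulled far out in the direction of `u`, which is the behavior exploited to mark extreme quantiles on the QQ-plots.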

We focused on the case of heavy-tailed random vectors, as it is of most interest in the risk literature. Nevertheless, this method could be extended to light-tailed vectors (whenever \(1/\alpha =0\)), as the rate of convergence for the EV theorem is also known in that case. It would then require introducing a specific metric to evaluate the error for the whole distribution, taking into account the impact of extremes. Moreover, the MRV-Normex approach has been developed conditioning on the norm of the maximum. It could alternatively be done by conditioning on the maximum itself (a vector). To widen the applicability of the multi-normex methods, simple approximations of truncated moments could also be suggested (e.g., numerical approximations, or evaluations based on a Pareto approximation). Finally, generalizations of multi-normex distributions will be studied when introducing dependence between the random vectors, and then when considering random processes. We intend to explore such topics in the near future. In the meantime, we are developing the statistical side of multi-normex, including building a statistical package.