1 Introduction

Risk measurement, with its crucial importance for financial institutions such as banks, insurance companies and investment funds, has drawn a lot of attention in both academia and industry over the past several decades. Although a financial risk, often modelled by a probability distribution, cannot be characterized by a single number, sometimes one needs to assign a number to a risk position. The determination of regulatory capital is one such example, the ranking of risks another. For such purposes, quantitative tools that map risks to numbers were introduced, and they are called risk measures.

Over the past two decades, value-at-risk (VaR) became the benchmark (Jorion [22]). Expected shortfall (ES), an alternative to VaR which is coherent (Artzner et al. [3]), is arguably the second most popular risk measure in use. In two recent consultative documents BCBS [4, 5], the Basel Committee on Banking Supervision proposed a move from VaR to ES for the measurement of market risk in banking. Under Solvency 2 and the Swiss Solvency Test, the same discussion takes place within insurance regulation; see, for instance, Sandström [34] and SCOR [35]. As a consequence, there have been extensive debates on issues related to diversification, aggregation, economic interpretation, optimization, extreme behaviour, robustness and backtesting of VaR and ES. We omit a detailed analysis here and refer to Embrechts et al. [15], Emmer et al. [16] and the references therein.

Here are some of the issues raised. VaR is not coherent, but it is elicitable (see Gneiting [18, Theorem 9]; that paper also contains some earlier references), easy to backtest and more robust with respect to statistical uncertainty, as argued in Gneiting [18] and Cont et al. [10]. ES is coherent, but not elicitable and difficult to backtest. There have been extensive discussions on the problematic diversification and aggregation issues of VaR due to its lack of subadditivity; see, for example, Embrechts et al. [14]. Daníelsson et al. [11] argue that the violation of subadditivity for VaR is rare in practice. VaR, being a quantile, does not address the crucial “what if” question, i.e. what are the consequences if a particular rare event (measured by VaR) occurs. Whereas this was clear since its introduction within the financial industry around 1994, it took some serious financial crises to bring this issue fully onto the regulatory agenda.

The importance of robustness properties of risk measures has only fairly recently become a focal point of regulatory attention. By now, numerous academic as well as applied papers address the topic. Conflicting views typically result from different notions of robustness; Embrechts et al. [15] contains a brief discussion and some references. In the present paper, the measurement of aggregated risk positions under uncertainty with respect to the dependence structure of the underlying risk factors will be discussed. We show that ES enjoys a new property of aggregation-robustness which VaR generally does not have.

The mathematical property of (non-)subadditivity of a risk measure becomes relevant upon analysing the aggregate position of a portfolio. As is often the case in practice, the dependence structure among individual risks in a portfolio is difficult to obtain from a statistical point of view, while the marginal distributions of the individual risks (assets) may typically be easier to model; see, for instance, Embrechts et al. [14] and Bernard et al. [7]. Modelling a high-dimensional dependence structure is well known to be data-costly, and dimension reduction techniques such as vine copulas, hierarchical structures and very specific parametric models often have to be implemented. Whereas such simplifying techniques in general create computational and modelling ease, they typically involve considerable model uncertainty. This leads to a notion of dependence uncertainty (DU) in risk aggregation, a concept of main interest for this paper.

From a mathematical or statistical point of view, it is clearly better to look at robustness properties of a model at the level of the joint distribution of the risk factors. The main reason for separating the two (marginals, dependence) lies in practice, where the two are indeed often modelled separately. This is particularly true in a stress testing environment.

Hence for this paper, we introduce the notion of aggregation-robustness to study properties of risk measures for aggregation in the presence of dependence uncertainty. The new notion is based on the classic notion of robustness for statistical functionals in, e.g. Huber and Ronchetti [21]. However, as opposed to the conclusions in Cont et al. [10], we show that when model uncertainty lies solely at the level of the dependence structure, coherent distortion risk measures (such as ES) are continuous with respect to weak convergence of the underlying distributions, whereas VaR in general is not. This result supports the use of ES for risk aggregation, especially when statistical information on marginal distributions is reliable.

Under DU, the attainable values of a risk measure lie in an interval. This interval can be seen as a measurement of model uncertainty for that risk measure. When a risk measure is used to quantify the riskiness of an aggregate position of a portfolio, the ratio between the risk measure of the aggregate risk and the sum of the risk measures of the marginal risks is called a diversification ratio. The diversification ratio measures how well the risks in a portfolio hedge (compensate for) each other. With only models for marginal distributions available, the diversification ratio also takes values in a DU-interval.

To study the DU-interval of VaR and ES and their diversification ratios, one needs to calculate the worst-case and best-case values of VaR and ES under dependence uncertainty. Due to the subadditivity of ES, the worst-case value of ES is the sum of the ES of the marginal risks. However, the other three quantities (best- and worst-case VaR, best-case ES) are in general unknown. Partial results exist. The worst-case value of VaR for \(n=2\) was given in Makarov [27] based on early results in multivariate probability theory. Embrechts and Puccetti [13] gave a dual bound for the worst-case VaR for \(n\geqslant 3\) in the homogeneous model, i.e. when all marginal risks have the same distribution. Partial solutions for the worst-case and best-case values of VaR are to be found in Wang et al. [40], Puccetti and Rüschendorf [30] and Bernard et al. [7], based on the notion of complete mixability (CM) introduced in Wang and Wang [37]. A fast algorithm to numerically calculate the worst-case and best-case values of VaR under general conditions was introduced in Embrechts et al. [14]; this is the so-called rearrangement algorithm (RA). For the best-case ES, some partial analytical results can be found in Bernard et al. [7] and Cheung and Lo [9], and a numerical procedure was proposed by Puccetti [29].

In most of the existing analytical results, it is assumed that the marginal distributions are identical (homogeneous case), with some extra conditions on the shape of the underlying risk factor densities (assumed to exist). In this paper, we relax the assumptions on the marginal distributions. Instead of explicit values for the worst-case and best-case VaR, we obtain approximations. The new results obtained can be used within a discussion on capital requirement; they, moreover, yield a DU-interval for VaR and its diversification ratio.

Further understanding of the worst-case VaR can be obtained through the asymptotic behaviour as the number of risks in the portfolio grows to infinity, i.e. for a large portfolio regime. In the homogeneous case, Puccetti and Rüschendorf [31] obtained an asymptotic equivalence between the worst-case VaR and the worst-case ES under dependence uncertainty, and this under a strong condition on the identical marginal distributions. The required condition was later weakened by Puccetti et al. [32] (based on further results on complete mixability) and Wang [39] (based on a duality result obtained in Rüschendorf [33]). It was finally removed by Wang and Wang [38] (based on the notion of extreme negative dependence). When the marginal distributions are not identical, Puccetti et al. [32] also obtained the asymptotic equivalence under the assumption that only finitely many different choices of the marginal distributions can appear; this mathematically allows a reduction to the case of identical marginal distributions. In this paper, we give a unifying result on this asymptotic equivalence, by allowing the marginal distributions to be arbitrary. Only weak uniformity conditions on the moments of the marginal distributions are required for our results to hold. These conditions are easily justified in practice and are necessary for the most general equivalence to hold. The new results lead to the asymptotic DU-spread of VaR and ES, and show that VaR in general yields a larger DU-spread compared to ES.

The rest of the paper is organized as follows. In Sect. 2, we introduce the notion of aggregation-robustness and show that ES is aggregation-robust but VaR is not. In Sect. 3, we give new bounds on the diversification ratios under dependence uncertainty, and establish an asymptotic equivalence between VaR and ES under a worst-case scenario. The dependence uncertainty spreads of VaR and of ES are derived and compared in Sect. 4. In Sect. 5, numerical examples are presented to illustrate our results. Section 6 draws some conclusions. All proofs are put in the Appendix.

Throughout the paper, we let \((\varOmega, \mathcal{A}, \mathbb {P})\) be an atomless probability space and \(L^{0}:=L^{0}(\varOmega, \mathcal{A}, \mathbb {P})\) the set of all (equivalence classes of) real-valued random variables on that probability space. Elements of \(L^{0}\) are often referred to as risks. Their distribution functions are simply referred to as distributions. We write \(X\sim F\) to denote \(F(x)=\mathbb {P}[X\leqslant x]\), \(x\in \mathbb {R}\). We also denote the generalized inverse function of \(F\) by \(F^{-1}\), that is, \(F^{-1}(p)=\inf\{t\in \mathbb {R}:F(t)\geqslant p\}\) for \(p\in(0,1]\), and \(F^{-1}(0)=\inf\{t\in \mathbb {R}:F(t)> 0\}\).

2 Robustness of VaR and ES for risk aggregation

2.1 Robustness of risk measures

The robustness of a statistical functional or an estimation procedure describes the sensitivity to underlying model deviations and/or data changes. Different definitions and interpretations of robustness exist in the literature; see, for example, Huber and Ronchetti [21] from a purely statistical perspective, Hansen and Sargent [20] in the context of economic decision making, and Ben-Tal et al. [6] within optimization. In statistics, robustness mainly concerns the so-called distributional (or Hampel–Huber) robustness: the statistical consequences when the shape of the actual underlying distribution deviates slightly from the assumed model.

A risk measure \(\rho\) is a function which maps a risk in a set \(\mathcal{X}\) to a number, \(\rho: \mathcal{X}\rightarrow(-\infty ,+\infty]\), where \(\mathcal{X}\subset L^{0}\) typically contains \(L^{\infty}\) and is closed under addition and positive scalar multiplication. A risk measure is law-invariant if it only depends on the distribution of the risk. We omit the general introduction of risk measures, and refer the interested reader to Föllmer and Schied [17, Chap. 4]. Since law-invariant risk measures are a specific type of statistical functionals, their robustness properties have already been extensively studied in the statistical literature; see, e.g. Huber and Ronchetti [21, Chap. 3].

In this paper, we focus on the two most popular risk measures: value-at-risk (VaR) at confidence level \(p\), defined as

$$ \mathrm {VaR}_{p}(X)= \inf\{x\in \mathbb {R}: \mathbb {P}[X\leqslant x]\geqslant p\},\quad p\in(0,1), X\in L^{0}, $$

and the expected shortfall (ES) at confidence level \(p\), defined as

$$ \mathrm {ES}_{p}(X)= \frac{1}{1-p}\int_{p}^{1} \mathrm {VaR}_{q}(X)\mathrm{d}q,\quad p\in(0,1), X\in L^{0}. \tag{2.1} $$

Clearly, \(\mathrm {VaR}_{p}(X)=F^{-1}(p)\) for \(p\in(0,1)\), where \(X\sim F\). Though typically it is assumed in (2.1) that \(\mathbb {E}[\vert X \vert]<\infty\), we may occasionally allow that \(\mathrm {ES}_{p}(X)=\infty\) for some \(X\). On the other hand, \(\mathrm {VaR}_{p}(X)\) is always a finite number for all \(X\in L^{0}\). Both risk measures occur most frequently in the setting of solvency requirements for financial institutions; hence the appearance of “regulatory risk measures” in the title of the paper.
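The two definitions can be made concrete on an empirical distribution. The following is a minimal sketch, assuming a finite sample with equal weights; the function names `empirical_var` and `empirical_es` are illustrative and do not come from the paper.

```python
import math

def empirical_var(sample, p):
    """Lower p-quantile inf{x : F(x) >= p} of the empirical distribution."""
    xs = sorted(sample)
    n = len(xs)
    # Smallest k with k/n >= p; the small offset guards against rounding in n*p.
    k = max(1, math.ceil(n * p - 1e-12))
    return xs[k - 1]

def empirical_es(sample, p):
    """(1/(1-p)) times the integral of VaR_q over q in (p, 1), as in (2.1)."""
    xs = sorted(sample)
    n = len(xs)
    total = 0.0
    for k, x in enumerate(xs, start=1):
        # The empirical quantile function equals x on ((k-1)/n, k/n].
        overlap = k / n - max((k - 1) / n, p)
        if overlap > 0:
            total += x * overlap
    return total / (1 - p)

losses = [1.0, 2.0, 3.0, 4.0]
print(empirical_var(losses, 0.5))  # 2.0
print(empirical_es(losses, 0.5))   # 3.5
```

For this toy sample, ES averages the two largest observations, whereas VaR only records the point at which the distribution function first reaches the level \(p\).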

It is often argued in the literature that quantile-based risk measures such as VaR are more robust when compared to mean-based risk measures such as ES; the notion of robustness used most often is Hampel’s (Hampel [19]). ES is only robust with respect to stronger metrics (e.g. the Wasserstein distance); arguments of this type can be found for instance in Cont et al. [10], Kou and Peng [23] and Emmer et al. [16]. More general results on continuity of law-invariant risk measures with respect to certain metrics on sets of probability measures are provided in Krätschmer et al. [25]. It is well known that the qualitative robustness of a statistical estimator, as in Hampel [19], is equivalent to the continuity of the corresponding risk measure at the true distribution, with respect to the weak topology. In general, to analyse statistical robustness of a given risk measure, one typically studies continuity properties of that risk measure in a given metric \(d\), say. It would hence be proper in such a case to talk about \(d\)-robustness. Hence we say that a law-invariant risk measure \(\rho\) is \(d\)-robust at a distribution \(F\) if \(d(F_{n},F)\to0\) implies that \(\rho(X_{n}) \to\rho(X)\), where \(X_{n} \sim F_{n}, n=1,2,\dots{}\), and \(X\sim F\). For example, the Lévy distance is used in Cont et al. [10] to measure the difference between any two univariate distributions \(F\) and \(G\) by

$$ d(F,G):=\inf\{\varepsilon >0: F(x-\varepsilon )-\varepsilon < G(x)< F(x+\varepsilon )+\varepsilon ,\ \forall x\in \mathbb {R}\}. $$

Note that the Lévy distance metrizes the weak topology on the set of distributions. A law-invariant risk measure is said to be robust in Hampel’s sense if it is continuous with respect to convergence in distribution. Other metrics can also be used for the analysis of robustness; see Krätschmer et al. [24, 25] and Cambou and Filipović [8]. It is a very classical result that the \(p\)th (lower) quantile functional \(F\mapsto F^{-1}(p)\) (and so \(\mathrm {VaR}_{p}\)) is weakly continuous at each \(F_{0}\) for which the mapping \(s\mapsto F_{0}^{-1}(s)\) is continuous at \(s = p\). A more general result can be found, for instance, in van der Vaart [36, Lemma 21.2]. In Krätschmer et al. [25], it is argued that Hampel’s notion of (statistical) robustness is less relevant for risk management. Using a different definition, they introduce a continuous scale of robustness.
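The different behaviour of VaR and ES under weak convergence can be seen in a simple worked example, which is an illustration and not taken from the references above. Let \(F_{0}\) be the distribution of the constant 0 and let \(X_{n}\sim F_{n}\) take the value \(n^{2}\) with probability \(1/n\) and the value 0 otherwise. Then, for every \(n\geqslant1/(1-p)\),

$$ d(F_{n},F_{0})\leqslant\frac{1}{n}\to0,\qquad \mathrm {VaR}_{p}(X_{n})=0,\qquad \mathrm {ES}_{p}(X_{n})=\frac{1}{1-p}\cdot\frac{1}{n}\cdot n^{2}=\frac{n}{1-p}\to\infty. $$

Hence \(\mathrm {VaR}_{p}\) is continuous along this sequence while \(\mathrm {ES}_{p}\) is not, although \(F_{n}\to F_{0}\) in the Lévy distance.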

In the following, we introduce a new, in our opinion practically relevant notion of robustness for risk aggregation, which turns out to favour ES over VaR.

2.2 Aggregation-robustness

In this section, we show that VaR is more sensitive to model uncertainty at the level of dependence than ES. For single risks \(X_{i}\), \(i=1,\dots,n\), the aggregate risk \(S\) is simply defined as \(S=X_{1}+\cdots+X_{n}\). Often in practice, a joint model of \(X_{1},\dots,X_{n}\) is chosen in two stages: \(n\) marginal distributions \(F_{1},\dots,F_{n}\) and a dependence structure (often through a copula \(C\)). Whereas the modelling of marginal distributions is fairly standard, the dependence structure can be really difficult to model, statistically estimate and test. Considerable model uncertainty, which is often different in nature from the model uncertainty of marginal distributions, arises from modelling the dependence structure. In the following, we study sensitivity with respect to uncertainty in the dependence structure; for the purpose of this paper, we assume the marginal distributions \(F_{1},\dots,F_{n}\) are given.

When the dependence structure between the risks is unknown, the possible distributions of \(S\) form a set. We denote the \((F_{1},\dots ,F_{n})\)-admissible class as

$$ \mathfrak{S}_{n}(F_{1},\dots,F_{n})=\{X_{1}+\cdots+X_{n}: X_{i}\sim F_{i},\ i=1,\dots ,n\}, $$

or simply \(\mathfrak{S}_{n}=\mathfrak{S}_{n}(F_{1},\dots,F_{n})\) if \((F_{1},\dots ,F_{n})\) is clear from the context. \(\mathfrak{S}_{n}\) is the set of all possible aggregate risks. Risk aggregation with dependence uncertainty concerns the probabilistic and statistical behaviour of \(S\in\mathfrak{S}_{n}\); in particular, \(\mathfrak{S}_{n}\) is closed with respect to the weak topology (see Bernard et al. [7]). We say that an admissible class \(\mathfrak{S}_{n}\) is compatible with a risk measure \(\rho: \mathcal {X}\rightarrow(-\infty,+\infty]\) if \(X_{i} \in \mathcal {X}\), \(X_{i}\sim F_{i}\) (note that this implies \(\mathfrak{S}_{n}\subset \mathcal {X}\) since \(\mathcal {X}\) is closed under addition) and \(\rho(X_{i})<\infty\), for \(i=1,\dots,n\).

Definition 2.1 (Aggregation-robustness)

We say that a law-invariant risk measure \(\rho: \mathcal {X}\rightarrow (-\infty,+\infty]\) is aggregation-robust if \(\rho\) is continuous with respect to weak convergence in each admissible class \(\mathfrak{S}_{n}\) compatible with \(\rho\).

The robustness character of Definition 2.1 is intuitively clear. If the joint distributions of \((X_{1},\dots,X_{n})\) and \((Y_{1},\dots ,Y_{n})\) are close according to the Lévy metric, then the distributions of \(X_{1}+\cdots+X_{n}\) and \(Y_{1}+\cdots+Y_{n}\) are also close according to the Lévy metric. As a consequence, \(\rho\) is insensitive to small perturbations of the joint distribution of the underlying risk factors, keeping the marginal distributions of the individual risks fixed. It is clear that Hampel’s robustness, as discussed above, without the restriction of risks being in a common admissible class, implies aggregation-robustness. When the dependence structure is modelled by copulas, our definition of robustness implies that a risk measure is insensitive to the copula of the individual risks when the marginal distributions are assumed to be known. The fact that in Definition 2.1 we look at risks in \(\mathfrak{S}_{n}\) reflects our interest in aggregation and diversification. One could, of course, look at other functional-robustness definitions beyond aggregation (summation).

Example 2.2

(VaR is not aggregation-robust) For \(t \in [0,1]\), let \(X_{t}\) and \(Y_{t}\) have the joint distribution \(C_{t}\) given by

$$\begin{aligned} C_{t}(x,y) & = txy+ (1-t)\big(\max\big\{ \min\{x,1/2\}+\min\{y,1/2\}-1/2,0\big\} \\ &\phantom{= txy+ (1-t)\big(}+\max\{x+y-3/2,0\}\big) \end{aligned}$$

for \(x,y\in[0,1]\). It is easy to see that \(X_{t}\) and \(Y_{t}\) are both U\([0,1]\)-distributed, hence \(C_{t}\) is a copula, for \(t\in[0,1]\). Note that \(C_{t}\), \(t\in (0,1)\), is a mixture of the independence copula \(C_{1}\) and another copula

$$\begin{aligned} C_{0}: [0,1]^{2} \rightarrow& [0,1], \\ (x,y) \mapsto& \max\big\{ \min\{x,1/2\}+\min\{y,1/2\}-1/2,0\big\} +\max\{ x+y-3/2,0\}. \end{aligned}$$

\(C_{0}\) is the ordinal sum of two Fréchet lower copulas; see Nelsen [28, Sect. 3.2.2].

It is immediate that the distribution of \(X_{t}+Y_{t}\) for \(t\in(0,1]\) is symmetric, centred at 1, with positive density on the interval \((1/2,3/2)\). Thus, \(\mathrm {VaR}_{1/2}(X_{t}+Y_{t})=1\). It is also straightforward that \(X_{0}+Y_{0}\) is a discrete random variable on \(\{1/2,3/2\}\) with \(\mathrm {VaR}_{1/2}(X_{0}+Y_{0})=1/2\). As a consequence,

$$ \mathrm {VaR}_{1/2}(X_{0}+Y_{0})\ne\lim_{t\rightarrow0 }\mathrm {VaR}_{1/2}(X_{t}+Y_{t}). $$

Based on the simple fact that \(X_{t}+Y_{t}\rightarrow X_{0}+Y_{0}\) weakly as \(t\) goes to zero, we conclude that \(\mathrm {VaR}_{1/2}\) is not aggregation-robust.

To build an example for \(\mathrm {VaR}_{p}\), \(p\in(1/2,1)\), let \(A\) be an event of probability \(2-2p\), independent of \(X_{t}\) and \(Y_{t}\), and let \(Z_{t}=\mathrm {I}_{A} X_{t}\), \(W_{t}=\mathrm {I}_{A} Y_{t}\) for each \(t\in[0,1]\). By construction it is clear that \(Z_{t}\), \(W_{t}\), \(t\in[0,1]\), are all identically distributed, and

$$ \mathrm {VaR}_{p}(Z_{t}+W_{t})=\mathrm {VaR}_{1/2} (X_{t}+Y_{t}), \quad t\in[0,1]. $$

Analogously to the above argument, we have \(d(Z_{t}+W_{t},Z_{0}+W_{0})\rightarrow0\) as \(t\) goes to zero, but \(\mathrm {VaR}_{p}(Z_{0}+W_{0})\ne\lim_{t\rightarrow0 }\mathrm {VaR}_{p}(Z_{t}+W_{t})\). Putting a negative sign in front of \(Z_{t}\) and \(W_{t}\), we obtain that \(\mathrm {VaR}_{p}\), \(p\in(0,1/2)\), is also discontinuous in an admissible class. This shows that \(\mathrm {VaR}_{p}\) is not aggregation-robust for any \(p\in(0,1)\).
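The jump of \(\mathrm {VaR}_{1/2}\) in the first part of Example 2.2 can be checked numerically. The sketch below evaluates the distribution function of \(X_{t}+Y_{t}\) exactly, as a mixture of the triangular law of two independent U\([0,1]\) variables and the two-point law of \(X_{0}+Y_{0}\), and finds the lower quantile by bisection. The function names `cdf_sum` and `var_sum` are illustrative assumptions.

```python
def cdf_sum(x, t):
    """Distribution function of X_t + Y_t under the copula C_t of Example 2.2."""
    # Independent part: sum of two independent U[0,1] (triangular on [0, 2]).
    if x <= 0:
        tri = 0.0
    elif x <= 1:
        tri = x * x / 2
    elif x < 2:
        tri = 1 - (2 - x) ** 2 / 2
    else:
        tri = 1.0
    # C_0 part: X_0 + Y_0 takes the values 1/2 and 3/2 with probability 1/2 each.
    atoms = 0.5 * (x >= 0.5) + 0.5 * (x >= 1.5)
    return t * tri + (1 - t) * atoms

def var_sum(p, t, tol=1e-12):
    """inf{x : F_t(x) >= p}, located by bisection on [0, 2]."""
    lo, hi = 0.0, 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf_sum(mid, t) >= p:
            hi = mid
        else:
            lo = mid
    return hi

for t in (0.1, 0.01, 0.001, 0.0):
    print(t, round(var_sum(0.5, t), 6))
```

The printed quantile stays at (numerically) 1 for every \(t>0\) and drops to 1/2 at \(t=0\), even though the distribution functions for \(t>0\) and \(t=0\) differ by at most \(t\) at every point.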

The non-aggregation-robustness of \(\mathrm {VaR}_{p}\) essentially comes from the fact that it is not continuous with respect to weak convergence (Hampel’s robustness). Suppose that \(\mathrm {VaR}_{p}\) is not continuous at some distribution, say \(F_{0}\). One may find \(F_{n}, n\in \mathbb {N}\), which converge to \(F_{0}\) weakly, but \(F_{n}^{-1}(p),n\in \mathbb {N}\), do not converge to \(F_{0}^{-1}(p)\); if in addition, such \(F_{n},n\in \mathbb {N}\), and \(F_{0}\) lie in the same admissible class, then \(\mathrm {VaR}_{p}\) is not aggregation-robust. That leads to the construction in Example 2.2.

In the above example, the joint distribution \(C_{t}\) with a small \(t>0\) can be seen as the joint distribution \(C_{0}\) influenced by a small perturbation. It is moreover worth noting that in Example 2.2, the marginal distributions of \(X_{t}\) and \(Y_{t}\) are continuous with positive densities. Hence, even if the true marginal distributions are known to have positive densities, VaR can still be discontinuous in aggregation. When one considers absolutely continuous models for a single risk, one has Hampel’s robustness for \(\mathrm {VaR}_{p}\); however, considering absolutely continuous marginal models is not sufficient for the aggregation-robustness of \(\mathrm {VaR}_{p}\). On the other hand, we shall see that ES is aggregation-robust, although it is well known to be non-robust in Hampel’s sense (see Cont et al. [10]) since it is discontinuous at any distribution with respect to the weak topology.

For generality, we study the aggregation-robustness of distortion risk measures, defined as

$$ \rho(X)=\int_{\mathbb {R}} x\mathrm{d}h\big(F(x)\big),\quad X\in \mathcal {X}, X\sim F, \tag{2.2} $$

where \(\mathcal {X}\) is a set of random variables such that the integral in (2.2) is properly defined, and \(h:[0,1]\rightarrow[0,1]\) is a nondecreasing function with \(h(0)=0\), \(h(1)=1\); \(h\) is called the distortion function of \(\rho\). If \(h\) has left limits and is right-continuous, i.e. \(h\) is a distribution function on \([0,1]\), then

$$ \rho(X)=\int_{0}^{1} F^{-1}(t)\mathrm{d} h(t),\quad X\in \mathcal {X}, X\sim F. \tag{2.3} $$

See Wang et al. [41] for distortion risk measures in the context of insurance premium calculations, Kusuoka [26] for their connection with coherent risk measures, and Cont et al. [10] for their robustness properties. A distortion risk measure \(\rho\) is coherent if and only if \(h\) is convex, in which case \(\rho\) is called a spectral risk measure; see Acerbi [2]. Distortion risk measures are also closely related to \(L\)-statistics; see Huber and Ronchetti [21, Sect. 3.3]. For \(p\in (0,1)\), \(\mathrm {VaR}_{p}\) and \(\mathrm {ES}_{p}\) are special cases of distortion risk measures, with distortion functions \(h(t)=\mathrm {I}_{\{t\geqslant p\}}, t\in[0,1]\), and \(h(t)=\mathrm {I}_{\{t\geqslant p\}}(t-p)/(1-p), t\in[0,1]\), respectively.
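For an empirical distribution, the representation (2.3) reduces to a weighted sum of order statistics, which makes the role of the distortion function easy to inspect. The following is a minimal sketch under the assumption of an equally weighted sample; `distortion_risk`, `h_var` and `h_es` are hypothetical helper names.

```python
def distortion_risk(sample, h):
    """Evaluate (2.3) for the empirical distribution of `sample`.

    The empirical quantile function equals the k-th order statistic on
    ((k-1)/n, k/n], so the integral becomes a sum with weights
    h(k/n) - h((k-1)/n).
    """
    xs = sorted(sample)
    n = len(xs)
    return sum(x * (h(k / n) - h((k - 1) / n)) for k, x in enumerate(xs, start=1))

def h_var(p):
    """Distortion function of VaR_p: a single jump at p."""
    return lambda t: 1.0 if t >= p else 0.0

def h_es(p):
    """Distortion function of ES_p: zero up to p, then linear up to 1."""
    return lambda t: max(t - p, 0.0) / (1 - p)

losses = [1.0, 2.0, 3.0, 4.0]
print(distortion_risk(losses, h_var(0.5)))  # 2.0
print(distortion_risk(losses, h_es(0.5)))   # 3.5
```

The VaR distortion puts all weight on one order statistic, while the convex ES distortion spreads the weight evenly over the observations above the level \(p\).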

Note that \(\mathcal {X}\) has to be closed under addition, hence it may fail to contain all \(X\) such that the integral in (2.2) is properly defined. For coherent distortion risk measures, one may consider the set \(\mathcal {X}_{0}\) defined by

$$ \mathcal {X}_{0}=\{X\in L^{0}: \mathbb {E}[X^{-}]< \infty\}\supset L^{1}. $$

It is easy to check by the convexity of \(h\) that all coherent distortion risk measures are properly defined on \(\mathcal {X}_{0}\). Our main result on aggregation-robustness now becomes:

Theorem 2.3

All coherent distortion risk measures on \(\mathcal {X}_{0}\) with a continuous distortion function are aggregation-robust.

As a coherent distortion risk measure has a convex distortion function, assuming continuity only excludes a jump of the distortion function at 1. Theorem 2.3 says that when the model uncertainty lies at the level of dependence but not at the level of the marginal distributions, coherent distortion risk measures such as ES are continuous with respect to weak convergence.

Our result can be interpreted as follows. For a distribution \(F\) and a random variable \(X \sim F\), even adding constraints on marginal distributions of the aggregation model of \(X\), \(F\mapsto \mathrm {VaR}_{p}(X)\) is still not continuous (with respect to weak convergence), whereas \(F\mapsto \mathrm {ES}_{p}(X)\) is continuous with these constraints. It should not be interpreted as an argument against the classic continuity results of VaR, noting that VaR is continuous at most commonly used distributions in financial risk management.
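The contrast can be made explicit along the family of Example 2.2. The computation below is a sketch that relies on two facts not stated above: the standard identity \(\mathrm {ES}_{p}(S)=\mathrm {VaR}_{p}(S)+\mathbb {E}[(S-\mathrm {VaR}_{p}(S))^{+}]/(1-p)\), and \(\mathbb {E}[(U_{1}+U_{2}-1)^{+}]=1/6\) for independent U\([0,1]\)-distributed \(U_{1},U_{2}\). Since \(\mathrm {VaR}_{1/2}(X_{t}+Y_{t})=1\) for \(t\in(0,1]\) and \(\mathrm {VaR}_{1/2}(X_{0}+Y_{0})=1/2\),

$$\begin{aligned} \mathrm {ES}_{1/2}(X_{t}+Y_{t}) &= 1+2\Big(\frac{t}{6}+\frac{1-t}{4}\Big)=\frac{3}{2}-\frac{t}{6},\quad t\in(0,1], \\ \mathrm {ES}_{1/2}(X_{0}+Y_{0}) &= \frac{1}{2}+2\cdot\frac{1}{2}\cdot1=\frac{3}{2}. \end{aligned}$$

Thus \(\mathrm {ES}_{1/2}(X_{t}+Y_{t})\rightarrow \mathrm {ES}_{1/2}(X_{0}+Y_{0})\) as \(t\) goes to zero, while \(\mathrm {VaR}_{1/2}\) jumps from 1 to 1/2.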

Remark 2.4

Cont et al. [10] also introduced the notion of \(\mathcal{C}\)-robustness, where \(\mathcal{C}\) is a set of distributions. A risk measure \(\rho\) is \(\mathcal{C}\)-robust if \(\rho\) is continuous in \(\mathcal{C}\) with respect to the Lévy distance; see Cont et al. [10, Proposition 2]. Using this notion, \(\mathrm {VaR}_{p}\) is \(\mathcal{C}_{p}\)-robust, where \(\mathcal{C}_{p}\) is the set of distributions \(F\) for which \(F^{-1}\) is continuous at \(p\). If we denote by \(\mathfrak{D}(\mathfrak{S}_{n})\) the set of all possible distributions for an admissible class \(\mathfrak{S}_{n}\), then \(\rho\) is aggregation-robust if and only if \(\rho\) is \(\mathfrak{D}(\mathfrak{S}_{n})\)-robust for all possible choices of \(n\in \mathbb {N}\) and \(\mathfrak{D}(\mathfrak{S}_{n})\), in which \(\mathfrak{S}_{n}\) is compatible with \(\rho\).

In the case \(\mathcal {X}=L^{\infty}\), we obtain that a continuous distortion function is a necessary and sufficient condition for the aggregation-robustness of distortion risk measures.

Theorem 2.5

A distortion risk measure on \(\mathcal {X}=L^{\infty}\) is aggregation-robust if and only if its distortion function \(h\) is continuous on \([0,1]\).

Finally, we remark that it would be of much interest to characterize aggregation-robust statistical functionals (risk measures) other than the class of distortion risk measures. Such a characterization is beyond the scope of this paper, and we leave it for future work.

3 Bounds on VaR aggregation

In Sect. 2, we mainly looked at the sensitivity properties of risk measures on aggregated risks under small changes of the underlying dependence assumptions. In this section, for VaR, we concentrate on deviations (possibly) far away from some true underlying, though unknown, dependence structure. Such results can be used to analyse extreme scenarios for risk aggregation and may be helpful in order to determine conservative capital requirements under model (i.e. dependence) uncertainty; for a real life example on this, see Aas and Puccetti [1].

3.1 Aggregation and diversification under dependence uncertainty

We start with the motivating notion of diversification ratio, which is closely related to the aggregation of VaR. Given a portfolio consisting of individual risks \(X_{1},\dots,X_{n}\), the diversification ratio of VaR at confidence level \(p\in(0,1)\) is defined as

$$ \varDelta ^{p}_{n}=\frac{ \mathrm {VaR}_{p}(X_{1}+\cdots+X_{n})}{\sum_{i=1}^{n}\mathrm {VaR}_{p}(X_{i})}. $$

The diversification ratio measures a kind of diversification benefit, and is, for instance, widely used in operational risk (see the examples in Embrechts et al. [14]). In the latter context, \(X_{i}\) corresponds to next year’s operational risk loss in business line \(i\), \(i=1,\dots,n\) (\(n=8\), typically); often explicit models for the loss-dependence among business lines are not available. For capital charge purposes, one estimates the total capital requirement for the superposition of the risks in each business line. One then typically adds up the risk measures across all business lines, and multiplies by a factor which is an estimate of \(\varDelta _{n}^{p}\). For this purpose, one needs a joint model of the risks \(X_{1},\dots,X_{n}\).
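How strongly \(\varDelta ^{p}_{n}\) depends on the joint model can be illustrated with a small simulation. The sketch below is an assumption-laden toy example: it uses two simulated unit-mean exponential loss samples as hypothetical marginals and compares a comonotonic pairing with a randomly shuffled (approximately independent) pairing; the marginal samples are the same in both cases.

```python
import math
import random

def empirical_var(sample, p):
    """Lower p-quantile of the empirical distribution of `sample`."""
    xs = sorted(sample)
    n = len(xs)
    k = max(1, math.ceil(n * p - 1e-12))
    return xs[k - 1]

def diversification_ratio(x, y, p):
    """VaR_p(X+Y) / (VaR_p(X) + VaR_p(Y)) for paired, equally weighted scenarios."""
    total = [a + b for a, b in zip(x, y)]
    return empirical_var(total, p) / (empirical_var(x, p) + empirical_var(y, p))

rng = random.Random(0)
n, p = 10_000, 0.99
# Hypothetical marginal loss samples, kept fixed throughout.
x = sorted(rng.expovariate(1.0) for _ in range(n))
y = sorted(rng.expovariate(1.0) for _ in range(n))

comonotonic = diversification_ratio(x, y, p)   # both columns sorted: comonotonic pairing
y_shuffled = y[:]
rng.shuffle(y_shuffled)                        # random pairing, same marginal sample
independent = diversification_ratio(x, y_shuffled, p)
print(comonotonic, independent)
```

The comonotonic pairing gives a ratio of exactly 1, and the shuffled pairing gives a smaller value for these marginals. Only the pairing of the scenarios changes between the two calls, which is precisely the dependence uncertainty considered here.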

With a known joint distribution of \((X_{1},\dots,X_{n})\), \(\varDelta _{n}^{p}\) may be calculated theoretically. If \(\varDelta _{n}^{p}\leqslant1\), we say there is a diversification benefit in the portfolio; if \(\varDelta _{n}^{p}> 1\), we say there is a diversification penalty. When \(F_{1},\dots,F_{n}\) are known and the joint model of \((X_{1},\dots,X_{n})\) is unspecified, the worst diversification ratio is defined as

$$\begin{aligned} \overline{\varDelta }^{p}_{n} =&\frac{\sup\{\mathrm {VaR}_{p}(X_{1}+\cdots+X_{n}): X_{i}\sim F_{i}, i=1,\dots,n\} }{\sum _{i=1}^{n}\mathrm {VaR}_{p}(X_{i})} \\ =&\frac{\sup\{\mathrm {VaR}_{p}(S): S\in\mathfrak{S}_{n}\}}{\sum_{i=1}^{n}\mathrm {VaR}_{p}(X_{i})}. \end{aligned}$$

By definition, \(\overline{\varDelta }^{p}_{n}\geqslant 1\) if \(\sum_{i=1}^{n}\mathrm {VaR}_{p}(X_{i})> 0\). In the following, we denote the comonotonic VaR by \(\mathrm {VaR}_{p}^{+}(S_{n})\), i.e.

$$ \mathrm {VaR}_{p}^{+}(S_{n})=\sum_{i=1}^{n}\mathrm {VaR}_{p}(X_{i}). $$

Note here that \(S_{n}\) is symbolic and does not represent a particular random variable. The calculation of \(\overline{\varDelta }^{p}_{n}\), as a measure of the worst-case diversification effect of VaR, serves two purposes:

  • (Conservative capital requirement) \(\overline{\varDelta }^{p}_{n}\mathrm {VaR}_{p}^{+}(S_{n})\) can be used as the most conservative capital requirement in the case of given (or estimated) marginal distributions \(F_{1},\dots,F_{n}\) of the individual risks.

  • (Measurement of model uncertainty) If \(\overline{\varDelta }^{p}_{n}\) is small, then the model uncertainty is small, and the risk measure \(\mathrm {VaR}\) is considered as less problematic in risk aggregation; capital requirement principles based on \(\mathrm {VaR}_{p}^{+}\) become more plausible. If \(\overline{\varDelta }^{p}_{n}\) is large, then the model uncertainty is severe, and arguments of diversification benefit need to be taken with care.

The best diversification ratio, replacing the sup by an inf, can be studied similarly. Since we are more interested in the worst case (corresponding to a conservative capital requirement), we omit a discussion of the best diversification ratio.

In the recent literature (see, for instance, Embrechts et al. [15]), it was shown that the value of \(\overline {\varDelta }^{p}_{n}\) is closely related to the risk measure ES. Denote the worst-case ES by \(\overline {\mathrm {ES}}_{p}(S_{n})=\sup\{\mathrm {ES}_{p}(S): S\in \mathfrak{S}_{n}\}\); since ES is subadditive and comonotonic additive, we have that

$$ \overline {\mathrm {ES}}_{p}(S_{n})=\sum_{i=1}^{n}\mathrm {ES}_{p}(X_{i})=\mathrm {ES}_{p}^{+}(S_{n}), $$

where the latter +-notation is in line with the notation used for the comonotonic VaR case. Since VaR is bounded by ES, the worst-case VaR is bounded by the worst-case ES. If \(\mathrm {VaR}^{+}_{p}(S_{n})>0\), we have for \(\overline {\varDelta }^{p}_{n}\) the direct upper bound

$$ 1\leqslant \overline {\varDelta }^{p}_{n}\leqslant\frac{\mathrm {ES}^{+}_{p}(S_{n})}{\mathrm {VaR}^{+}_{p}(S_{n})}=\frac{\overline {\mathrm {ES}}_{p}(S_{n})}{\mathrm {VaR}^{+}_{p}(S_{n})}. \tag{3.1} $$

See also Embrechts et al. [15] for a discussion on this upper bound. Later in this section, we show that the second inequality in (3.1) is asymptotically sharp as \(n\rightarrow\infty\).
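As a small numerical illustration of the upper bound in (3.1), the following sketch evaluates \(\mathrm {ES}^{+}_{p}(S_{n})/\mathrm {VaR}^{+}_{p}(S_{n})\) for exponential marginals, for which VaR and ES are available in closed form. The choice of exponential marginals and the rates used are assumptions made only for this example.

```python
import math

def var_exp(p, rate):
    """VaR_p (the p-quantile) of an Exp(rate) risk."""
    return -math.log(1.0 - p) / rate

def es_exp(p, rate):
    """ES_p of an Exp(rate) risk: VaR_p plus the mean excess 1/rate."""
    return var_exp(p, rate) + 1.0 / rate

def ratio_upper_bound(p, rates):
    """Right-hand side of (3.1): ES_p^+(S_n) / VaR_p^+(S_n)."""
    es_plus = sum(es_exp(p, r) for r in rates)
    var_plus = sum(var_exp(p, r) for r in rates)
    return es_plus / var_plus

# Hypothetical portfolio of three exponential risks.
bound = ratio_upper_bound(0.99, [1.0, 2.0, 5.0])
print(f"upper bound on the worst diversification ratio: {bound:.4f}")
```

For exponential marginals the bound does not depend on the rates, since both sums scale with \(\sum_{i} 1/\lambda_{i}\); it equals \(1+1/(-\log(1-p))\).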

By definition, calculation of the worst diversification ratio is equivalent to the calculation of the worst-case VaR

$$ \overline{\mathrm {VaR}}_{p}(S_{n}):=\sup\{\mathrm {VaR}_{p}(S): S\in\mathfrak{S}_{n}\}. $$
(3.2)

For the history and a general discussion on problems related to (3.2) from the perspective of quantitative risk management, we refer to Embrechts et al. [15]. When \(F_{1}=F_{2}=\cdots=F_{n}=:F\), i.e. in the homogeneous case, Wang et al. [40] obtained \(\overline{\mathrm {VaR}}_{p}(S_{n})\) for \(F\) with a tail-decreasing density. If \(F_{1},\dots,F_{n}\) are not identical, explicit calculations of \(\overline {\mathrm {VaR}}_{p}(S_{n})\) and \(\overline {\varDelta }^{p}_{n}\) are not available in general. Embrechts et al. [14] introduced the rearrangement algorithm to numerically calculate \(\overline {\mathrm {VaR}}_{p}(S_{n})\) based on a discretized approximation.
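The idea behind the rearrangement algorithm can be sketched in a few lines. The code below is a simplified illustration and not the implementation of Embrechts et al. [14]: it discretizes the upper tail of each marginal on \([p,1)\), repeatedly rearranges each column so that it is oppositely ordered to the sum of the other columns, and returns the smallest row sum as an approximation (from below) of \(\overline {\mathrm {VaR}}_{p}(S_{n})\). The function name, the stopping rule and the default parameters are assumptions of this sketch.

```python
def worst_var_ra(quantiles, p, n_points=1000, max_sweeps=100, tol=1e-9):
    """Simplified rearrangement sketch for the worst-case VaR_p.

    quantiles: list of quantile functions F_i^{-1}, assumed finite on [p, 1).
    """
    # Column i holds n_points equally spaced tail quantiles of F_i on [p, 1).
    cols = [[Q(p + (1.0 - p) * j / n_points) for j in range(n_points)]
            for Q in quantiles]
    row_sums = [sum(col[r] for col in cols) for r in range(n_points)]
    prev = min(row_sums)
    for _ in range(max_sweeps):
        for col in cols:
            others = [s - x for s, x in zip(row_sums, col)]
            # Rows ordered by increasing sum of the other columns ...
            order = sorted(range(n_points), key=others.__getitem__)
            # ... receive the entries of this column in decreasing order.
            values = sorted(col, reverse=True)
            for r, v in zip(order, values):
                col[r] = v
                row_sums[r] = others[r] + v
        cur = min(row_sums)
        if abs(cur - prev) <= tol:  # smallest row sum has stabilized
            break
        prev = cur
    return min(row_sums)

# Two U(0,1) risks at level p = 0.5 (toy example).
approx = worst_var_ra([lambda t: t, lambda t: t], 0.5)
print(f"approximate worst-case VaR: {approx:.4f}")
```

For two risks the procedure reduces to the countermonotonic coupling of the two tails, so for two \(\mathrm{U}(0,1)\) risks the output should be close to \(1+p\).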

Regarding the asymptotic behaviour of \(\overline {\mathrm {VaR}}_{p}(S_{n})\) and \(\overline {\varDelta }^{p}_{n}\), Puccetti and Rüschendorf [31] obtained that, as \(n\rightarrow\infty\),

$$ \frac{\overline {\mathrm {VaR}}_{p}(S_{n})}{\overline {\mathrm {ES}}_{p}(S_{n})}\rightarrow1, $$
(3.3)

in the homogeneous case under a restrictive condition on the marginal distributions. See also Wang [39] and Wang and Wang [38] for weaker conditions so that (3.3) holds. Puccetti et al. [32] considered the case when there are finitely many different marginal distributions in the sequence \(F_{1},F_{2},\dots\) and obtained the same equivalence (3.3). A consequence of (3.3) is that

$$ \lim_{n\rightarrow\infty} \overline {\varDelta }^{p}_{n}=\lim _{n\rightarrow\infty}\frac{\overline {\mathrm {ES}}_{p}(S_{n})}{\mathrm {VaR}^{+}_{p}(S_{n})}, $$
(3.4)

provided that the right-hand limit exists. That is, the second inequality in (3.1) is asymptotically sharp. However, as mentioned above, the existing results only deal with the (almost) homogeneous case, and some specific assumptions on the marginal distributions need to be imposed. Later in this section, we provide analytical approximations for \(\overline {\mathrm {VaR}}_{p}(S_{n})\) and \(\overline {\varDelta }^{p}_{n}\). Based on these results, we give a proof of (3.3) and (3.4) under very general conditions and, moreover, obtain a rate of convergence.

3.2 Bounds on VaR aggregation for a finite number of risks

In this section, we give inequalities for the worst-case and best-case VaR and its diversification ratio. For a distribution \(F_{i}\), define

$$ \mu^{(i)}_{p,q}=\frac{1}{q-p}\int_{p}^{q} F_{i}^{-1}(t)\mathrm{d}t, $$

for \(1\geqslant q>p\geqslant 0\), \(i=1,\dots,n\). Note that \(\mu^{(i)}_{0,q}\) and \(\mu ^{(i)}_{p,1}\) might be infinite. Using the above notation, it is immediate that

$$ \overline {\mathrm {ES}}_{p}(S_{n})=\sum_{i=1}^{n} \mathrm {ES}_{p}(X_{i})=\sum_{i=1}^{n} \mu^{(i)}_{p,1}. $$

For future discussion, we also denote the best-case VaR by \(\underline {\mathrm {VaR}}_{p}(S_{n})\), that is,

$$ \underline{\mathrm {VaR}}_{p}(S_{n})=\inf_{S\in\mathfrak{S}_{n}}\mathrm {VaR}_{p}(S), $$

and the best-case ES by \(\underline{\mathrm {ES}}_{p}(S_{n})\), that is,

$$ \underline{\mathrm {ES}}_{p}(S_{n})=\inf_{S\in\mathfrak{S}_{n}}\mathrm {ES}_{p}(S). $$

Analytical formulas for each of \(\overline {\mathrm {VaR}}_{p}(S_{n})\), \(\underline{\mathrm {VaR}}_{p}(S_{n})\) and \(\underline{\mathrm {ES}}_{p}(S_{n})\) are not available under general assumptions on the marginal distributions; see Bernard et al. [7] and Embrechts et al. [15] for existing results.

The following theorem contains our main result regarding approximations of \(\overline {\mathrm {VaR}}_{p}(S_{n})\) and \(\underline {\mathrm {VaR}}_{p}(S_{n})\).

Theorem 3.1

For any distributions \(F_{1},\dots,F_{n}\), we have for \(p\in(0,1)\) that

$$ \sup_{q\in(p,1]}\left\{\sum _{i=1}^{n}\mu ^{(i)}_{p,q}-\max_{i=1,\dots,n}\big(F_{i}^{-1}(q)-F_{i}^{-1}(p)\big)\right\} \leqslant \overline {\mathrm {VaR}}_{p}(S_{n})\leqslant \overline {\mathrm {ES}}_{p}(S_{n}), $$
(3.5)

and

$$ \sum_{i=1}^{n}\mu^{(i)}_{0,p} \leqslant \underline {\mathrm {VaR}}_{p}(S_{n})\leqslant\inf_{q\in[0,p)}\left\{\sum_{i=1}^{n}\mu ^{(i)}_{q,p}+\max _{i=1,\dots,n}\big(F_{i}^{-1}(p)-F_{i}^{-1}(q)\big)\right\}. $$
(3.6)

In particular, if \(F_{1},\dots,F_{n}\) are supported on \([a,b]\), \(a< b\), \(a,b\in \mathbb {R}\), then

$$ \overline {\mathrm {ES}}_{p}(S_{n})-(b-a) \leqslant \overline {\mathrm {VaR}}_{p}(S_{n}) \leqslant{ \overline {\mathrm {ES}}_{p}(S_{n})}. $$

Note that in the case when all marginal distributions are bounded, \(\overline {\mathrm {VaR}}_{p}(S_{n})\) and \(\overline {\mathrm {ES}}_{p}(S_{n})\) differ by at most a constant which does not depend on \(n\). Theorem 3.1 can also be formulated for the worst diversification ratio of VaR.
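The lower bound in (3.5) can be evaluated numerically by a grid search over \(q\). The following sketch approximates \(\mu^{(i)}_{p,q}\) with a midpoint rule and takes the maximum over a grid in \((p,1)\); the grid sizes, the function names and the choice of four \(\mathrm{U}(0,1)\) marginals are assumptions made for illustration only.

```python
import math

def mu(quantile, p, q, steps=200):
    """Midpoint-rule approximation of mu_{p,q} = (q-p)^{-1} int_p^q F^{-1}(t) dt."""
    h = (q - p) / steps
    return sum(quantile(p + (j + 0.5) * h) for j in range(steps)) * h / (q - p)

def worst_var_lower_bound(quantiles, p, grid=200):
    """Grid approximation of the left-hand side of (3.5)."""
    best = -math.inf
    for j in range(1, grid):
        # q runs over the open interval (p, 1); q = 1 is left out so that
        # marginals with unbounded support can be handled as well.
        q = p + (1.0 - p) * j / grid
        total = sum(mu(Q, p, q) for Q in quantiles)
        penalty = max(Q(q) - Q(p) for Q in quantiles)
        best = max(best, total - penalty)
    return best

# Four U(0,1) risks at level p = 0.9 (toy example with a = 0, b = 1).
p, n = 0.9, 4
lb = worst_var_lower_bound([lambda t: t] * n, p)
es_worst = n * (1.0 + p) / 2.0  # worst-case ES: sum of ES_p(X_i) = (1 + p) / 2
print(f"{lb:.4f} <= worst-case VaR <= {es_worst:.4f}")
```

Because the grid excludes \(q=1\), the computed value is a slightly conservative version of the supremum; it still lies between \(\overline {\mathrm {ES}}_{p}(S_{n})-(b-a)\) and \(\overline {\mathrm {ES}}_{p}(S_{n})\) in this example.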

Corollary 3.2

For any distributions \(F_{1},\dots,F_{n}\), suppose that \(\mathrm {VaR}^{+}_{p}(S_{n})>0\). We have for \(p\in(0,1)\) that

$$ \sup_{q\in(p,1]}\frac{\sum _{i=1}^{n}\mu ^{(i)}_{p,q}-\max_{i=1,\dots,n}(F_{i}^{-1}(q)-F_{i}^{-1}(p))}{\mathrm {VaR}^{+}_{p}(S_{n})}\leqslant \overline {\varDelta }^{p}_{n}\leqslant\frac{\overline {\mathrm {ES}}_{p}(S_{n})}{\mathrm {VaR}^{+}_{p}(S_{n})}. $$
(3.7)

In particular, if \(F_{1},\dots,F_{n}\) are supported in \([a,b]\), \(a< b\), \(a,b\in \mathbb {R}\), then

$$ \frac{\overline {\mathrm {ES}}_{p}(S_{n})}{\mathrm {VaR}^{+}_{p}(S_{n})}-\frac {b-a}{\mathrm {VaR}^{+}_{p}(S_{n})}\leqslant \overline {\varDelta }^{p}_{n}\leqslant\frac{\overline {\mathrm {ES}}_{p}(S_{n})}{\mathrm {VaR}^{+}_{p}(S_{n})}. $$
(3.8)

In the homogeneous case \(F:=F_{1}=F_{2}=\cdots{}\), the left- and right-hand sides of (3.8) both converge to \(\frac{\mathrm {ES}_{p}(X)}{\mathrm {VaR}_{p}(X)}\) as \(n\rightarrow\infty\), where \(X\sim F\), assuming \(\mathrm {VaR}_{p}(X) \ne0\). In the following, we study the limit, as \(n\) goes to infinity, of the worst- and best-case VaR and its diversification ratio under general marginal assumptions.
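The convergence of both sides of (3.8) in the homogeneous case can be checked directly in a toy example. The sketch below assumes i.i.d. \(\mathrm{U}(0,1)\) marginals, so that \(a=0\), \(b=1\), \(\mathrm {VaR}_{p}(X)=p\) and \(\mathrm {ES}_{p}(X)=(1+p)/2\); this choice is purely illustrative.

```python
def ratio_bounds_uniform(n, p):
    """Both sides of (3.8) for n i.i.d. U(0,1) risks (a = 0, b = 1)."""
    var_plus = n * p                  # VaR_p^+(S_n)
    es_worst = n * (1.0 + p) / 2.0    # worst-case ES
    upper = es_worst / var_plus
    lower = upper - 1.0 / var_plus    # subtract (b - a) / VaR_p^+(S_n)
    return lower, upper

for n in (5, 50, 500):
    lower, upper = ratio_bounds_uniform(n, 0.99)
    print(n, round(lower, 6), round(upper, 6))
```

The upper bound stays at \(\mathrm {ES}_{p}(X)/\mathrm {VaR}_{p}(X)\) for every \(n\), while the gap between the two bounds shrinks like \(1/n\).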

3.3 Asymptotic equivalence and limit of the worst diversification ratio

Based on Theorem 3.1, we now derive the asymptotic equivalence between the worst-case VaR and the worst-case ES under very weak general conditions. For an asymptotic analysis, some uniformity conditions on \(F_{i},i\in \mathbb {N}\), need to be imposed. In what follows, \(X_{i}\) is any random variable with distribution \(F_{i}\), \(i\in \mathbb {N}\). Define the following conditions, for some \(p\in(0,1)\) and \(k>1\):

$$\begin{aligned} &\mathbb {E}\big[|X_{i}-\mathbb {E}[X_{i}]|^{k}\big]< M \quad \mbox{ for some }M>0, \end{aligned}$$
(3.9)
$$\begin{aligned} &\liminf_{n\rightarrow\infty}n^{-1/k}\sum_{i=1}^{n}\mathrm {ES}_{p}(X_{i})=+\infty, \end{aligned}$$
(3.10)
$$\begin{aligned} & C_{0}:=\liminf_{n\rightarrow\infty}\frac{1}{n}\sum_{i=1}^{n}\mathrm {ES}_{p}(X_{i})>0. \end{aligned}$$
(3.11)

The above conditions only concern the moments of \(F_{i}\), \(i\in \mathbb {N}\), and they are quite weak and commonly satisfied. Condition (3.9) is a uniform boundedness condition (the bound \(M\) is the same for all \(i\in \mathbb {N}\)), ensuring that the aggregate portfolio \(S_{n}\) does not contain a single risk whose tail is so heavy that it dominates the other risks. Condition (3.10) guarantees that the average ES of the sequence of risks does not vanish too fast. Condition (3.11) is a stronger version of (3.10). In particular, in the homogeneous case when \(F_{i}\), \(i\in \mathbb {N}\), are identical, \(\mathrm {ES}_{p}(X_{1})>0\) implies (3.11) and hence it also implies (3.10). We also remark that condition (3.12) below is stronger than condition (3.9):

$$ \mathbb {E}[|X_{i}|^{k}] \mbox{ is uniformly bounded in } i. $$
(3.12)

Theorem 3.3

Suppose that the distributions \(F_{i},i\in \mathbb {N}\), satisfy (3.9) and (3.10) for some \(p\in(0,1)\) and \(k>1\). Then

$$\lim_{n\rightarrow\infty}\frac{\overline {\mathrm {VaR}}_{p}(S_{n})}{\overline {\mathrm {ES}}_{p}(S_{n})}= 1. $$
(3.13)

If in addition (3.10) is replaced by (3.11), then for sufficiently large \(n\),

$$1\geqslant \frac{\overline {\mathrm {VaR}}_{p}(S_{n})}{\overline {\mathrm {ES}}_{p}(S_{n})}\geqslant 1-Cn^{-1+1/k}, $$
(3.14)

where

$$ C=\left(\frac{1}{1-p}\frac{k}{k-1}+1\right) \frac {M^{1/k}}{C_{0}}+\varepsilon >0, $$

\(M\) is given in (3.9), \(C_{0}\) is given in (3.11), and \(\varepsilon \) is any fixed positive real number.
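To get a feeling for the rate in (3.14), the constant \(C\) and the resulting lower bound can be evaluated for given inputs. The values of \(M\), \(C_{0}\) and \(\varepsilon\) used below are hypothetical and serve only to illustrate the arithmetic; note also that (3.14) is only claimed for sufficiently large \(n\).

```python
def rate_constant(p, k, M, C0, eps=1e-6):
    """Constant C appearing in (3.14); eps is the arbitrary positive slack."""
    return (1.0 / (1.0 - p) * k / (k - 1.0) + 1.0) * M ** (1.0 / k) / C0 + eps

def ratio_lower_bound(n, p, k, M, C0, eps=1e-6):
    """Lower bound 1 - C * n^(-1 + 1/k) on worst-case VaR / worst-case ES."""
    return 1.0 - rate_constant(p, k, M, C0, eps) * n ** (-1.0 + 1.0 / k)

# Hypothetical inputs: p = 0.975, finite second moments (k = 2), M = C0 = 1.
C = rate_constant(0.975, 2.0, 1.0, 1.0)
for n in (10**4, 10**6, 10**8):
    print(n, round(ratio_lower_bound(n, 0.975, 2.0, 1.0, 1.0), 6))
```

With these inputs \(C\) is about 81, so the bound becomes informative only for fairly large \(n\); higher moments (larger \(k\)) improve the rate \(n^{-1+1/k}\).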

Theorem 3.3 establishes the asymptotic equivalence of the worst-case ES and the worst-case VaR for risk aggregation for general, possibly inhomogeneous portfolios. As mentioned in Sect. 3.1, homogeneous or almost-homogeneous cases for which (3.13) holds were previously obtained in the literature. While existing methods of proof were mainly based on the theory of complete mixability, an extension using the same techniques to arbitrarily many different marginal distributions was not possible.

Similarly to Theorem 3.3, we can obtain the limit of the best-case VaR bounds. In the following, we define the left-tail ES (LES) as

$$ \mathrm {LES}_{p}(X)=\frac{1}{p}\int_{0}^{p} \mathrm {VaR}_{q}(X)\mathrm{d}q=-\mathrm {ES}_{1-p}(-X), $$

and denote its best-case value under dependence uncertainty by

$$ \underline {\mathrm {LES}}_{p}(S_{n}):=\inf_{S\in\mathfrak{S}_{n}} \mathrm {LES}_{p}(S)=\sum_{i=1}^{n} \mathrm {LES}_{p}(X_{i})=\sum_{i=1}^{n} \mu_{0,p}^{(i)}, $$

where the second equality can be seen from the symmetry between ES and LES. For the best-case VaR bounds, we use a slightly different set of conditions. For some \(p\in(0,1)\) and \(k>1\), we introduce

$$\begin{aligned} &\liminf_{n\rightarrow\infty}n^{-1/k}\sum_{i=1}^{n}\mathrm {LES}_{p}(X_{i})=+\infty, \end{aligned}$$
(3.15)
$$\begin{aligned} &C_{0}:=\liminf_{n\rightarrow\infty}\frac{1}{n}\sum_{i=1}^{n}\mathrm {LES}_{p}(X_{i})>0. \end{aligned}$$
(3.16)

The following corollary is obtained from Theorem 3.3 by symmetry:

Corollary 3.4

Suppose that the distributions \(F_{i},i\in \mathbb {N}\), satisfy (3.9) and (3.15) for some \(p\in (0,1)\) and \(k>1\), then

$$ \lim_{n\rightarrow\infty}\frac{\underline {\mathrm {VaR}}_{p}(S_{n})}{\underline {\mathrm {LES}}_{p}(S_{n})}=1.$$

If, in addition, (3.15) is replaced by (3.16), then for sufficiently large \(n\),

$$1\geqslant \frac{\underline {\mathrm {VaR}}_{p}(S_{n})}{\underline {\mathrm {LES}}_{p}(S_{n})}\geqslant 1-Cn^{-1+1/k}, $$

where

$$ C=\left(\frac{1}{1-p}\frac{k}{k-1}+1\right) \frac {M^{1/k}}{C_{0}}+\varepsilon >0, $$

\(M\) is given in (3.9), \(C_{0}\) is given in (3.16), and \(\varepsilon \) is any fixed positive real number.

Remark 3.5

The conditions (3.15) and (3.16) are slightly stronger than (3.10) and (3.11), respectively, and this asymmetry is due to the fact that we mainly consider the cases when the aggregate risk measures LES and ES are positive. The asymmetry can be trivially removed by assuming

$$ \liminf_{n\rightarrow\infty}\Biggl|\frac{1}{n}\sum_{i=1}^{n}\mathrm {LES}_{p}(X_{i})\Biggr|>C_{0} $$

instead of (3.15).

Finally, we remark that the limit of \(\overline {\varDelta }^{p}_{n}\) as \(n\rightarrow\infty\) can be obtained directly from Theorem 3.3. Suppose the continuous distributions \(F_{i},i\in \mathbb {N}\), satisfy (3.9) and (3.10) for some \(p\in(0,1)\) and \(k>1\). Then, as \(n\rightarrow\infty\),

$$ \overline{\varDelta }_{n}^{p}\frac{\mathrm {VaR}^{+}_{p}(S_{n})}{\overline {\mathrm {ES}}_{p}(S_{n})}\rightarrow1. $$

If in addition \(R_{p}:=\displaystyle\lim_{n\rightarrow\infty}\frac {\overline {\mathrm {ES}}_{p}(S_{n})}{\mathrm {VaR}^{+}_{p}(S_{n})}\) exists in \([1,\infty]\), then \(\overline{\varDelta }_{n}^{p} \rightarrow R_{p}\) as \(n\rightarrow\infty\).

4 Uncertainty spread of VaR and ES

In addition to the distribution-wise continuity as discussed in Sect. 2, in this section, based on results obtained in Sect. 3, we study the uncertainty spread of VaR and ES when the dependence structure is unspecified. This quantifies the magnitude of dependence uncertainty in a model for risk aggregation. We show that VaR generally exhibits a larger spread compared to ES. This result suggests that VaR is more sensitive to dependence uncertainty compared to ES. For \(p\in(0,1)\), we define the dependence uncertainty spread (DU-spread) of \(\mathrm {VaR}_{p}\) as

$$ \overline{\mathrm {VaR}}_{p}(S_{n})-\underline{\mathrm {VaR}}_{p}(S_{n}), $$

and of \(\mathrm {ES}_{p}\) as

$$ \overline{\mathrm {ES}}_{p}(S_{n})-\underline{\mathrm {ES}}_{p}(S_{n}). $$

See Embrechts et al. [15] for a discussion on the DU-spread of VaR and its relevance in risk management.

By definition, \(\mathrm {ES}_{p}(X)\geqslant \mathrm {VaR}_{p}(X)\) for any risk \(X\), and the inequality is strict when \(X\) has a continuous distribution. Naturally, when switching from \(\mathrm {VaR}\) to \(\mathrm {ES}\) for the purpose of capital requirement, one should consider a lower confidence level for \(\mathrm {ES}\). In the most recent consultative document BCBS [5], it was proposed that for internal risk models, \(\mathrm {VaR}_{0.99}\) should be replaced by \(\mathrm {ES}_{0.975}\), which often yields a similar value to \(\mathrm {VaR}_{0.99}\) for light-tailed risks. Under the Swiss Solvency Test (SST), \(\mathrm {VaR}_{0.995}\) is used to compare with \(\mathrm {ES}_{0.99}\) to calculate the capital requirement for the change in the risk-bearing capital (RBC) over a one-year period; see EIOPA [12, p. 32]. Kou and Peng [23] also proposed that in order to compare with \(\mathrm {ES}_{p}\), one could use the corresponding median shortfall (MS), which is the median of the conditional tail distribution above \(\mathrm {VaR}_{p}\) and hence satisfies

$$ \mathrm{MS}_{p}(X)=\mathrm {VaR}_{(p+1)/2}(X); $$

thus it is consistent with the SST regime. Hence it may be useful to compare the DU-spread of \(\mathrm {VaR}_{q}\) and that of \(\mathrm {ES}_{p}\) for \(q\geqslant p\). The following proposition compares the DU-spread of \(\mathrm {VaR}_{q}\) and that of \(\mathrm {ES}_{p}\) in the asymptotic sense. In what follows, we denote by \(\mu_{n}\) the summation of the means of \(F_{1},\dots,F_{n}\), assumed to exist. We need an additional condition to avoid degenerate cases: for some \(p\in(0,1)\),

$$\begin{aligned} \liminf_{n\rightarrow\infty} \frac{1}{\mu_{n}}{ \sum_{i=1}^{n}\mathrm {ES}_{p} (X_{i})}>1. \end{aligned}$$
(4.1)
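The correspondence between confidence levels discussed above can be checked numerically in simple parametric cases. The sketch below compares \(\mathrm {ES}_{0.975}\) with \(\mathrm {VaR}_{0.99}\) for a standard normal risk and verifies \(\mathrm{MS}_{p}(X)=\mathrm {VaR}_{(p+1)/2}(X)\) for an \(\mathrm{Exp}(1)\) risk; both distributions are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def var_normal(p):
    """VaR_p of a standard normal risk."""
    return STD_NORMAL.inv_cdf(p)

def es_normal(p):
    """ES_p of a standard normal risk: pdf(VaR_p) / (1 - p)."""
    return STD_NORMAL.pdf(STD_NORMAL.inv_cdf(p)) / (1.0 - p)

def var_exp(p):
    """VaR_p of an Exp(1) risk."""
    return -math.log(1.0 - p)

def median_shortfall_exp(p):
    """Median of the conditional tail above VaR_p for Exp(1).

    By memorylessness the excess over VaR_p is again Exp(1), with median log 2.
    """
    return var_exp(p) + math.log(2.0)

print("normal: ES_0.975 =", round(es_normal(0.975), 4),
      " VaR_0.99 =", round(var_normal(0.99), 4))
print("Exp(1): MS_0.99 =", round(median_shortfall_exp(0.99), 4),
      " VaR_0.995 =", round(var_exp(0.995), 4))
```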

Theorem 4.1

Suppose \(1>q\geqslant p>0\).

  (i) Suppose that the distributions \(F_{i},i\in \mathbb {N}\), satisfy (3.9), (3.15) and (4.1). Then

    $$\begin{aligned} \liminf_{n\rightarrow\infty}\frac{\overline{\mathrm {VaR}}_{q}(S_{n})-\underline {\mathrm {VaR}}_{q}(S_{n})}{\overline{\mathrm {ES}}_{p}(S_{n})-\underline {\mathrm {ES}}_{p}(S_{n})} =&\liminf_{n\rightarrow\infty}\frac{\overline {\mathrm {ES}}_{q}(S_{n})-\underline {\mathrm {LES}}_{q}(S_{n})}{\overline {\mathrm {ES}}_{p}(S_{n})-\underline {\mathrm {ES}}_{p}(S_{n})} \\ \geqslant & \liminf_{n\rightarrow\infty}\frac{\overline {\mathrm {ES}}_{q}(S_{n})-\mu_{n}}{\overline {\mathrm {ES}}_{p}(S_{n})-\mu_{n}}\geqslant 1. \end{aligned}$$
    (4.2)
  (ii) Suppose that the distributions \(F_{i},i\in \mathbb {N}\), are identical and equal to a non-degenerate distribution \(F\), and \(\mathbb {E}[|X|^{k}]<\infty\) for some \(k>1\), where \(X\sim F\). Then

    $$ \liminf_{n\rightarrow\infty }\frac {\overline{\mathrm {VaR}}_{q}(S_{n})-\underline{\mathrm {VaR}}_{q}(S_{n})}{\overline{\mathrm {ES}}_{p}(S_{n})-\underline {\mathrm {ES}}_{p}(S_{n})}\geqslant \frac{\mathrm {ES}_{q}(X)-\mathrm {LES}_{q}(X)}{\mathrm {ES}_{p}(X)-\mathbb {E}[X]}\geqslant 1. $$
    (4.3)

Theorem 4.1 suggests that VaR is overall more sensitive to dependence uncertainty for large \(n\), compared to ES. Numerical evidence of the comparison of the DU-spreads for VaR and ES at the same level can be found in Sect. 5, even for small values of \(n\). Note that although the DU-spread of ES is smaller than that of VaR asymptotically, both risk measures have a rather large uncertainty spread in general, suggesting that dependence uncertainty in risk aggregation must be taken with care, no matter whether ES or VaR is chosen as the underlying risk measure; see Aas and Puccetti [1] for values in the context of a real life example.

Remark 4.2

In the homogeneous case, for any continuous distribution \(F\), the limit of the DU-spread ratio in (4.3) is strictly greater than 1 since \(\mathrm {LES}_{q}(X)<\mathbb {E}[X]\) and \(\mathrm {ES}_{q}(X)>\mathrm {ES}_{p}(X)\). In the case \(q=p\), we note that for light-tailed risks \(X\), \(\mathrm {LES}_{p}(X)\) is slightly smaller than \(\mathbb {E}[X]\); for heavy-tailed risks \(X\), \(\mathrm {LES}_{p}(X)\) can be significantly smaller than \(\mathbb {E}[X]\), leading to a much larger DU-spread of VaR. From Theorem 4.1, we can also see that, approximately, the \(\mathrm {VaR}_{q}\) interval under DU is \([\sum _{i=1}^{n}\mathrm {LES}_{q}(X_{i}),\ \sum_{i=1}^{n}\mathrm {ES}_{q}(X_{i})]\) and the \(\mathrm {ES}_{p}\) interval under DU is given by \([\sum_{i=1}^{n} \mathbb {E}[X_{i}],\ \sum_{i=1}^{n}\mathrm {ES}_{p}(X_{i})]\).
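The middle term of (4.3) is easy to evaluate once \(\mathrm {ES}\) and the mean are known, because \(q\,\mathrm {LES}_{q}(X)+(1-q)\,\mathrm {ES}_{q}(X)=\mathbb {E}[X]\) for any integrable \(X\). The sketch below uses this identity for an \(\mathrm{Exp}(1)\) risk; the distribution and the function names are assumptions of the example.

```python
import math

def es_exp(p):
    """ES_p of an Exp(1) risk: VaR_p + 1 = 1 - log(1 - p)."""
    return 1.0 - math.log(1.0 - p)

def les_from_es(q, es_q, mean):
    """LES_q obtained from q * LES_q + (1 - q) * ES_q = E[X]."""
    return (mean - (1.0 - q) * es_q) / q

def spread_ratio(q, p, es, mean):
    """Middle term of (4.3): (ES_q - LES_q) / (ES_p - E[X])."""
    es_q = es(q)
    return (es_q - les_from_es(q, es_q, mean)) / (es(p) - mean)

same_level = spread_ratio(0.975, 0.975, es_exp, 1.0)
higher_level = spread_ratio(0.99, 0.975, es_exp, 1.0)
print(round(same_level, 4), round(higher_level, 4))
```

The identity gives \(\mathrm {ES}_{q}(X)-\mathrm {LES}_{q}(X)=(\mathrm {ES}_{q}(X)-\mathbb {E}[X])/q\), so for \(q=p\) the ratio equals \(1/p\) whatever the distribution; the difference of the two spreads, however, scales with \(\mathrm {ES}_{p}(X)-\mathbb {E}[X]\) and is therefore larger for heavy-tailed risks.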

In the following, we give a result for finite \(n\), in the case of bounded risks. A proof can be directly obtained from Theorem 3.1.

Corollary 4.3

Suppose that \(1>q\geqslant p>0\), the distributions \(F_{1},\dots,F_{n}\) are supported in \([a,b]\), \(a< b\), \(a,b\in \mathbb {R}\), and

$$ \sum_{i=1}^{n} \left(\mathrm {ES}_{q}(X_{i})+\mathbb {E}[X_{i}]-\mathrm {ES}_{p}(X_{i})-\mathrm {LES}_{q}(X_{i})\right)>2(b-a), $$
(4.4)

where \(X_{i}\sim F_{i}\), \(i=1,\dots,n\). Then

$$ \frac{\overline{\mathrm {VaR}}_{q}(S_{n})-\underline{\mathrm {VaR}}_{q}(S_{n})}{\overline {\mathrm {ES}}_{p}(S_{n})-\underline {\mathrm {ES}}_{p}(S_{n})}>1. $$

Note that in Corollary 4.3, since \(\mathrm {ES}_{q}(X_{i})\geqslant \mathrm {ES}_{p}(X_{i})\) and \(\mathbb {E}[X_{i}]\geqslant \mathrm {LES}_{q}(X_{i})\), the left-hand side of (4.4) is the summation of \(n\) nonnegative terms, while the right-hand side of (4.4) is a constant. Hence (4.4) holds for \(n\) sufficiently large as long as the summation of the left-hand side of (4.4) diverges as \(n\rightarrow\infty\).
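As an illustration of how large \(n\) has to be for (4.4) to hold, the following sketch computes the smallest such \(n\) for i.i.d. \(\mathrm{U}(0,1)\) risks, for which \(\mathrm {ES}_{p}(X)=(1+p)/2\), \(\mathrm {LES}_{q}(X)=q/2\) and \(\mathbb {E}[X]=1/2\). The uniform marginals are an assumption of the example, and exact rational arithmetic is used to avoid rounding issues at the boundary of the strict inequality.

```python
from fractions import Fraction

def term_uniform(p, q):
    """Summand of (4.4) for X ~ U(0,1): ES_q + E[X] - ES_p - LES_q."""
    es_q = (1 + q) / 2
    es_p = (1 + p) / 2
    mean = Fraction(1, 2)
    les_q = q / 2
    return es_q + mean - es_p - les_q

def smallest_n(p, q, a=0, b=1):
    """Smallest n with n * term > 2 * (b - a), i.e. for which (4.4) holds."""
    term = term_uniform(p, q)
    if term <= 0:
        raise ValueError("the summand must be positive for (4.4) to hold")
    n = 1
    while n * term <= 2 * (b - a):
        n += 1
    return n

level = Fraction(975, 1000)
n_min = smallest_n(level, level)
print("condition (4.4) holds for n >=", n_min)
```

For uniform marginals the summand equals \((1-p)/2\) regardless of \(q\), so (4.4) holds exactly when \(n>4/(1-p)\).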

We remark that it remains theoretically unclear under what conditions the DU-spread of \(\mathrm {VaR}_{q}\) is larger than (or equal to) that of \(\mathrm {ES}_{p}\) for finite \(n\) and \(q\geqslant p\). In all our numerical examples (see Sect. 5 below), \(\mathrm {VaR}_{q}\) always has a larger DU-spread than \(\mathrm {ES}_{p}\).

5 Numerical examples

As suggested by BCBS [5], the risk measure \(\mathrm {ES}_{0.975}\) is a candidate proposed to replace \(\mathrm {VaR}_{0.99}\). The SST (see EIOPA [12]) used \(\mathrm {VaR}_{(1+p)/2}\) to compare with \(\mathrm {ES}_{p}\). Based on such considerations, we provide in this section the worst- and best-case values of \(\mathrm {VaR}_{0.99}\), \(\mathrm {VaR}_{0.9875}\), \(\mathrm {VaR}_{0.975}\) and \(\mathrm {ES}_{0.975}\) for different portfolios under dependence uncertainty. We compare the DU-spread of VaR and ES in each model, and also look at the influence of the number \(n\) of risks in the portfolio. The numerical calculation is carried out through the rearrangement algorithm (RA) described in Embrechts et al. [14], with discretization step \(\varDelta x=10^{-6}\). The following three models are considered, and the results for \(n=5,10,20\) are reported in Tables 1, 2, 3.

  (A) (Mixed portfolio) \(S_{n}=X_{1}+\cdots+X_{n}\), where \(X_{i}\sim\mathrm{Pareto}(2+0.1i)\), \({i=1,\ldots,5}\), and \(X_{i}\sim\mathrm{Exp}(i-5)\), \({i=6,\ldots,10}\), and \(X_{i}\sim\mathrm{lognormal} (0,(0.1(i-10))^{2})\), \(i=11,\ldots,20\).

  (B) (Light-tailed portfolio) \(S_{n}=Y_{1}+\cdots+Y_{n}\), where \(Y_{i}\sim\mathrm{Exp}(i), {i=1,\ldots,5}\); \(Y_{i}\sim\mathrm{Weibull}(i-5,1/2)\), \({i=6,\ldots,10}\); \(Y_{i}\stackrel{d}{=}Y_{i-10}, i=11,\ldots,20\).

  (C) (Pareto portfolio) \(S_{n}=Z_{1}+\cdots+Z_{n}\), where \(Z_{i}\sim \mathrm {Pareto}(1.5)\), \(i=1,\dots,20\).

Table 1 Bounds obtained with RA (\(\varDelta x=10^{-6}\)), model (A), mixed portfolio
Table 2 Bounds obtained with RA (\(\varDelta x=10^{-6}\)), model (B), light-tailed portfolio
Table 3 Bounds obtained with RA (\(\varDelta x=10^{-6}\)), model (C), Pareto portfolio

From Tables 1–3, we have the following observations:

  (i) The worst-case VaR at level 0.975 and the worst-case ES at level 0.975 are very close, even for small values of \(n\), in all models considered (cf. Theorem 3.3, (3.13)).

  (ii) The ratio between the worst-case VaR at level 0.975 and the worst-case ES at level 0.975 goes to 1 as \(n\) grows large. In the heavy-tailed model (C), the convergence is relatively slow (cf. Theorem 3.3, (3.14)).

  (iii) The DU-spreads of \(\mathrm {VaR}_{0.99}\), \(\mathrm {VaR}_{0.9875}\) and \(\mathrm {VaR}_{0.975}\) are larger than those of \(\mathrm {ES}_{0.975}\) in all considered models (cf. Theorem 4.1).

  (iv) In the heavy-tailed model (C), the DU-spreads of VaR are significantly larger than those of ES (cf. Remark 4.2).

6 Conclusion

In this paper, we have considered the risk measures VaR and ES under dependence uncertainty. We have introduced the notion of aggregation-robustness and have shown that all coherent distortion risk measures, including ES, are aggregation-robust, but VaR is not. We have also derived bounds for the worst- and best-case VaR in aggregation and its diversification ratio under dependence uncertainty. An asymptotic equivalence between the worst-case VaR and the worst-case ES for inhomogeneous portfolios has been established under the weakest conditions on the marginal distributions known so far. It has been shown that when the number of risks in aggregation is large, VaR generally exhibits a larger uncertainty spread compared to ES at the same or a lower confidence level. Numerical examples have been provided to support our theoretical results. The main results in this paper suggest that ES is less sensitive with respect to dependence uncertainty in aggregation, and it typically has a smaller uncertainty spread compared to VaR.