
In this chapter, we consider multivariate models for the joint distribution of several risk factors such as returns or log returns for different assets, zero rate changes for different maturities, changes in implied volatility, and losses due to defaults on risky loans. Our aim is to specify a good model for the future value g(X) of a portfolio, where the function g is known and its argument X is a random vector of, for instance, log returns and zero rate changes over a given future time period. Since the function g is known, what remains is to make a good choice of probability distribution for the random vector X.

The first sections, Sects. 9.1–9.3, present spherical and elliptical distributions and their applicability in a wide range of problems in risk management. Elliptical distributions provide convenient and flexible multivariate models. This set of models includes the multivariate normal model but allows for a much wider range of tail behavior and dependence properties.

Elliptical distributions have the following important property: if X has an elliptical distribution, then the distribution of any linear combination \({\mathbf{w}}^{\mathrm{T}}\mathbf{X}\) of its components is known. This property is useful because if X represents the returns of the financial assets in a portfolio, then we know the distribution of every linear portfolio. The property is useful even if we do not model the returns directly with an elliptical distribution. Suppose that X represents a vector of log returns, zero rate changes, etc., and is modeled by an elliptical distribution. If the portfolio value at some future time is given by g(X), then a first-order Taylor approximation of g around the mean vector μ = E[X] gives

$$g(\mathbf{X}) \approx g(\mathbf{\mu }) +\sum\limits_{k=1}^{d} \frac{\partial g} {\partial {x}_{k}}(\mathbf{\mu })({X}_{k} - {\mu }_{k}).$$

The right-hand side is a linear combination of the components of X, and its distribution is therefore known. Thus, whenever linearization of the nonlinear function g is justified, we can approximate the probability distribution of g(X) analytically.

An important property of spherically distributed random vectors is that they can be decomposed into a product of a radial part and an independent angular part that is uniformly distributed on a sphere. This property makes it easy to simulate from a spherical (or elliptical) distribution in any dimension. In particular, we can approximate the probability distribution of g(X) arbitrarily well by simulating a large enough sample from X and considering the resulting empirical distribution of the simulated outcomes of g(X).

A series of applications of elliptical distributions in risk management, including risk aggregation, solvency computations for an insurance company, and the hedging of options, is presented in Sect. 9.3.

Then we turn our attention to multivariate models for random vectors that do not show signs of elliptical symmetry, and the notion of copula is introduced in Sect. 9.4. On the one hand, the copula is just a multivariate distribution function appearing in the representation of a multivariate distribution function in terms of its (continuous) marginal distribution functions. On the other hand, the copula may be identified as the dependence structure of a multivariate distribution, and by varying the copula for a random vector X for which the distributions of the components X k are held fixed, we may understand better the effect of the dependence between the X k on the distribution for the future portfolio value g(X). We rarely have sufficient information to accurately specify the copula of a random vector X, and by varying the copula within a set of copula functions, we may study the robustness of the distribution of the portfolio value g(X) to misspecifications of the dependence between the components of X. Moreover, the representation of a multivariate model for X in terms of a copula and distribution functions for the X k is useful for simulation from the distribution of X: an outcome from X is constructed as an outcome from the copula together with an application of the quantile transform.

Finally, in Sect. 9.5, we consider the effect of dependence modeling for large homogeneous portfolios. We consider a high-dimensional random vector X with equally distributed components and study the effect of the dependence between the components on the distribution of the sum of the components of X.

1 Spherical Distributions

A random vector Y has a spherical distribution in \({\mathbb{R}}^{d}\) if its distribution is spherically symmetric. In other words, its distribution is invariant under rotations and reflections. Linear transformations that represent rotations and reflections correspond to multiplication by orthogonal matrices. Recall that a matrix O is orthogonal if it has real entries and \(\mathbf{O}{\mathbf{O}}^{\mathrm{T}} = \mathbf{I}\), where I is the identity matrix. Formally, Y has a spherical distribution if

$$\mathbf{O}\mathbf{Y}\stackrel{\mathrm{d}}{ =}\mathbf{Y}\quad \text{for every orthogonal matrix }\mathbf{O}.$$
(9.1)

Figure 9.1 shows scatter plots of samples from two spherical distributions.

Fig. 9.1

Left plot: sample of size 3,000 from the bivariate standard normal distribution. Right plot: sample of size 300 from the uniform distribution on the unit circle

Three examples of spherical distributions are presented below. Before presenting the examples, let us recall the definition of the multivariate normal distribution.

(1) A random vector Z has the standard normal distribution N d (0, I) if \(\mathbf{Z} = {({Z}_{1},\ldots, {Z}_{d})}^{\mathrm{T}}\), where \({Z}_{1},\ldots, {Z}_{d}\) are independent and N(0, 1)-distributed.

(2) A random vector X is N d (μ, Σ)-distributed if \(\mathbf{X}\stackrel{\mathrm{d}}{ =}\mathbf{\mu } + \mathbf{A}\mathbf{Z}\), where AA T = Σ and Z is N d (0, I)-distributed.
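To make definition (2) operational, here is a minimal sketch (our own illustration, assuming NumPy; it uses a Cholesky factorization to obtain a matrix A with AA T = Σ) that draws samples from N d (μ, Σ):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_multivariate_normal(mu, Sigma, n):
    # Factor Sigma = A A^T (Cholesky); any matrix A with this property works
    A = np.linalg.cholesky(Sigma)
    Z = rng.standard_normal((n, len(mu)))  # rows are outcomes of N_d(0, I)
    return mu + Z @ A.T                    # rows are outcomes of mu + A Z

mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])
X = sample_multivariate_normal(mu, Sigma, 3000)
```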

Example 9.1 (Standard normal distribution). 

The first example of a spherical distribution is the standard normal distribution N d (0, I). Let Z have a N d (0, I) distribution, and let O be an arbitrary orthogonal matrix. By property (2) above, OZ has the distribution N d (0, OO T). Since OO T = I, we conclude that Z satisfies (9.1). The left plot in Fig. 9.1 shows a scatter plot of a sample from N d (0, I).

Example 9.2 (Standard normal variance mixture). 

Another example of a spherical distribution is obtained by multiplying a N d (0, I)-distributed random vector Z by an independent nonnegative random variable W. Notice that, for any orthogonal matrix O,

$$\mathbf{O}W\mathbf{Z} = W\mathbf{O}\mathbf{Z}\stackrel{\mathrm{d}}{ =}W\mathbf{Z},$$

where the last equality follows since Z is spherically distributed.

The uniform distribution on the unit sphere \({\mathbb{S}}^{d-1} =\{ \mathbf{x} \in {\mathbb{R}}^{d} : \vert \mathbf{x}\vert = 1\}\), where \(\vert \mathbf{x}{\vert }^{2} ={ \mathbf{x}}^{\mathrm{T}}\mathbf{x}\), assigns equal probability to any two subsets of \({\mathbb{S}}^{d-1}\) with the same surface area.

Example 9.3 (Uniform distribution on the unit sphere). 

A third example of a spherical distribution is the uniform distribution on the unit sphere, i.e., the probability mass is distributed uniformly on the unit sphere \({\mathbb{S}}^{d-1}\). Let U be uniformly distributed on the unit sphere and consider a subset A of the unit sphere. For any orthogonal matrix O it holds that

$$\mathrm{P}(\mathbf{O}\mathbf{U} \in A) = \mathrm{P}(\mathbf{U} \in {\mathbf{O}}^{-1}A) = \mathrm{P}(\mathbf{U} \in {\mathbf{O}}^{\mathrm{T}}A) = \mathrm{P}(\mathbf{U} \in A),$$

where the last equality holds because O is an orthogonal matrix and therefore A and O T A have the same surface area. Therefore, U is spherically distributed. The right plot in Fig. 9.1 shows a sample from the uniform distribution on the unit circle.

The following property is a key property of spherical distributions.

Proposition 9.1.

If a is an arbitrary vector in ℝd and Y is spherically distributed and of the same dimension, then \({\mathbf{a}}^{\mathrm{T}}\mathbf{Y}\stackrel{\mathrm{d}}{ =}\vert \mathbf{a}\vert {Y }_{1}\) .

Proof. Take \(\mathbf{a}\neq \mathbf{0}\) (the case a = 0 is trivial), let \(\mathbf{u} = \mathbf{a}/\vert \mathbf{a}\vert \), and pick an orthogonal matrix O whose first row is equal to \({\mathbf{u}}^{\mathrm{T}}\). Since \(\mathbf{O}\mathbf{Y}\stackrel{\mathrm{d}}{ =}\mathbf{Y}\), it follows that \({\mathbf{a}}^{\mathrm{T}}\mathbf{Y} = \vert \mathbf{a}\vert {\mathbf{u}}^{\mathrm{T}}\mathbf{Y} = \vert \mathbf{a}\vert {(\mathbf{O}\mathbf{Y})}_{1}\stackrel{\mathrm{d}}{ =}\vert \mathbf{a}\vert {Y }_{1}\). □

The following property is another key property of spherical distributions.

Proposition 9.2.

If Y is spherically distributed, then \(\mathbf{Y}\stackrel{\mathrm{d}}{ =}R\mathbf{U}\), where \(R\stackrel{\mathrm{d}}{ =}\vert \mathbf{Y}\vert \), U is uniformly distributed on the unit sphere and R and U are independent. Moreover, \(\mathrm{P}(\mathbf{Y}/\vert \mathbf{Y}\vert \in \cdot \,\mid \vert \mathbf{Y}\vert > 0) = \mathrm{P}(\mathbf{U} \in \cdot \,)\).

The proposition provides a way to simulate from a spherical distribution. First draw a vector from the uniform distribution on the unit sphere by sampling from a standard normal distribution and dividing by its norm. Then draw the radial part by sampling from the distribution of | Y | .
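This recipe translates directly into code; the following is a minimal sketch (our own illustration, assuming NumPy; the radial sampler is left as a user-supplied function):

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_spherical(n, d, radial_sampler):
    # Angular part: U = Z/|Z| is uniform on the unit sphere S^{d-1}
    Z = rng.standard_normal((n, d))
    U = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    # Radial part: R drawn independently of U
    R = radial_sampler(n)
    return R[:, None] * U

# With R =d |Z| (a chi distribution with d degrees of freedom),
# Y = R U recovers the standard normal N_d(0, I)
d = 3
Y = sample_spherical(10_000, d, lambda n: np.sqrt(rng.chisquare(d, n)))
```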

To prove Proposition 9.2, we first state and prove the following lemma.

Lemma 9.1.

The uniform distribution on the unit sphere is the unique spherical distribution on the unit sphere.

Proof. Let Z have a spherical distribution on the unit sphere. For any orthogonal matrix O and subset A of \({\mathbb{S}}^{d-1}\) it holds that \(\mathrm{P}(\mathbf{Z} \in \mathbf{O}A) = \mathrm{P}({\mathbf{O}}^{\mathrm{T}}\mathbf{Z} \in {\mathbf{O}}^{\mathrm{T}}\mathbf{O}A) = \mathrm{P}(\mathbf{Z} \in A)\) since Z is spherically distributed and O T is an orthogonal matrix. If Z were not uniformly distributed on \({\mathbb{S}}^{d-1}\), then there would exist a subset A 0 of \({\mathbb{S}}^{d-1}\) and an orthogonal matrix O 0 such that \(\mathrm{P}(\mathbf{Z} \in {A}_{0})\neq \mathrm{P}(\mathbf{Z} \in {\mathbf{O}}_{0}{A}_{0})\), which contradicts that Z is spherically distributed. □

Proof  of Proposition 9.2. It is sufficient to show that \(\mathrm{P}(\vert \mathbf{Y}\vert > r,\mathbf{Y}/\vert \mathbf{Y}\vert \in A) = \mathrm{P}(\vert \mathbf{Y}\vert > r)\mathrm{P}(\mathbf{U} \in A)\) for any r ≥ 0 and any subset A of \({\mathbb{S}}^{d-1}\), where U is uniformly distributed on the unit sphere.

We claim that, for any r ≥ 0, I{ | Y | > r}Y ∕ | Y | is spherically distributed. To prove the claim, note that for any orthogonal matrix O it holds that | OY | = | Y | and \(\mathbf{O}\mathbf{Y}\stackrel{\mathrm{d}}{ =}\mathbf{Y}\) and therefore

$$\mathbf{O}I\{\vert \mathbf{Y}\vert > r\}\mathbf{Y}/\vert \mathbf{Y}\vert = I\{\vert \mathbf{O}\mathbf{Y}\vert > r\}\mathbf{O}\mathbf{Y}/\vert \mathbf{O}\mathbf{Y}\vert \stackrel{\mathrm{d}}{ =}I\{\vert \mathbf{Y}\vert > r\}\mathbf{Y}/\vert \mathbf{Y}\vert.$$

To complete the proof of Proposition 9.2, we may without loss of generality take r such that P( | Y | > r) > 0 and note that

$$\begin{array}{rcl} \mathrm{P}(\mathbf{Y}/\vert \mathbf{Y}\vert \in A\mid \vert \mathbf{Y}\vert > r)& =& \mathrm{P}(I\{\vert \mathbf{Y}\vert > r\}\mathbf{Y}/\vert \mathbf{Y}\vert \in A)/\mathrm{P}(\vert \mathbf{Y}\vert > r) \\ & =& \mathrm{P}(I\{\vert \mathbf{Y}\vert > r\}\mathbf{Y}/\vert \mathbf{Y}\vert \in \mathbf{O}A)/\mathrm{P}(\vert \mathbf{Y}\vert > r) \\ & =& \mathrm{P}(\mathbf{Y}/\vert \mathbf{Y}\vert \in \mathbf{O}A\mid \vert \mathbf{Y}\vert > r).\end{array}$$

It now follows from Lemma 9.1 that \(\mathrm{P}(\mathbf{Y}/\vert \mathbf{Y}\vert \in A\mid \vert \mathbf{Y}\vert > r) = \mathrm{P}(\mathbf{U} \in A)\), and therefore \(\mathrm{P}(\vert \mathbf{Y}\vert > r,\mathbf{Y}/\vert \mathbf{Y}\vert \in A) = \mathrm{P}(\vert \mathbf{Y}\vert > r)\mathrm{P}(\mathbf{U} \in A)\). □

2 Elliptical Distributions

The multivariate normal distribution is very useful in the construction of multivariate models. Its popularity derives primarily from the fact that it is tractable, allowing for explicit calculations, and that it can be motivated asymptotically by the central limit theorem. Even for univariate data that show clear signs of symmetry, the univariate normal distribution does not necessarily give a good fit: typically, normal tails do not match empirical tails particularly well. Similarly, the multivariate normal distribution is often at best a reasonable first approximation for samples of multivariate observations with clear signs of elliptical symmetry.

A random vector X has a N d (μ, Σ) distribution if

$$\mathbf{X}\stackrel{\mathrm{d}}{ =}\mathbf{\mu } + \mathbf{A}\mathbf{Z},$$
(9.2)

where AA T = Σ and Z has a N d (0, I) distribution. An easy way to obtain a richer class of multivariate distributions, which share many of the tractable properties of the multivariate normal distribution, is to replace the standard normal vector Z in (9.2) by an arbitrary spherically distributed random vector Y. Formally, a random vector X has an elliptical distribution if there exist a vector μ, a matrix A, and a spherically distributed vector Y such that

$$\mathbf{X}\stackrel{\mathrm{d}}{ =}\mathbf{\mu } + \mathbf{A}\mathbf{Y}.$$
(9.3)

The matrix A and the spherical distribution of Y in (9.3) are not determined by the distribution of X: we may replace the pair (A, Y) in (9.3) by \((c\mathbf{A},{c}^{-1}\mathbf{Y})\) for any constant c ∈ (0, ∞). A matrix Σ satisfying Σ = AA T is called a dispersion matrix of the elliptically distributed vector X. If the covariance matrix Cov(X) exists finitely, then Cov(X) = c Σ for some constant c ∈ (0, ∞). To verify this claim, we note that, by (9.3) and Proposition 9.2,

$$\mathrm{Cov}(\mathbf{X}) = \mathrm{E}[(\mathbf{X} -\mathbf{\mu }){(\mathbf{X} -\mathbf{\mu })}^{\mathrm{T}}] = \mathrm{E}[{R}^{2}]\mathbf{A}\mathrm{E}[\mathbf{U}{\mathbf{U}}^{\mathrm{T}}]{\mathbf{A}}^{\mathrm{T}} = \frac{\mathrm{E}[{R}^{2}]} {d} \mathbf{\Sigma }.$$

The last equality above can be proven as follows. Consider a standard normally distributed vector Z and recall that Z ∕ | Z | is uniformly distributed on the unit sphere and E[ | Z | 2] = d. Therefore,

$$\mathbf{I} = \mathrm{Cov}(\mathbf{Z}) = \mathrm{E}[\vert \mathbf{Z}{\vert }^{2}]\mathrm{E}[\mathbf{U}{\mathbf{U}}^{\mathrm{T}}] = d\mathrm{E}[\mathbf{U}{\mathbf{U}}^{\mathrm{T}}].$$

For a dispersion matrix Σ with nonzero diagonal entries we define the linear correlation parameter \({\rho }_{ij} = {\Sigma }_{ij}/{({\Sigma }_{ii}{\Sigma }_{jj})}^{1/2}\). If Cov(X) exists finitely, then ρ ij = Cor(X i , X j ), i.e., the linear correlation parameter coincides with the ordinary linear correlation coefficient.

The normal variance mixture distributions are the distributions of random vectors with stochastic representation

$$\mathbf{X}\stackrel{\mathrm{d}}{ =}\mathbf{\mu } + W\mathbf{A}\mathbf{Z},$$
(9.4)

where A and Z are the same as in (9.2) and W is a nonnegative random variable independent of Z. From Example 9.2 it follows that a normal variance mixture distribution is an elliptical distribution. By conditioning on W = w, we see that X ∣ W = w is N d (μ, w 2 Σ)-distributed, which explains the name normal variance mixture. If E[W 2] < ∞, then X has a well-defined mean vector μ = E[X] and covariance matrix

$$\mathrm{Cov}(\mathbf{X}) = \mathrm{E}[(\mathbf{X} -\mathbf{\mu }){(\mathbf{X} -\mathbf{\mu })}^{\mathrm{T}}] = \mathrm{E}[{W}^{2}]\mathbf{A}\mathrm{E}[\mathbf{Z}{\mathbf{Z}}^{\mathrm{T}}]{\mathbf{A}}^{\mathrm{T}} = \mathrm{E}[{W}^{2}]\mathbf{\Sigma }.$$

Example 9.4 (Multivariate Student’s t). 

If we take \({W}^{2}\stackrel{\mathrm{d}}{ =}\nu /{S}_{\nu }\), where S ν has a Chi-square distribution with ν degrees of freedom, then the resulting distribution of \(\mathbf{X} = \mathbf{\mu } + W\mathbf{A}\mathbf{Z}\) is called a multivariate Student’s t distribution with ν degrees of freedom, written t d (μ, Σ, ν). Note that Σ is not the covariance matrix of X. Since \(\mathrm{E}[{W}^{2}] = \nu /(\nu - 2)\) if ν > 2, it follows that \(\mathrm{Cov}(\mathbf{X}) = (\nu /(\nu - 2))\mathbf{\Sigma }\).
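The mixture representation gives a direct way to simulate from t d (μ, Σ, ν). The following sketch (our own illustration, assuming NumPy) implements it:

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_multivariate_t(mu, Sigma, nu, n):
    # X = mu + W A Z with W^2 = nu / S_nu, S_nu chi-square(nu), W independent of Z
    A = np.linalg.cholesky(Sigma)
    Z = rng.standard_normal((n, len(mu)))
    W = np.sqrt(nu / rng.chisquare(nu, n))
    return mu + W[:, None] * (Z @ A.T)

X = sample_multivariate_t(np.zeros(2),
                          np.array([[1.0, 0.5], [0.5, 1.0]]), nu=4, n=50_000)
# For nu > 2 the sample covariance should be close to (nu/(nu - 2)) Sigma
print(np.cov(X, rowvar=False))
```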

For a normally distributed random vector \(\mathbf{X}\stackrel{\mathrm{d}}{ =}\mathbf{\mu } + \mathbf{A}\mathbf{Z}\), where AA T = Σ, any linear combination of the components of X is again normally distributed. That is, for any nonrandom vector w of the same dimension,

$$\begin{array}{rcl}{ \mathbf{w}}^{\mathrm{T}}\mathbf{X}& \stackrel{\mathrm{ d}}{ =}&{ \mathbf{w}}^{\mathrm{T}}\mathbf{\mu } +{ \mathbf{w}}^{\mathrm{T}}\mathbf{A}\mathbf{Z} \\ & = &{ \mathbf{w}}^{\mathrm{T}}\mathbf{\mu } + {({\mathbf{A}}^{\mathrm{T}}\mathbf{w})}^{\mathrm{T}}\mathbf{Z} \\ & \stackrel{\mathrm{d}}{ =}&{ \mathbf{w}}^{\mathrm{T}}\mathbf{\mu } + {({\mathbf{w}}^{\mathrm{T}}\mathbf{\Sigma }\mathbf{w})}^{1/2}{Z}_{ 1}.\end{array}$$

A similar property holds for arbitrary elliptical distributions.

Proposition 9.3.

If X has an elliptical distribution with stochastic representation \(\mathbf{X}\stackrel{\mathrm{d}}{ =}\mathbf{\mu } + \mathbf{A}\mathbf{Y}\), where Y is spherically distributed, then for any vector a of the same dimension \({\mathbf{a}}^{\mathrm{T}}\mathbf{X}\stackrel{\mathrm{d}}{ =}{\mathbf{a}}^{\mathrm{T}}\mathbf{\mu } + {({\mathbf{a}}^{\mathrm{T}}\mathbf{\Sigma }\mathbf{a})}^{1/2}{Y }_{1}\), where Σ = AA T.

The proof is omitted since the result follows immediately from Proposition 9.1 and the defining property (9.3) of elliptical distributions.

As was previously mentioned, normal variance mixture distributions and, more generally, elliptical distributions share many of the attractive properties of normal distributions. However, there are important exceptions. Recall that the components of the N d (μ, Σ)-distributed vector μ + AZ are independent if and only if AA T = Σ is a diagonal matrix, that is, if the components are uncorrelated. This property does not hold for arbitrary normal variance mixture distributions. If \(\mathbf{X}\stackrel{\mathrm{d}}{ =}\mathbf{\mu } + W\mathbf{A}\mathbf{Z}\) with AA T = Σ a diagonal matrix, then the components of X are uncorrelated. If Σ is a diagonal matrix with strictly positive diagonal entries, then \(({X}_{k},{X}_{l})\stackrel{\mathrm{d}}{ =}({\mu }_{k} + W{A}_{k,k}{Z}_{k},{\mu }_{l} + W{A}_{l,l}{Z}_{l})\), where A k, k , A l, l > 0. Clearly, X k and X l are not independent unless W is a constant.

The sum of independent elliptically distributed random vectors with the same (up to a constant factor) dispersion matrix is elliptically distributed.

Proposition 9.4.

If the random vectors X 1 and X 2 in ℝ d are independent and elliptically distributed with common dispersion matrix Σ, then X 1 + X 2 is elliptically distributed.

Proof. For a matrix A such that AA T = Σ we may write \({\mathbf{X}}_{1} +{ \mathbf{X}}_{2}\stackrel{\mathrm{d}}{ =}{\mathbf{\mu }}_{1} +{ \mathbf{\mu }}_{2} + \mathbf{A}({\mathbf{Y}}_{1} +{ \mathbf{Y}}_{2})\) for some independent spherically distributed vectors Y 1 and Y 2. It remains to show that Y 1 + Y 2 is spherically distributed. For every orthogonal matrix O and every y in \({\mathbb{R}}^{d}\),

$$\begin{array}{rcl} \mathrm{P}(\mathbf{O}({\mathbf{Y}}_{1} +{ \mathbf{Y}}_{2}) \leq \mathbf{y})& =& \int \mathrm{P}(\mathbf{O}{\mathbf{Y}}_{1} + \mathbf{z} \leq \mathbf{y}\mid \mathbf{O}{\mathbf{Y}}_{2} = \mathbf{z})\,d{F}_{\mathbf{O}{\mathbf{Y}}_{2}}(\mathbf{z}) \\ & =& \int \mathrm{P}({\mathbf{Y}}_{1} + \mathbf{z} \leq \mathbf{y})\,d{F}_{{\mathbf{Y}}_{2}}(\mathbf{z}) \\ & =& \mathrm{P}({\mathbf{Y}}_{1} +{ \mathbf{Y}}_{2} \leq \mathbf{y}), \\ \end{array}$$

i.e., \(\mathbf{O}({\mathbf{Y}}_{1} +{ \mathbf{Y}}_{2})\stackrel{\mathrm{d}}{ =}{\mathbf{Y}}_{1} +{ \mathbf{Y}}_{2}\), from which the conclusion follows. □

Example 9.5 (Summation of log returns). 

Consider a set of identically distributed and uncorrelated random variables \({X}_{1},\ldots, {X}_{n}\) that represent future daily log returns for some asset. Suppose that each log return has a finite mean μ and standard deviation σ. If the log returns are independent, then by the central limit theorem, \({X}_{1} + \cdots + {X}_{n}\) is approximately \(N(n\mu, n{\sigma }^{2})\)-distributed for n large. If the vector \(\mathbf{X} = {({X}_{1},\ldots, {X}_{n})}^{\mathrm{T}}\) of log returns has an elliptical distribution, then Proposition 9.3 implies that

$${X}_{1} + \cdots + {X}_{n} ={ \mathbf{1}}^{\mathrm{T}}\mathbf{X}\stackrel{\mathrm{ d}}{ =}n\mu + {n}^{1/2}({X}_{ 1} - \mu ).$$

We see that the n-day log return and the 1-day log return belong to the same location-scale family of distributions. For instance, if the 1-day log return has a heavy-tailed Student’s t distribution with a low-degree-of-freedom parameter, then so does the n-day log return.

2.1 Goodness of Fit of an Elliptical Model

Consider a random vector X with an elliptical distribution with representation \(\mathbf{X} = \mathbf{\mu } + \mathbf{A}\mathbf{Y}\), where Y has a spherical distribution and Σ = AA T is invertible. By Proposition 9.3,

$${ \mathbf{w}}^{\mathrm{T}}\mathbf{X}\stackrel{\mathrm{ d}}{ =}{\mathbf{w}}^{\mathrm{T}}\mathbf{\mu } + {({\mathbf{w}}^{\mathrm{T}}\mathbf{\Sigma }\mathbf{w})}^{1/2}{Y }_{ 1}\quad \text{for all nonrandom vectors }\mathbf{w}\neq \mathbf{0}$$

or, equivalently,

$$\frac{{\mathbf{w}}^{\mathrm{T}}\mathbf{X} -{\mathbf{w}}^{\mathrm{T}}\mathbf{\mu }} {{({\mathbf{w}}^{\mathrm{T}}\mathbf{\Sigma }\mathbf{w})}^{1/2}} \stackrel{\mathrm{d}}{ =}{Y }_{1}\quad \text{for all nonrandom vectors }\mathbf{w}\neq \mathbf{0}.$$
(9.5)

The property (9.5) can be used to investigate whether or not a multivariate sample is likely to come from an elliptical distribution. Let us illustrate the procedure by an example.

Example 9.6 (Estimation and fit of an elliptical model). 

Consider a sample of size 500 of pairs of daily log returns for the Dow Jones Industrial Average (DJIA) and Nasdaq Composite indices (index values from November 11, 2008 until November 4, 2010). The scatter plot of the pairs of log returns is shown in the upper left plot in Fig. 9.2. The log-return sample is denoted \(\{{\mathbf{x}}_{1},\ldots, {\mathbf{x}}_{500}\}\). We assume initially that the sample can be seen as outcomes from an elliptically distributed vector X and investigate whether this assumption can be rejected or not. If it is not rejected, then we also want to determine the elliptical distribution of X. We assume that the location parameter μ and a scalar multiple of the shape parameter C = c Σ, which is assumed invertible, can be estimated. Note that (9.5) can be expressed as

$$\frac{{\mathbf{w}}^{\mathrm{T}}\mathbf{X} -{\mathbf{w}}^{\mathrm{T}}\mathbf{\mu }} {{({\mathbf{w}}^{\mathrm{T}}\mathbf{C}\mathbf{w})}^{1/2}} \stackrel{\mathrm{d}}{ =}{c}^{-1/2}{Y }_{ 1}\quad \text{for all nonrandom vectors }\mathbf{w}\neq \mathbf{0}.$$

If the covariance matrix Cov(X) exists finitely, then μ = E[X], and we may take C = Cov(X). Here we estimate μ and C by the sample mean and sample covariance, respectively. The estimates are denoted \(\widehat{\mathbf{\mu }}\) and \(\widehat{\mathbf{C}}\). Consider a large set of vectors \(\{{\mathbf{w}}_{1},\ldots, {\mathbf{w}}_{n}\}\) of unit length. For each w k we construct the sample \(\{{y}_{k,1},\ldots, {y}_{k,500}\}\) by

$${y}_{k,l} = \frac{{\mathbf{w}}_{k}^{\mathrm{T}}{\mathbf{x}}_{l} -{\mathbf{w}}_{k}^{\mathrm{T}}\widehat{\mathbf{\mu }}} {{({\mathbf{w}}_{k}^{\mathrm{T}}\widehat{\mathbf{C}}{\mathbf{w}}_{k})}^{1/2}} \quad \text{for }k = 1,\ldots, n,\quad l = 1,\ldots, 500.$$

Each such sample can be viewed as a sample from \({c}^{-1/2}{Y }_{1}\). If the data were generated by the elliptical distribution of X, then all the n constructed samples must come from the same distribution, the distribution of \({c}^{-1/2}{Y }_{1}\). By overlaying the n q–q plots of the empirical quantiles for the n samples against the quantiles of a chosen reference distribution, we can check graphically whether the data appear to be consistent with an elliptical distribution or not. Moreover, the distribution of \({c}^{-1/2}{Y }_{1}\) can be estimated from the q–q plots.

Here we take n = 100 and sample the w k randomly from the uniform distribution on the unit sphere by setting \({\mathbf{w}}_{k} ={ \mathbf{z}}_{k}/\vert {\mathbf{z}}_{k}\vert \), where the z k are outcomes of independent N2(0, I)-distributed random vectors. The upper left plot in Fig. 9.2 is a scatter plot of the sample \(\{{\mathbf{x}}_{1},\ldots, {\mathbf{x}}_{500}\}\). The upper right plot in Fig. 9.2 shows the n = 100 q–q plots of the empirical quantiles of the samples \(\{{y}_{k,1},\ldots, {y}_{k,500}\}\) (y-axis) against the quantiles of the standard normal distribution (x-axis). The q–q plots indicate a reasonable fit to a common distribution with heavier tails than the normal distribution.
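A sketch of this graphical check (our own illustration; `returns` stands for the 500 × 2 array of log returns, and NumPy, SciPy, and matplotlib are assumed):

```python
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)

def elliptical_qq(returns, n_directions=100):
    n, d = returns.shape
    mu_hat = returns.mean(axis=0)             # estimate of mu
    C_hat = np.cov(returns, rowvar=False)     # estimate of C = Cov(X)
    probs = (np.arange(1, n + 1) - 0.5) / n   # plotting positions
    q_ref = norm.ppf(probs)                   # reference quantiles
    for _ in range(n_directions):
        w = rng.standard_normal(d)
        w /= np.linalg.norm(w)                # random direction on the sphere
        y = (returns @ w - w @ mu_hat) / np.sqrt(w @ C_hat @ w)
        plt.plot(q_ref, np.sort(y), color="gray", alpha=0.2)
    plt.xlabel("standard normal quantiles")
    plt.ylabel("empirical quantiles of projections")
    plt.show()

# elliptical_qq(returns)  # returns: the 500 x 2 array of log returns
```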

Fig. 9.2

Upper left plot: scatter plot showing pairs (x D , x N ) of DJIA and Nasdaq log returns. Upper right plot: 100 overlaid q–q plots for empirical quantiles for each of 100 samples \(\{{y}_{k,1},\ldots, {y}_{k,500}\}\) (y-axis) against standard normal quantiles (x-axis). The solid curve in the lower plot shows the quantiles of the model for the Nasdaq log returns based on the fitted bivariate Student’s t model (y-axis) against standard normal quantiles (x-axis). The dashed curve in the lower plot shows the polynomial normal quantiles (Example 8.11) fitted to the Nasdaq log returns (y-axis) against the standard normal quantiles (x-axis)

Under the assumption that \(\{{\mathbf{x}}_{1},\ldots, {\mathbf{x}}_{500}\}\) is a sample from the bivariate Student’s t ν distribution with ν > 2 (otherwise it does not make sense to use the sample covariance matrix) and under the assumption that \(\widehat{\mathbf{\mu }} = \mathbf{\mu }\) and \(\widehat{\mathbf{C}} = \mathrm{Cov}(\mathbf{X})\), it holds that all the samples \(\{{y}_{k,1},\ldots, {y}_{k,500}\}\) are samples from the distribution of \({((\nu - 2)/\nu )}^{1/2}Z\), where Z is standard t ν-distributed. Least-squares estimation based on all 100 univariate samples gives the estimate \(\widehat{\nu } \approx 4.09\). The selected model for the sample \(\{{\mathbf{x}}_{1},\ldots, {\mathbf{x}}_{500}\}\) is the distribution \({t}_{2}(\widehat{\mathbf{\mu }},((\widehat{\nu } - 2)/\widehat{\nu })\widehat{\mathbf{C}},\widehat{\nu })\).

The second marginal distribution (a univariate Student’s t distribution) of the bivariate Student’s t distribution for the pair of DJIA and Nasdaq log returns provides a model for the Nasdaq log returns. In the lower plot in Fig. 9.2, we compare this model to the polynomial normal model in Example 8.11. The solid curve in the lower plot is a q–q plot of the quantiles for the model for the Nasdaq log returns (y-axis) against standard normal quantiles (x-axis). The quantiles of the fitted polynomial normal model in Example 8.11 are plotted against the standard normal quantiles as the dashed curve in the lower plot in Fig. 9.2. It is hard to distinguish between the two models.

2.2 Asymptotic Dependence and Rank Correlation

We now introduce general notions of dependence and study them in the context of elliptical distributions.

The first notion of dependence measures the dependence between extreme values and is called tail dependence or asymptotic dependence. Consider a pair (X 1, X 2) of random variables with equally distributed components. We say that X 1 and X 2 are asymptotically dependent in the lower left tail if the limit \(\lim _{x\rightarrow -\infty }\mathrm{P}({X}_{2} \leq x\mid {X}_{1} \leq x)\), called the coefficient of lower tail dependence, is strictly positive, and asymptotically independent if the limit is zero.

Proposition 9.5.

If (X 1 ,X 2 ) has a bivariate standard normal distribution with linear correlation coefficient ρ < 1, then \(\lim \limits_{x\rightarrow -\infty }\mathrm{P}({X}_{2} \leq x\mid {X}_{1} \leq x) = 0\) .

Proof. First note that \(\mathrm{P}({X}_{2} \leq x\mid {X}_{1} \leq x) = \mathrm{P}({X}_{1} \leq x,{X}_{2} \leq x)/\Phi (x)\) and that \(({X}_{1},{X}_{2})\stackrel{\mathrm{d}}{ =}({Z}_{1},\rho {Z}_{1} + {(1 - {\rho }^{2})}^{1/2}{Z}_{2})\), where Z 1, Z 2 are independent and standard normally distributed. If \(\rho = -1\), then the statement of the proposition holds, so we may without loss of generality assume that | ρ | < 1. We may write

$$\begin{array}{rcl} \mathrm{P}({X}_{1} \leq x,{X}_{2} \leq x)& =& {\int }_{-\infty }^{\infty }\mathrm{P}\left ({Z}_{ 1} \leq x,\rho {Z}_{1} + {(1 - {\rho }^{2})}^{1/2}t \leq x\right )\phi (t)\mathit{dt} \\ & =& {\int }_{-\infty }^{a(x)}\Phi (x)\phi (t)\mathit{dt} +{ \int }_{a(x)}^{\infty }\Phi ((x - {(1 - {\rho }^{2})}^{1/2}t)/\rho )\phi (t)\mathit{dt},\\ \end{array}$$

where \(a(x) = {((1 - \rho )/(1 + \rho ))}^{1/2}x\). Therefore,

$$ \begin{array}{rcl}\lim \limits_{x\rightarrow -\infty }\mathrm{P}({X}_{2} \leq x\mid {X}_{1} \leq x)& =& \lim \limits_{x\rightarrow -\infty }\left (\Phi (a(x))+\frac{{\int }_{a(x)}^{\infty }\Phi ((x-{(1-{\rho }^{2})}^{1/2}t)/\rho )\phi (t)\mathit{dt}} {\Phi (x)} \right ) \\ & =& \lim \limits_{x\rightarrow -\infty }\frac{{\int }_{a(x)}^{\infty }\Phi ((x-{(1-{\rho }^{2})}^{1/2}t)/\rho )\phi (t)\mathit{dt}} {\Phi (x)}.\end{array}$$

Applying l’Hôpital’s rule gives

$$\begin{array}{rcl} & & \lim \limits_{x\rightarrow -\infty }\mathrm{P}({X}_{2}\leq x\mid {X}_{1}\leq x) \\ & & \;={-\lim }_{x\rightarrow -\infty }\frac{\Phi (x)} {\phi (x)}{\left (\frac{1-\rho } {1+\rho }\right )}^{1/2}{+\lim }_{ x\rightarrow -\infty } \frac{1} {\rho \phi (x)}{\int }_{a(x)}^{\infty }\phi ((x-{(1-{\rho }^{2})}^{1/2}t)/\rho )\phi (t)\mathit{dt}.\end{array}$$

We saw in Example 8.1 that \(\Phi (x) \sim -\phi (x)/x\) as x → −∞, so we only need to compute the last limit given above. By writing out the standard normal densities explicitly and making a substitution in the integration variable, we arrive at

$$\frac{1} {\rho \phi (x)}{\int }_{a(x)}^{\infty }\phi ((x - {(1 - {\rho }^{2})}^{1/2}t)/\rho )\phi (t)\mathit{dt} ={ \int }_{-\infty }^{a(x)}\phi (u)du = \Phi (a(x)),$$

which tends to 0 as x → −∞. □

Unlike the components of a normally distributed random vector, the components of a vector with a bivariate Student’s t distribution are asymptotically dependent. We omit the proof of the following proposition and refer the reader to Sect. 9.6 for further details.

Proposition 9.6.

Let (X 1 ,X 2 ) have an elliptical distribution with linear correlation parameter ρ. If X 1 and X 2 are equally distributed, and if P (X 1 ≤ x) is regularly varying at −∞ with index − α, then

$$\lim \limits_{x\rightarrow -\infty }\mathrm{P}({X}_{2} \leq x\mid {X}_{1} \leq x) = \frac{{\int }_{(\pi /2-\arcsin \rho )/2}^{\pi /2}{(\cos t)}^{\alpha }\mathit{dt}} {{\int }_{0}^{\pi /2}{(\cos t)}^{\alpha }\mathit{dt}}.$$

Zero correlation does not imply asymptotic independence, and covariances and correlations do not provide sufficient information to assess the dependence between extreme values. For example, a quadratic hedge based on the covariance structure may perform poorly, precisely when it matters the most, if the liability and the hedging instruments are asymptotically dependent. There are many examples from financial markets of simultaneous extreme price movements for assets whose log returns are only weakly correlated.

Consider an elliptically distributed random vector (X 1, X 2) with a dispersion matrix Σ. Recall that any matrix Σ c = c Σ is a dispersion matrix for (X 1, X 2). However, the linear correlation parameter \(\rho = {\Sigma }_{1,2}/{({\Sigma }_{1,1}{\Sigma }_{2,2})}^{1/2}\) is uniquely determined by the elliptical distribution. Since ρ = Cor(X 1, X 2) whenever Cor(X 1, X 2) exists [the variances Var(X 1) and Var(X 2) are nonzero and finite], we may estimate ρ by the sample correlation coefficient. However, for heavy-tailed data (corresponding to distributions with finite variances) the sample correlation coefficient is an estimator of ρ with a large, or even infinite, variance. An alternative approach to estimating the linear correlation parameter ρ is based on estimating another (rank) correlation coefficient called Kendall’s tau, whose value for an elliptical distribution can be expressed in terms of the linear correlation parameter ρ. This approach allows for estimation of ρ even for elliptical distributions whose marginal distributions have infinite variances.

Kendall’s tau for the random vector (X 1, X 2) is defined as

$$\tau ({X}_{1},{X}_{2}) = \mathrm{P}(({X}_{1} - {X^{\prime}}_{1})({X}_{2} - {X^{\prime}}_{2}) > 0) -\mathrm{P}(({X}_{1} - {X^{\prime}}_{1})({X}_{2} - {X^{\prime}}_{2}) < 0),$$
(9.6)

where (X′ 1, X′ 2) is an independent copy of (X 1, X 2).

Proposition 9.7.

Let (X 1 ,X 2 ) have an elliptical distribution with location parameter (μ 1 2 ) and linear correlation parameter ρ. If \(\mathrm{P}({X}_{1} = {\mu }_{1}) = \mathrm{P}({X}_{2} = {\mu }_{2}) = 0\) , then

$$\tau ({X}_{1},{X}_{2}) = \frac{2} {\pi }\arcsin \rho.$$
(9.7)

Proof. Without loss of generality we may consider the case | ρ | < 1. Since \(\mathrm{P}(({X}_{1} - {X^{\prime}}_{1})({X}_{2} - {X^{\prime}}_{2}) = 0) = 0\), we find that

$$\tau ({X}_{1},{X}_{2}) = 2\mathrm{P}(({X}_{1} - {X^{\prime}}_{1})({X}_{2} - {X^{\prime}}_{2}) > 0) - 1.$$

The independence of X = (X 1, X 2)T and X′ = (X′ 1, X′ 2)T and representation (9.3) imply that

$$(\mathbf{X},\mathbf{X}^{\prime})\stackrel{\mathrm{d}}{ =}(\mathbf{\mu },\mathbf{\mu }) + \mathbf{A}(R\mathbf{U},R^{\prime}\mathbf{U}^{\prime}),$$

where R, R′, U, U′ are independent. From Proposition 9.4 we know that \(\mathbf{X} -\mathbf{X}^{\prime}\stackrel{\mathrm{d}}{ =}\mathbf{A}{R}^{{_\ast}}{\mathbf{U}}^{{_\ast}}\), where R ∗ and U ∗ are independent, and the assumption \(\mathrm{P}({X}_{1} = {\mu }_{1}) = \mathrm{P}({X}_{2} = {\mu }_{2}) = 0\) implies that \(\mathrm{P}({R}^{{_\ast}} = 0) = 0\). With W = AU ∗ we have found that

$$\tau ({X}_{1},{X}_{2}) = 2\mathrm{P}({R}^{{_\ast}}{W}_{ 1}{W}_{2} > 0) - 1 = 2\mathrm{P}({W}_{1}{W}_{2} > 0) - 1.$$

Write

$$\mathbf{\Sigma } = \left (\begin{array}{lr} {\sigma }_{1}^{2} & {\sigma }_{1}{\sigma }_{2}\rho \\ {\sigma }_{1}{\sigma }_{2}\rho & {\sigma }_{2}^{2} \end{array} \right ),\quad \mathbf{A} = \left (\begin{array}{lr} {\sigma }_{1}{(1 - {\rho }^{2})}^{1/2} & {\sigma }_{ 1}\rho \\ 0 & {\sigma }_{2} \end{array} \right ),\quad {\mathbf{U}}^{{_\ast}}\stackrel{\mathrm{ d}}{ =}\left (\begin{array}{c} \cos U\\ \sin U \end{array} \right ),$$

where U is uniformly distributed on [ − π, π). Then

$$\begin{array}{rcl} \mathrm{P}({W}_{1}{W}_{2} > 0)& =& 2\mathrm{P}({W}_{1} > 0,{W}_{2} > 0) \\ & =& 2\mathrm{P}({\sigma }_{1}{(1 - {\rho }^{2})}^{1/2}\cos U + {\sigma }_{ 1}\rho \sin U > 0,{\sigma }_{2}\sin U > 0) \\ & =& 2\mathrm{P}({(1 - {\rho }^{2})}^{1/2}\cos U + \rho \sin U > 0,\sin U > 0) \\ & =& 2\mathrm{P}(\cos y\cos U +\sin y\sin U > 0,\sin U > 0), \\ \end{array}$$

where \(y =\arcsin \rho \in [-\pi /2,\pi /2]\). Clearly, sin U > 0 is here equivalent to U ∈ (0, π). Since \(\cos y\cos U +\sin y\sin U =\cos (U - y)\) and cos(U − y) > 0 is here equivalent to \(U \in (y - \pi /2,y + \pi /2)\), we find that

$$\begin{array}{rcl} \mathrm{P}(\cos y\cos U+\sin y\sin U>0,\sin U>0)& =& \mathrm{P}(U\in (y-\pi /2,y+\pi /2) \cap (0,\pi )) \\ & =& \mathrm{P}(U\in (0,y+\pi /2)).\end{array}$$

Putting the pieces together gives

$$\tau ({X}_{1},{X}_{2}) = 4\frac{\arcsin \rho + \pi /2} {2\pi } - 1 = \frac{2} {\pi }\arcsin \rho.$$

□

Consider the function sign(x) with value 0 for x = 0 and value x/|x| otherwise. Kendall’s tau in (9.6) can be written as

$$\tau ({X}_{1},{X}_{2}) = \mathrm{E}\left [\mathrm{sign}\left (({X}_{1} - {X^{\prime}}_{1})({X}_{2} - {X^{\prime}}_{2})\right )\right ].$$
(9.8)

Given a sample \(\{{\mathbf{X}}_{1},\ldots, {\mathbf{X}}_{n}\}\) of identically distributed vectors X k = (X k, 1, X k, 2)T, we estimate (9.8) by the number of index pairs (j, k) with j < k such that \(({X}_{j,1} - {X}_{k,1})({X}_{j,2} - {X}_{k,2}) > 0\), minus the number of index pairs such that \(({X}_{j,1} - {X}_{k,1})({X}_{j,2} - {X}_{k,2}) < 0\), divided by the total number of index pairs:

$$\widehat{\tau } ={ \binom{n}{2}}^{-1}\sum\limits_{j<k}\mathrm{sign}\left (({X}_{j,1} - {X}_{k,1})({X}_{j,2} - {X}_{k,2})\right ).$$

Finally, if the X k are elliptically distributed such that the condition in Proposition 9.7 holds, then the estimator of the linear correlation parameter ρ is chosen as

$$\widehat{\rho } =\sin \left (\frac{\pi } {2} \widehat{\tau }\right ).$$
(9.9)

To assess the accuracy of the estimator in (9.9) and compare it to the sample correlation coefficient, we consider a simulation study that is summarized in Fig. 9.3. For samples from a bivariate normal distribution the two estimators perform similarly. For samples from a bivariate Student’s t distribution with three degrees of freedom we find that the estimator in (9.9), a nonlinear transformation of Kendall’s tau estimator, performs much better than the sample correlation coefficient and similarly to its performance on data from a bivariate normal distribution.
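The following sketch (our own illustration; it uses SciPy's kendalltau and a simulated bivariate t 3 sample as stand-in data) computes the estimator (9.9) alongside the ordinary sample correlation:

```python
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(4)

# Stand-in data: bivariate t_3 sample with linear correlation parameter 0.5
rho, nu, n = 0.5, 3, 100
A = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]))
Z = rng.standard_normal((n, 2))
W = np.sqrt(nu / rng.chisquare(nu, n))
X = W[:, None] * (Z @ A.T)

tau_hat, _ = kendalltau(X[:, 0], X[:, 1])
rho_hat = np.sin(np.pi * tau_hat / 2)             # estimator (9.9)
rho_sample = np.corrcoef(X[:, 0], X[:, 1])[0, 1]  # ordinary sample correlation
print(rho_hat, rho_sample)
```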

Fig. 9.3

Histograms based on 10,000 estimates of the linear correlation parameter, where each estimate is based on a sample of size 100 from a bivariate elliptical distribution with linear correlation parameter 0.5. Plots to the left show estimates based on samples from a bivariate normal distribution. Plots to the right show estimates based on samples from a bivariate Student’s t distribution with three degrees of freedom. The estimates in the upper plots are ordinary sample correlations. The estimates in the lower plots are transformations of Kendall’s tau estimates as in (9.9)

2.3 Linearization and Elliptical Distributions

Suppose that the future value of a financial portfolio can be expressed as g(X), where g is a known function and X is a random vector whose components represent, e.g., log returns for a given set of assets over a given future time period. If the time period is rather short and if X is likely to take a value not too far from its expected value μ = E[X], then the first-order approximation

$$g(\mathbf{X}) \approx g(\mathbf{\mu }) + \nabla {g}^{\mathrm{T}}(\mathbf{\mu })(\mathbf{X} -\mathbf{\mu }) = g(\mathbf{\mu }) +\sum\limits_{k=1}^{d} \frac{\partial g} {\partial {x}_{k}}(\mathbf{\mu })({X}_{k} - {\mu }_{k})$$

can be assumed to be accurate. The approximation replaces the nonlinear expression in the components of X by a weighted sum of the components translated by a constant. However, it is typically hard to determine the probability distribution of a sum of dependent random variables. An important exception is when X is elliptically distributed. In this case, X has the stochastic representation \(\mathbf{X}\stackrel{\mathrm{d}}{ =}\mathbf{\mu } + \mathbf{A}\mathbf{Y}\), where Y has a spherical distribution, so Proposition 9.3 gives

$$g(\mathbf{X}) \approx g(\mathbf{\mu }) + \nabla {g}^{\mathrm{T}}(\mathbf{\mu })(\mathbf{X} -\mathbf{\mu })\stackrel{\mathrm{ d}}{ =}g(\mathbf{\mu }) +{ \left (\nabla {g}^{\mathrm{T}}(\mathbf{\mu })\mathbf{\Sigma }\nabla g(\mathbf{\mu })\right )}^{1/2}{Y }_{ 1},$$
(9.10)

where Σ = AA T or, more explicitly,

$$g(\mathbf{X})\stackrel{\mathrm{d}}{ \approx }g(\mathbf{\mu }) +{ \left (\sum\limits_{j,k=1}^{d} \frac{\partial g} {\partial {x}_{j}}(\mathbf{\mu }) \frac{\partial g} {\partial {x}_{k}}(\mathbf{\mu }){\Sigma }_{j,k}\right )}^{1/2}{Y }_{ 1}.$$

The accuracy of this approximation clearly depends strongly on how concentrated the probability mass of X is around its expected value μ. We illustrate the accuracy of the linearization with an example for one-dimensional elliptical distributions and a specific function g.

Example 9.7 (Linearization). 

Let g(x) = e x and consider a random variable X with a spherical distribution with distribution function F. The quantile function of g(X) is g(F − 1(p)), whereas that of \(g(0) + g^{\prime}(0)X = 1 + X\) is \(1 + {F}^{-1}(p)\). Figure 9.4 plots the quantiles of e X (y-axis) against the quantiles of 1 + X (x-axis) together with the dashed straight line corresponding to a perfect fit. The upper plots correspond to X being normally distributed with standard deviation 0.02 (left) and 0.3 (right). The lower plots correspond to X having a Student’s t distribution with three degrees of freedom and standard deviation 0.02 (left) and 0.3 (right). We see that the smaller the standard deviation and the lighter the tails, the more accurate the linear approximation.

Fig. 9.4

These four q–q plots illustrate the approximation error from linearization. The plots show the quantiles of e X (y-axis) against the quantiles of 1 + X (x-axis). The upper plots correspond to X being N(0, 0.02 2)-distributed (left) and N(0, 0.3 2)-distributed (right). The lower plots correspond to X having a Student’s t distribution with three degrees of freedom and standard deviation 0.02 (left) and 0.3 (right)
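The q–q comparison of Example 9.7 is easy to reproduce. A minimal sketch for the normal case (our own illustration, assuming SciPy and matplotlib); since e x is increasing, the p-quantile of e X is simply exp(F − 1(p)):

```python
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

p = np.linspace(0.001, 0.999, 999)
for sigma in (0.02, 0.3):
    q = norm.ppf(p, scale=sigma)   # quantiles of X ~ N(0, sigma^2)
    plt.plot(1 + q, np.exp(q), label=f"sigma = {sigma}")
lims = [0.5, 1.5]
plt.plot(lims, lims, "k--")        # perfect-fit reference line
plt.xlabel("quantiles of 1 + X")
plt.ylabel("quantiles of e^X")
plt.legend()
plt.show()
```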

Example 9.8 (Linearization and risk measures). 

Suppose that g(X) represents the value at time T of a portfolio of financial assets, where X has an elliptical distribution with stochastic representation \(\mathbf{X}\stackrel{\mathrm{d}}{ =}\mathbf{\mu } + \mathbf{A}\mathbf{Y}\). Consider a risk measure ρ and the approximation of g(X) in (9.10). If B 0 is the discount factor giving the current value of money at time T, and if ρ is translation invariant and positively homogeneous, then

$$\begin{array}{rcl} \rho (g(\mathbf{X}))& \approx & \rho \left (g(\mathbf{\mu }) +{ \left (\nabla {g}^{\mathrm{T}}(\mathbf{\mu })\mathbf{\Sigma }\nabla g(\mathbf{\mu })\right )}^{1/2}{Y }_{ 1}\right ) \\ & =& -{B}_{0}g(\mathbf{\mu }) +{ \left (\nabla {g}^{\mathrm{T}}(\mathbf{\mu })\mathbf{\Sigma }\nabla g(\mathbf{\mu })\right )}^{1/2}\rho ({Y }_{ 1}).\end{array}$$

For ρ chosen as value-at-risk (VaR) or expected shortfall (ES) and for Y 1 normally distributed or Student’s t-distributed, the quantity ρ(Y 1) can be computed as in Example 6.13. If Y 1 is standard normally distributed, then

$${ \mathrm{VaR}}_{p}({Y }_{1}) = {B}_{0}{\Phi }^{-1}(1 - p)\quad \text{and}\quad {\mathrm{ES}}_{ p}({Y }_{1}) = {B}_{0}\frac{\phi ({\Phi }^{-1}(1 - p))} {p}.$$

If Y 1 has a standard Student’s t distribution with ν degrees of freedom, then

$${ \mathrm{VaR}}_{p}({Y }_{1})={B}_{0}{t}_{\nu }^{-1}(1-p)\quad \text{and}\quad {\mathrm{ES}}_{ p}({Y }_{1})={B}_{0}\frac{{g}_{\nu }({t}_{\nu }^{-1}(1-p))} {p} \left (\frac{\nu +{({t}_{\nu }^{-1}(p))}^{2}} {\nu -1} \right ),$$

where g ν and t ν are the density and distribution functions, respectively, of Y 1.
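These expressions translate directly into code; a sketch (our own illustration, assuming SciPy, with B 0 = 1 by default):

```python
from scipy.stats import norm, t

def var_es_normal(p, B0=1.0):
    # VaR_p and ES_p for a standard normal Y_1
    q = norm.ppf(1 - p)
    return B0 * q, B0 * norm.pdf(q) / p

def var_es_student_t(p, nu, B0=1.0):
    # VaR_p and ES_p for a standard Student's t Y_1 with nu degrees of freedom
    q = t.ppf(1 - p, nu)
    return B0 * q, B0 * (t.pdf(q, nu) / p) * (nu + q**2) / (nu - 1)

print(var_es_normal(0.01))        # roughly (2.326, 2.665)
print(var_es_student_t(0.01, 4))  # heavier tail: larger VaR and ES
```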

If ρ is a monotone risk measure and g is a convex function, then it follows from Proposition 2.2 that

$$\rho (g(\mathbf{X})) \leq \rho \left (g(\mathbf{\mu }) + \nabla {g}^{\mathrm{T}}(\mathbf{\mu })(\mathbf{X} -\mathbf{\mu })\right ),$$

i.e., linearization overestimates the risk. If ρ is also translation invariant and positively homogeneous, then

$$\begin{array}{rcl} \rho (g(\mathbf{X}))& \leq & \rho \left (g(\mathbf{\mu }) + \nabla {g}^{\mathrm{T}}(\mathbf{\mu })(\mathbf{X} -\mathbf{\mu })\right ) \\ & =& -{B}_{0}g(\mathbf{\mu }) +{ \left (\nabla {g}^{\mathrm{T}}(\mathbf{\mu })\mathbf{\Sigma }\nabla g(\mathbf{\mu })\right )}^{1/2}\rho ({Y }_{ 1}).\end{array}$$

As an illustration, let X be a vector of log returns of d assets and consider a linear portfolio consisting of a long position of current value w k ≥ 0 in the kth asset, for every k. Then the future portfolio value is \(g(\mathbf{X}) = {w}_{1}{e}^{{X}_{1}} + \cdots + {w}_{d}{e}^{{X}_{d}}\) and g is convex.

Example 9.8 illustrates how linearization and an elliptical approximation can be used to construct explicit approximation formulas for risk measures. This approach must be used with caution. The accuracy of the first-order approximation of g around μ evaluated at X is best around μ. However, risk measures of g(X), such as VaR and ES, typically depend on the behavior of X far from μ.

Example 9.9 (Linearization over a short time horizon). 

Consider a portfolio of shares of two stocks. The portfolio contains h 1 and h 2 shares of the two stocks. The spot prices at time t are given by S t 1 and S t 2, respectively. Suppose that we want to compute \({\mathrm{VaR}}_{p}({V }_{T} - {V }_{0}/{B}_{0})\), where \({V }_{T} - {V }_{0}/{B}_{0}\) is the change in portfolio value from now until time T, measured in money at time T. We have

$$\begin{array}{rcl}{ V }_{T} - {V }_{0}/{B}_{0}& =& {h}_{1}({S}_{T}^{1} - {S}_{ 0}^{1}/{B}_{ 0}) + {h}_{2}({S}_{T}^{2} - {S}_{ 0}^{2}/{B}_{ 0}) \\ & =& {h}_{1}{S}_{0}^{1}({e}^{{X}_{1} } - 1/{B}_{0}) + {h}_{2}{S}_{0}^{2}({e}^{{X}_{2} } - 1/{B}_{0}) \\ & =& g({X}_{1},{X}_{2}), \\ \end{array}$$

where \(({X}_{1},{X}_{2}) = (\log ({S}_{T}^{1}/{S}_{0}^{1}),\log ({S}_{T}^{2}/{S}_{0}^{2}))\) is the log-return pair from now until time T. If T is small (a couple of days, say), then it may be reasonable to set \({\mu }_{1} = {\mu }_{2} = 0\) and B 0 = 1, which yields

$$\begin{array}{rcl} g({X}_{1},{X}_{2})& \approx & g({\mu }_{1},{\mu }_{2}) +\sum\limits_{k=1}^{2} \frac{\partial g} {\partial {x}_{k}}({\mu }_{1},{\mu }_{2})({X}_{k} - {\mu }_{k}) \\ & =& \sum\limits_{k=1}^{2}{h}_{ k}{S}_{0}^{k}({e}^{{\mu }_{k} } - 1/{B}_{0}) +\sum\limits_{k=1}^{2}{h}_{ k}{S}_{0}^{k}{e}^{{\mu }_{k} }({X}_{k} - {\mu }_{k}) \\ & =& {h}_{1}{S}_{0}^{1}{X}_{ 1} + {h}_{2}{S}_{0}^{2}{X}_{ 2}.\end{array}$$

If X = (X 1, X 2)T has an elliptical distribution with representation X = AY, where AA T = Σ, then

$$\begin{array}{rcl}{ \mathrm{VaR}}_{p}({V }_{T} - {V }_{0}/{B}_{0})& \approx & {\mathrm{VaR}}_{p}({({\mathbf{w}}^{\mathrm{T}}\mathbf{\Sigma }\mathbf{w})}^{1/2}{Y }_{ 1}) \\ & =& {({\mathbf{w}}^{\mathrm{T}}\mathbf{\Sigma }\mathbf{w})}^{1/2}{F}_{{ Y }_{1}}^{-1}(1 - p), \\ \end{array}$$

where w T = (h 1 S 0 1, h 2 S 0 2).

Example 9.10 (Linearization over a long time horizon). 

Suppose that we want to compute \({\mathrm{VaR}}_{p}({V }_{T} - {V }_{0}/{B}_{0})\) for a portfolio over a T-day period. Suppose further that V T can be expressed as a function g of the T-day log returns and that the vectors \({\mathbf{X}}_{1},\ldots, {\mathbf{X}}_{T}\) of 1-day log returns are independent and identically elliptically distributed with mean μ = E[X 1] and covariance matrix Σ = Cov(X 1). Set \(\mathbf{W} ={ \mathbf{X}}_{1} + \cdots +{ \mathbf{X}}_{T}\) and note that W, with E[W] = T μ and Cov(W) = T Σ, is the vector of log returns for the entire T-day period. From Proposition 9.4 we know that W is elliptically distributed, although in general (unless W is normally distributed) of a different type than X 1. The elliptical distribution of W is not easily inferred from the distribution of X 1. However, if T is sufficiently large, then it may be reasonable, based on the central limit theorem, to assume that W is approximately normally distributed. One should be aware, though, that the convergence in distribution to the normal distribution in the central limit theorem is slow in the tail regions.

Linearization, together with the normal approximation, gives

$$\begin{array}{rcl} g(\mathbf{W})& = & \sum\limits_{k=1}^{d}{h}_{ k}{S}_{0}^{k}({e}^{{W}_{k} } - 1/{B}_{0}) \\ & \approx &\sum\limits_{k=1}^{d}{h}_{ k}{S}_{0}^{k}({e}^{T{\mu }_{k} } - 1/{B}_{0}) +\sum\limits_{k=1}^{d}{h}_{ k}{S}_{0}^{k}{e}^{T{\mu }_{k} }({W}_{k} - T{\mu }_{k}) \\ & \stackrel{\mathrm{d}}{ \approx }& \sum\limits_{k=1}^{d}{h}_{ k}{S}_{0}^{k}({e}^{T{\mu }_{k} } - 1/{B}_{0}) + {T}^{1/2}{\left (\sum\limits_{j,k=1}^{d}{h}_{ j}{h}_{k}{S}_{0}^{j}{S}_{ 0}^{k}{e}^{T({\mu }_{j}+{\mu }_{k})}{\Sigma }_{ j,k}\right )}^{1/2}Z,\\ \end{array}$$

where Z is standard normally distributed. In particular,

$$\begin{array}{rcl}{ \mathrm{VaR}}_{p}({V }_{T}-{V }_{0}/{B}_{0})& \approx & \sum\limits_{k=1}^{d}{h}_{ k}{S}_{0}^{k}\left (1-{B}_{ 0}{e}^{T{\mu }_{k} }\right ) \\ & & \quad +{T}^{1/2}{B}_{ 0}{\left (\sum\limits_{j,k=1}^{d}{h}_{ j}{h}_{k}{S}_{0}^{j}{S}_{ 0}^{k}{e}^{T({\mu }_{j}+{\mu }_{k})}{\Sigma }_{ j,k}\right )\!}^{1/2}{\Phi }^{-1}(1-p).\end{array}$$

If \({B}_{0}{e}^{T{\mu }_{k}} \approx 1\) for all k, then the estimate of \({\mathrm{VaR}}_{p}({V }_{T} - {V }_{0}/{B}_{0})\) is approximately proportional to the square root of the length T of the time period.

As an illustration, we consider the situation where X 1 has a ten-dimensional Student’s t distribution with three degrees of freedom, with zero mean, standard deviations 0.01, and pairwise linear correlation coefficients of 0.4. Moreover, we assume that we hold one share of each stock (h k = 1), that the current share price is 10 for each stock (S 0 k = 10), and that interest rates can be ignored (B 0 = 1). This gives

$$\begin{array}{rcl}{ \mathrm{VaR}}_{p}({V }_{T} - {V }_{0}/{B}_{0})& \approx & {T}^{1/2}{(d(1 + 0.4(d - 1)))}^{1/2}{\Phi }^{-1}(1 - p) \\ & =& {(27T)}^{1/2}{\Phi }^{-1}(1 - p).\end{array}$$

We now compare this estimate to the empirical estimate based on a large simulated sample of independent copies of \({V }_{T} - {V }_{0}/{B}_{0}\). The results are shown in Fig. 9.5. It is interesting to note that for T small, the underestimation of \({\mathrm{VaR}}_{p}({V }_{T} - {V }_{0}/{B}_{0})\) for p small due to the lighter tails of the normal distribution is offset by the overestimation of \({\mathrm{VaR}}_{p}({V }_{T} - {V }_{0}/{B}_{0})\) due to linearization.
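A sketch of this simulation experiment (our own illustration, assuming NumPy and SciPy; it compares the empirical VaR with the general linearization formula above):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)

d, nu, sigma, rho, T, p = 10, 3, 0.01, 0.4, 10, 0.01
n_sim = 100_000
C = rho * np.ones((d, d)) + (1 - rho) * np.eye(d)  # correlation matrix
Sigma = sigma**2 * C                               # Cov(X_1)
A = np.linalg.cholesky((nu - 2) / nu * Sigma)      # dispersion with Cov = Sigma

# W = X_1 + ... + X_T: sum of T independent t_3 log-return vectors
W = np.zeros((n_sim, d))
for _ in range(T):
    Z = rng.standard_normal((n_sim, d))
    mix = np.sqrt(nu / rng.chisquare(nu, n_sim))
    W += mix[:, None] * (Z @ A.T)

w = np.full(d, 10.0)                # h_k S_0^k = 10
loss = -(np.exp(W) - 1.0) @ w       # -(V_T - V_0), with B_0 = 1
var_empirical = np.quantile(loss, 1 - p)
var_formula = np.sqrt(T * (w @ Sigma @ w)) * norm.ppf(1 - p)
print(var_empirical, var_formula)
```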

Fig. 9.5

Illustration of the accuracy of estimates of VaR0.05(V T − V 0) and VaR0.01(V T − V 0) based on linearization and a normal approximation, as functions of \(T \in \{ 1,\ldots, 100\}\) (dashed curves). The solid curves show the empirical VaR estimates based on simulated samples of size 10 5

3 Applications of Elliptical Distributions in Risk Management

In this section, we consider five applications of elliptical distributions in risk management. In the first application, we derive a risk-aggregation formula that relates the risk, in terms of a translation-invariant and positively homogeneous risk measure, of a sum of jointly elliptically distributed random values to the risks of the terms in the sum. The second application shows how linearization and a normal approximation can be used to approximate the risk measure VaR0.005(A − L) used to determine the solvency of an insurance company. This application presents the idea behind the so-called standard formula that is used in the measurement of risk in the insurance industry. The third application suggests a hedging approach for European call options that is more appropriate than delta hedging if the joint distribution of the log return of the underlying asset value and the change in the implied volatility can be assumed to be elliptical. The fourth application shows how a trader might design a bet on changes in implied volatility for two maturity times and considers ways to investigate the risk of such a bet. The fifth application illustrates that if the vector of returns on a set of risky assets can be assumed to be elliptically distributed, then portfolio investment problems can often be reduced to the trade-off investment problem (4.7).

3.1 Risk Aggregation with Elliptical Distributions

Consider a company divided into n business units with future net values of assets and liabilities given by \({X}_{1},\ldots, {X}_{n}\). Suppose that each business unit is able to accurately estimate E[X k ] and ρ(X k ), where ρ is some translation-invariant and positively homogeneous risk measure. The company wants to compute \(\rho ({X}_{1} + \cdots + {X}_{n})\) to obtain a measurement of the aggregate risk for the whole company. There is no straightforward way to combine the individual risk estimates ρ(X k ) and the expected values E[X k ] into an aggregate risk estimate. However, there is a convenient risk-aggregation formula that is valid under the assumption that \({({X}_{1},\ldots, {X}_{n})}^{\mathrm{T}}\) has an elliptical distribution.

Suppose that \(\mathbf{X} = {({X}_{1},\ldots, {X}_{n})}^{\mathrm{T}}\) has an elliptical distribution so that \(\mathbf{X}\stackrel{\mathrm{d}}{ =}\mathbf{\mu } + \mathbf{A}\mathbf{Y}\), with AA T = Σ, where Y has a spherical distribution. The matrix Σ can always be expressed as the product DCD, where D is a diagonal matrix with diagonal entries \({D}_{k,k} = {\Sigma }_{k,k}^{1/2}\) and C is a correlation matrix (the linear correlation matrix of X if it exists). Note that

$$\rho ({X}_{1} + \cdots + {X}_{n}) = -{B}_{0}\sum\limits_{k=1}^{n}{\mu }_{ k} + \rho \left (\sum\limits_{k=1}^{n}({X}_{ k} - {\mu }_{k})\right ),$$

where B 0 is the discount factor between now and the considered future time, and

$$\sum\limits_{k=1}^{n}({X}_{ k} - {\mu }_{k}) ={ \mathbf{1}}^{\mathrm{T}}\mathbf{A}\mathbf{Y}\stackrel{\mathrm{ d}}{ =}{({\mathbf{1}}^{\mathrm{T}}\mathbf{\Sigma }\mathbf{1})}^{1/2}{Y }_{ 1}.$$

Since \({\mathbf{1}}^{\mathrm{T}}\mathbf{\Sigma }\mathbf{1} =\sum _{j,k}{\Sigma }_{j,k}\) and Σ j, k = D j, j C j, k D k, k , it holds that

$$\begin{array}{rcl} \rho \left (\sum\limits_{k=1}^{n}({X}_{ k} - {\mu }_{k})\right )& =& \rho \left ({\left (\sum\limits_{j,k}{\Sigma }_{j,k}\right )}^{1/2}{Y }_{ 1}\right ) \\ & =&{ \left (\sum\limits_{j,k}{C}_{j,k}{D}_{j,j}{D}_{k,k}\right )}^{1/2}\rho ({Y }_{ 1}) \\ & =&{ \left (\sum\limits_{j,k}{C}_{j,k}{D}_{j,j}{D}_{k,k}\rho {({Y }_{1})}^{2}\right )}^{1/2} \\ & =&{ \left (\sum\limits_{j,k}{C}_{j,k}\rho ({D}_{j,j}{Y }_{j})\rho ({D}_{k,k}{Y }_{k})\right )}^{1/2} \\ & =&{ \left (\sum\limits_{j,k}{C}_{j,k}\rho ({X}_{j} - {\mu }_{j})\rho ({X}_{k} - {\mu }_{k})\right )}^{1/2}.\end{array}$$

We have found that if \(\mathbf{X} = {({X}_{1},\ldots, {X}_{n})}^{\mathrm{T}}\) has an elliptical distribution and if ρ is a translation-invariant and positively homogeneous risk measure, then

$$\rho ({X}_{1} + \cdots + {X}_{n}) ={ \left (\sum\limits_{j,k}{C}_{j,k}\{{B}_{0}{\mu }_{j} + \rho ({X}_{j})\}\{{B}_{0}{\mu }_{k} + \rho ({X}_{k})\}\right )}^{1/2} -{B}_{ 0}\sum\limits_{k}{\mu }_{k}.$$

The only additional input needed, besides the individual risk estimates ρ(X k ) and the means μ k , is the set of linear correlation coefficients C j, k .
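The aggregation formula is a one-liner in practice. A minimal sketch (our own illustration, assuming NumPy; rho_k, mu_k, and C hold the stand-alone risks, means, and correlation matrix):

```python
import numpy as np

def aggregate_risk(rho_k, mu_k, C, B0=1.0):
    # rho(X_k - mu_k) = B0 mu_k + rho(X_k) by translation invariance
    s = B0 * np.asarray(mu_k) + np.asarray(rho_k)
    return np.sqrt(s @ C @ s) - B0 * np.sum(mu_k)

C = np.array([[1.0, 0.3],
              [0.3, 1.0]])
print(aggregate_risk(rho_k=[5.0, 8.0], mu_k=[1.0, 2.0], C=C))
```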

3.2 Solvency of an Insurance Company

In this section, we present another example of linearization and normal approximation in the context of the solvency of an insurance company.

Consider an insurance company with assets and liabilities. Let A and L denote the time 1 (1 year from now) values of the assets and liabilities, respectively. We consider the insurance company to be solvent if

$${ \mathrm{VaR}}_{0.005}(A - L) \leq 0.$$

If r 1 is the current risk-free, 1-year zero rate, then we may write

$${ \mathrm{VaR}}_{0.005}(A - L) = {F}_{{e}^{-{r}_{1}}(L-A)}^{-1}(0.995).$$

We consider a stylized model for the assets and liabilities and assume that the liabilities correspond to the stochastic cash flow \(({C}_{1},\ldots, {C}_{n})\), where C k is the amount the insurer has to pay at the end of year k due to the occurrence of claims before the end of year 1. Each written contract offers protection for the insured over a 1-year period. Operating expenses for the insurer could be included in the C k or dealt with in other ways. The expectation E[C k ] is the expected claim amount to be paid at time k, and \({e}^{-{r}_{k}k}\mathrm{E}[{C}_{k}]\) is the present value of this amount. The expected claim amount E[C k ] could be determined by some stochastic claim-reserving method, such as the chain ladder method presented in Sect. 7.6.1. The best estimate, at time 0, of the present value of the liabilities is

$${L}_{0} =\sum\limits_{k=1}^{n}\mathrm{E}[{C}_{ k}]{e}^{-{r}_{k}k}.$$

At time 1 we observe C 1 and receive new information about the future payments C k . If I 1 denotes the information available at time 1, then E[C k I 1] is the updated prediction of the payment due at time k. The time 1 value of the liabilities is therefore given by

$$L =\sum\limits_{k=1}^{n}\mathrm{E}[{C}_{ k}\mid {\mathbf{I}}_{1}]{e}^{-({r}_{k-1}+\Delta {r}_{k-1})(k-1)},$$

where Δ r is the vector of zero rate changes from time 0 to 1. Suppose for simplicity that the assets of the insurer consist of a bond portfolio designed to match future claim payments and K units of cash on a bank account. The time 0 value A 0 of the assets and the time 1 value A of the assets are given by

$$\begin{array}{rcl}{ A}_{0}& =& \sum\limits_{k=1}^{n}\mathrm{E}[{C}_{ k}]{e}^{-{r}_{k}k} + K, \\ A& =& \sum\limits_{k=1}^{n}\mathrm{E}[{C}_{ k}]{e}^{-({r}_{k-1}+\Delta {r}_{k-1})(k-1)} + K{e}^{{r}_{1} }.\end{array}$$

The time 0 value of the bond portfolio precisely matches the time 0 value of the liability, \({A}_{0} - K = {L}_{0}\). Moreover,

$$\begin{array}{rcl}{ e}^{-{r}_{1} }(L - A)& =& {e}^{-{r}_{1} }\sum\limits_{k=1}^{n}(\mathrm{E}[{C}_{ k}\mid {\mathbf{I}}_{1}] -\mathrm{E}[{C}_{k}]){e}^{-({r}_{k-1}+\Delta {r}_{k-1})(k-1)} - K \\ & =& \sum\limits_{k=1}^{n}{e}^{-{r}_{k}k}\mathrm{E}[{C}_{ k}]{Y }_{k}{e}^{{X}_{k} } - K \\ & =& g({X}_{1},\ldots, {X}_{n},{Y }_{1},\ldots, {Y }_{n}), \\ \end{array}$$

where \({X}_{k} = -{r}_{1} - ({r}_{k-1} + \Delta {r}_{k-1})(k - 1) + {r}_{k}k\), \({Y }_{k} = (\mathrm{E}[{C}_{k}\mid {\mathbf{I}}_{1}] -\mathrm{E}[{C}_{k}])/\mathrm{E}[{C}_{k}]\) for \(k = 1,\ldots, n\), and

$$g(\mathbf{x},\mathbf{y}) =\sum\limits_{k=1}^{n}{e}^{-{r}_{k}k}\mathrm{E}[{C}_{ k}]{y}_{k}{e}^{{x}_{k} } - K.$$

The quantity \({Y }_{1} = ({C}_{1} -\mathrm{E}[{C}_{1}])/\mathrm{E}[{C}_{1}]\) measures the relative deviation of the actual amount paid at the end of the year from the current prediction. For k ≥ 2, \({Y }_{k} = (\mathrm{E}[{C}_{k}\mid {\mathbf{I}}_{1}] -\mathrm{E}[{C}_{k}])/\mathrm{E}[{C}_{k}]\) measures the relative deviation of the end-of-year updated prediction of the claim payments due at time k, for claims incurred before the end of the year, from the current prediction.

Since g is a nonlinear function of the risk factors \(({X}_{1},\ldots, {X}_{n},{Y }_{1},\ldots, {Y }_{n})\), the computation of VaR is simplified substantially by linearization. Let μ k = E[X k ], and note that E[Y k ] = 0. Therefore, it makes sense to consider the first-order approximation of g around \(({\mu }_{1},\ldots, {\mu }_{n},0,\ldots, 0)\), which gives

$$\begin{array}{rcl} g({X}_{1},\ldots, {X}_{n},{Y }_{1},\ldots, {Y }_{n})& \approx & g({\mu }_{1},\ldots, {\mu }_{n},0,\ldots, 0) +\sum\limits_{k=1}^{n}{e}^{-{r}_{k}k}\mathrm{E}[{C}_{ k}]{Y }_{k}{e}^{{\mu }_{k} } \\ & =& -K +\sum\limits_{k=1}^{n}{e}^{-{r}_{k}k}\mathrm{E}[{C}_{ k}]{Y }_{k}{e}^{{\mu }_{k} } \\ & =& -K +{ \mathbf{w}}^{\mathrm{T}}\mathbf{Y}, \\ \end{array}$$

where \({w}_{k} = {e}^{-{r}_{k}k}\mathrm{E}[{C}_{ k}]{e}^{{\mu }_{k}}\). Because of the linearization, the effect of the X k vanishes. The contributions to the risk coming from changes in the zero rates are second-order effects and do not show up in the linearized version of g. Although ignoring second-order effects is convenient for explicit computations, it leads to a crude approximation.

If Y is N(0, Σ)-distributed, then we find that

$${ \mathrm{VaR}}_{0.005}(A - L) \approx {F}_{-K+{\mathbf{w}}^{\mathrm{T}}\mathbf{Y}}^{-1}(0.995) = -K + {({\mathbf{w}}^{\mathrm{T}}\mathbf{\Sigma }\mathbf{w})}^{1/2}{\Phi }^{-1}(0.995).$$

Taking this approximation as an equality we find that the solvency condition VaR 0.005 (A − L) ≤ 0 is equivalent to

$$K \geq {({\mathbf{w}}^{\mathrm{T}}\mathbf{\Sigma }\mathbf{w})}^{1/2}{\Phi }^{-1}(0.995).$$

The outlined procedure is the basic idea behind the standard formula in the Solvency II framework for the computation of sufficient buffer capital for an insurance company. Of course, in practice, many more risk factors need to be included, and the insurer’s asset portfolio is more complex. Nevertheless, the linearization approach and the normal approximation are at the heart of the standard formula. To compensate for the inaccuracies of linearization and the normal approximation, the covariance matrix Σ is not estimated from data but given exogenously by the regulators.
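To make the procedure concrete, here is a minimal sketch of the capital computation under the linearization and normal approximation above; all inputs (expected claims, zero rates, and the covariance matrix of the Y k ) are made up for illustration:

```python
import numpy as np
from scipy.stats import norm

# A minimal sketch (assumed inputs, not the book's data): required capital
# K = sqrt(w' Sigma w) * Phi^{-1}(0.995), with w_k = exp(-r_k k) E[C_k] e^{mu_k}.

EC = np.array([100.0, 80.0, 60.0])      # E[C_k] for k = 1, 2, 3, assumed
r = np.array([0.02, 0.025, 0.03])       # zero rates r_k, assumed
mu = np.zeros(3)                        # mu_k = E[X_k], here taken as 0
k = np.arange(1, 4)
w = np.exp(-r * k) * EC * np.exp(mu)    # linearization weights w_k

Sigma = 0.05**2 * np.array([[1.0, 0.5, 0.3],
                            [0.5, 1.0, 0.5],
                            [0.3, 0.5, 1.0]])   # covariance of the Y_k

K = np.sqrt(w @ Sigma @ w) * norm.ppf(0.995)
print(f"required buffer capital K = {K:.2f}")
```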

3.3 Hedging of a Call Option When the Volatility Is Stochastic

Suppose that now, at time 0, we have issued a European call option with strike price K on the value S T of a stock market index at time T. Suppose also that we want to hedge against changes in the option price from now until time t < T by taking a position in the underlying index and depositing cash so as to minimize

$$\mathrm{E}[{({h}_{0} + {h}_{1}{S}_{t} - {C}_{t})}^{2}],$$

where C t is the call option price at time t. If t is small, then the delta-hedging approach in Sect. 3.5 gives an approximate solution to the quadratic hedging problem. Suppose that the option price is expressed in terms of the Black–Scholes formula (1.7) as a function \({C}_{t} = C({S}_{t},{\sigma }_{t},{r}_{t},t,T - t)\), where the arguments correspond to the value of the underlying index at time t, the option’s implied volatility at time t, the interest rate prevailing between time t and the maturity time T of the option, and the remaining time to maturity. The delta-hedging approach relies on the first-order approximation

$${C}_{t} \approx {C}_{0} + \frac{\partial {C}_{0}} {\partial {S}_{0}} ({S}_{t} - {S}_{0}),$$

which gives the delta-hedge position (h 0 δ, h 1 δ) ≈ (h 0, h 1), where

$${h}_{1}^{\delta } = \frac{\partial {C}_{0}} {\partial {S}_{0}} \quad \text{and}\quad {h}_{0}^{\delta } = {C}_{ 0} -\frac{\partial {C}_{0}} {\partial {S}_{0}} {S}_{0}.$$

The Black–Scholes formula reads

$$\begin{array}{rcl}{ C}_{t}& =& {S}_{t}\Phi ({d}_{1}) - K{e}^{-{r}_{t}(T-t)}\Phi ({d}_{ 2}), \\ {d}_{1}& =& \frac{\log ({S}_{t}/K) + ({r}_{t} + {\sigma }_{t}^{2}/2)(T - t)} {{\sigma }_{t}\sqrt{T - t}} \quad \text{and}\quad {d}_{2} = {d}_{1} - {\sigma }_{t}\sqrt{T - t} \\ \end{array}$$

and gives

$$\frac{\partial {C}_{0}} {\partial {S}_{0}} = \Phi ({d}_{1}),\quad {d}_{1} = \frac{\log ({S}_{0}/K) + ({r}_{0} + {\sigma }_{0}^{2}/2)T} {{\sigma }_{0}\sqrt{T}}.$$

The hedging error at time t is

$${h}_{0}^{\delta } + {h}_{ 1}^{\delta }{S}_{ t} - {C}_{t} = {C}_{0} - {C}_{t} + \Phi ({d}_{1})({S}_{t} - {S}_{0}).$$

The change in the interest rate from r 0 to r t typically does not contribute much to the hedging error, and therefore we may approximate r t ≈ r 0. We may thus view the hedging error as a function of the changes in the index value and in the implied volatility or, equivalently, as a function g(z) evaluated at Z = (Z 1, Z 2), where \({Z}_{1} =\log ({S}_{t}/{S}_{0})\) and \({Z}_{2} = {\sigma }_{t} - {\sigma }_{0}\). A model for (Z 1, Z 2) therefore implies a model for the hedging error, and the latter can be analyzed by, e.g., simulating from (Z 1, Z 2), converting the simulated sample into a sample of hedging errors, and studying the resulting empirical distribution. Alternatively, we could linearize the nonlinear function g(z) and evaluate the linear approximation at Z = (Z 1, Z 2). The linearization approach may give an approximation of the distribution of the hedging error that can be analyzed analytically, without simulation. Consider the first-order approximation

$${C}_{t} \approx g(\mathbf{0}) + \frac{\partial g} {\partial {z}_{1}}(\mathbf{0}){Z}_{1} + \frac{\partial g} {\partial {z}_{2}}(\mathbf{0}){Z}_{2},$$

where g(z) = g 1(g 2(z 1), g 3(z 2)) with \({g}_{2}({z}_{1}) = {S}_{0}{e}^{{z}_{1}}\), \({g}_{3}({z}_{2}) = {z}_{2} + {\sigma }_{0}\), and

$$\begin{array}{rcl} & & {g}_{1}(s,\sigma ) = s\Phi ({d}_{1}) - K{e}^{-{r}_{0}(T-t)}\Phi ({d}_{ 2}), \\ & & {d}_{1} = \frac{\log (s/K) + ({r}_{0} + {\sigma }^{2}/2)(T - t)} {\sigma \sqrt{T - t}} \quad \text{and}\quad {d}_{2} = {d}_{1} - \sigma \sqrt{T - t}.\end{array}$$

The chain rule, together with the expressions for the partial derivatives of the Black–Scholes formula (Sect. 1.2.2), gives

$$\begin{array}{rcl} \frac{\partial g} {\partial {z}_{1}}(\mathbf{0})& =& \frac{\partial {g}_{1}} {\partial s} ({S}_{0},{\sigma }_{0})\frac{d{g}_{2}} {d{z}_{1}}(0) = \Phi ({d}_{1}){S}_{0}, \\ \frac{\partial g} {\partial {z}_{2}}(\mathbf{0})& =& \frac{\partial {g}_{1}} {\partial \sigma } ({S}_{0},{\sigma }_{0})\frac{d{g}_{3}} {d{z}_{2}}(0) = \phi ({d}_{1}){S}_{0}\sqrt{T - t}.\end{array}$$

Summing up, we arrive at the following approximation of the hedging error:

$$\begin{array}{rcl}{ h}_{0}^{\delta } + {h}_{ 1}^{\delta }{S}_{ t} - {C}_{t}& =& {C}_{0} - {C}_{t} + \Phi ({d}_{1})({S}_{t} - {S}_{0}) \\ & \approx & {C}_{0} - {C}_{0} - \Phi ({d}_{1}){S}_{0}{Z}_{1} - \phi ({d}_{1}){S}_{0}\sqrt{T - t}{Z}_{2} \\ & & +\Phi ({d}_{1})({S}_{0}(1 + {Z}_{1}) - {S}_{0}) \\ & =& -\phi ({d}_{1}){S}_{0}\sqrt{T - t}({\sigma }_{t} - {\sigma }_{0}).\end{array}$$

We see that the position consisting of the delta hedge and the issued call option is immune (approximately, over a short time period) to changes in the index value and that the hedging error is due to changes in the implied volatility. We also find that the variance of the hedging error is

$$\mathrm{Var}({h}_{0}^{\delta } + {h}_{ 1}^{\delta }{S}_{ t} - {C}_{t}) \approx \phi {({d}_{1})}^{2}{S}_{ 0}^{2}(T - t)\mathrm{Var}({\sigma }_{ t}).$$

We now want to reduce the hedging error by replacing the delta hedge by a similar hedge that also takes changes in the implied volatility into account. The position in the underlying index and in cash for the optimal quadratic hedge is

$${h}_{1} = \frac{\mathrm{Cov}({S}_{t},{C}_{t})} {\mathrm{Var}({S}_{t})} \quad \text{and}\quad {h}_{0} = \mathrm{E}[{C}_{t}] - {h}_{1}\mathrm{E}[{S}_{t}].$$

Here we approximate

$$\begin{array}{rcl} \mathrm{Cov}({S}_{t},{C}_{t})& \approx & \mathrm{Cov}({S}_{0}{Z}_{1},\Phi ({d}_{1}){S}_{0}{Z}_{1} + \phi ({d}_{1}){S}_{0}\sqrt{T - t}{Z}_{2}) \\ & =& {S}_{0}^{2}\Phi ({d}_{ 1})\mathrm{Var}({Z}_{1}) + {S}_{0}^{2}\phi ({d}_{ 1})\sqrt{T - t}\mathrm{Cov}({Z}_{1},{Z}_{2}), \\ \mathrm{Var}({S}_{t})& \approx & {S}_{0}^{2}\mathrm{Var}({Z}_{ 1}), \\ \mathrm{E}[{C}_{t}]& \approx & {C}_{0}, \\ \mathrm{E}[{S}_{t}]& \approx & {S}_{0}.\end{array}$$

This gives the hedge (h 0 ∗ , h 1 ∗ ) ≈ (h 0, h 1), where

$$\begin{array}{rcl}{ h}_{1}^{{_\ast}}& =& \Phi ({d}_{ 1}) + \phi ({d}_{1})\sqrt{T - t}\frac{\mathrm{Cov}({Z}_{1},{Z}_{2})} {\mathrm{Var}({Z}_{1})} \\ & =& \Phi ({d}_{1}) + \phi ({d}_{1})\sqrt{T - t}\frac{{\sigma }_{{Z}_{2}}} {{\sigma }_{{Z}_{1}}} \rho, \\ {h}_{0}^{{_\ast}}& =& {C}_{ 0} - {h}_{1}^{{_\ast}}{S}_{ 0}, \\ \end{array}$$

where \({\sigma }_{{Z}_{k}} = \mathrm{Var}{({Z}_{k})}^{1/2}\) and ρ = Cor(Z 1, Z 2). We observe that the position h 1 ∗ in the underlying index corresponds to the delta-hedge position h 1 δ plus a correction term. We get the following approximation of the hedging error:

$$\begin{array}{rcl}{ h}_{0}^{{_\ast}} + {h}_{ 1}^{{_\ast}}{S}_{ t} - {C}_{t}& \approx & {C}_{0} + \left (\Phi ({d}_{1}) + \phi ({d}_{1})\sqrt{T - t}\frac{{\sigma }_{{Z}_{2}}} {{\sigma }_{{Z}_{1}}} \rho \right ){S}_{0}{Z}_{1} \\ & & -{C}_{0} - \Phi ({d}_{1}){S}_{0}{Z}_{1} - \phi ({d}_{1}){S}_{0}\sqrt{T - t}{Z}_{2} \\ & =& \phi ({d}_{1}){S}_{0}\sqrt{T - t}\left (\frac{{\sigma }_{{Z}_{2}}} {{\sigma }_{{Z}_{1}}} \rho {Z}_{1} - {Z}_{2}\right ).\end{array}$$

In particular, the variance of the hedging error is approximately

$$\begin{array}{rcl} \mathrm{Var}({h}_{0}^{{_\ast}} + {h}_{ 1}^{{_\ast}}{S}_{ t} - {C}_{t})& \approx & \mathrm{Var}\left (\phi ({d}_{1}){S}_{0}\sqrt{T - t}\left (\frac{{\sigma }_{{Z}_{2}}} {{\sigma }_{{Z}_{1}}} \rho {Z}_{1} - {Z}_{2}\right )\right ) \\ & =& \phi {({d}_{1})}^{2}{S}_{ 0}^{2}(T - t)\mathrm{Var}({\sigma }_{ t})(1 - {\rho }^{2}), \\ \end{array}$$

where the last equality can be verified by straightforward computations of the variance of the sum of two correlated terms. Notice that taking changes in implied volatility into account when computing the approximation of the quadratic hedge makes the variance of the hedging error smaller by a factor of (1 − ρ2).
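As an illustration, the following sketch evaluates the two approximate hedging-error variances side by side; the option data, Var(σ t ), and ρ are invented inputs:

```python
import numpy as np
from scipy.stats import norm

# A minimal sketch (made-up parameters) comparing the approximate
# hedging-error variances derived above: the delta hedge gives
#   phi(d1)^2 * S0^2 * (T - t) * Var(sigma_t),
# and the volatility-adjusted quadratic hedge shrinks this by (1 - rho^2).

S0, K, r0, sigma0 = 100.0, 100.0, 0.02, 0.2
T, t = 0.5, 1.0 / 52.0                  # maturity and hedging horizon
d1 = (np.log(S0 / K) + (r0 + sigma0**2 / 2) * T) / (sigma0 * np.sqrt(T))

var_sigma_t = 0.02**2                   # Var(sigma_t) = Var(Z2), assumed
rho = -0.7                              # Cor(Z1, Z2), assumed

var_delta = norm.pdf(d1)**2 * S0**2 * (T - t) * var_sigma_t
var_quad = var_delta * (1 - rho**2)
print(var_delta, var_quad)
```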

3.4 Betting on Changes in Volatility

Suppose that a trader is betting on changes in implied volatility from time 0 (today) until a future time t > 0 for two maturity times and that we want to analyze the riskiness of this volatility bet. Consider two call options on the value of an index at two future times 0 < T 1 < T 2. The trader believes that over a short period of time the change in implied volatility σ t 1 − σ 0 1 for the nearer maturity time T 1 will be greater than the change σ t 2 − σ 0 2 for the more distant maturity time T 2. The trader wants to capitalize on this belief but at the same time not bet on other potential movements of the underlying index value. We first determine the particular portfolio corresponding to the volatility bet.

Consider a long position of size h 2 in a call option with strike K 1 maturing at time T 1 and a short position of size h 3 in a call option with strike K 2 maturing at time T 2. The future value of this position is, to a first-order approximation and with the expressions for the partial derivatives of the Black–Scholes formula,

$$\begin{array}{rcl}{ h}_{2}{C}_{t}^{1} - {h}_{ 3}{C}_{t}^{2}& \approx & {h}_{ 2}{C}_{0}^{1} - {h}_{ 3}{C}_{0}^{2} \\ & & +{h}_{2}\left (\Phi ({d}_{1}^{1})({S}_{ t} - {S}_{0}) + \phi ({d}_{1}^{1}){S}_{ 0}\sqrt{{T}_{1}}({\sigma }_{t}^{1} - {\sigma }_{ 0}^{1})\right ) \\ & & -{h}_{3}\left (\Phi ({d}_{1}^{2})({S}_{ t} - {S}_{0}) + \phi ({d}_{1}^{2}){S}_{ 0}\sqrt{{T}_{2}}({\sigma }_{t}^{2} - {\sigma }_{ 0}^{2})\right ), \\ \end{array}$$

where

$${d}_{1}^{j} = \frac{\log ({S}_{0}/{K}_{j}) + ({r}_{j} + {({\sigma }_{0}^{j})}^{2}/2){T}_{ j}} {{\sigma }_{0}^{j}\sqrt{{T}_{j}}} \quad \text{for } j = 1,2.$$

With \({Z}_{1} =\log ({S}_{t}/{S}_{0})\), \({Z}_{2} = {\sigma }_{t}^{1} - {\sigma }_{0}^{1}\), and \({Z}_{3} = {\sigma }_{t}^{2} - {\sigma }_{0}^{2}\), and the approximation S t S 0S 0 Z 1, we get

$$\begin{array}{rcl}{ h}_{2}{C}_{t}^{1} - {h}_{ 3}{C}_{t}^{2}& \approx & {h}_{ 2}{C}_{0}^{1} - {h}_{ 3}{C}_{0}^{2} + ({h}_{ 2}\Phi ({d}_{1}^{1}) - {h}_{ 3}\Phi ({d}_{1}^{2})){S}_{ 0}{Z}_{1} \\ & & +{h}_{2}{S}_{0}\phi ({d}_{1}^{1})\sqrt{{T}_{ 1}}{Z}_{2} - {h}_{3}{S}_{0}\phi ({d}_{1}^{2})\sqrt{{T}_{ 2}}{Z}_{3}.\end{array}$$

The volatility bet is a bet on the occurrence of the event Z 2 > Z 3, and on nothing else. Therefore, the trader chooses h 2 and h 3 so that

$${h}_{2}\phi ({d}_{1}^{1})\sqrt{{T}_{ 1}} - {h}_{3}\phi ({d}_{1}^{2})\sqrt{{T}_{ 2}} = 0,$$

meaning that the impact of a parallel shift in the implied volatility should be approximately zero. Moreover, the trader wants the bet to be immune to changes in the value of the underlying index. Therefore, the trader takes the position

$${h}_{1} = -({h}_{2}\Phi ({d}_{1}^{1}) - {h}_{ 3}\Phi ({d}_{1}^{2}))$$

in the index and a position

$${h}_{0} = -{h}_{1}{S}_{0} - {h}_{2}{C}_{0}^{1} + {h}_{ 3}{C}_{0}^{2}$$

in cash. Summing up, we find that the volatility bet corresponds to the portfolio weights h 0, h 1, h 2, h 3 and the future portfolio value

$$\begin{array}{rcl}{ h}_{0} + {h}_{1}{S}_{t} + {h}_{2}{C}_{t}^{1} - {h}_{ 3}{C}_{t}^{2}& \approx & {h}_{ 2}{S}_{0}\phi ({d}_{1}^{1})\sqrt{{T}_{ 1}}{Z}_{2} - {h}_{2}\frac{\phi ({d}_{1}^{1})\sqrt{{T}_{1}}} {\phi ({d}_{1}^{2})\sqrt{{T}_{2}}}{S}_{0}\phi ({d}_{1}^{2})\sqrt{{T}_{ 2}}{Z}_{3} \\ & =& {h}_{2}{S}_{0}\phi ({d}_{1}^{1})\sqrt{{T}_{ 1}}({Z}_{2} - {Z}_{3}).\end{array}$$

To estimate the risk of holding this portfolio until time t, we could now assign a bivariate elliptical distribution to (Z 2, Z 3), determine the corresponding univariate elliptical distribution of Z 2 − Z 3, and finally compute \(\rho ({h}_{2}{S}_{0}\phi ({d}_{1}^{1})\sqrt{{T}_{1}}({Z}_{2} - {Z}_{3}))\) for a suitable choice of risk measure ρ. However, this apparently straightforward approach to measuring the riskiness of the volatility bet is not unproblematic. Assigning a bivariate model to (Z 2, Z 3) can at best be guided by historical data on implied volatility changes but will to a large extent be based on subjective beliefs. Moreover, if the sizes of the option positions are large, then it may be unrealistic to assume that the positions can be closed at time t if t is small. In that case, we need a longer time period for the risk modeling, and this makes the whole linearization approach questionable.
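For concreteness, a minimal sketch (with invented market data) of the construction of the volatility bet above: h 3 neutralizes parallel shifts in implied volatility, h 1 neutralizes index moves, and h 0 is the resulting cash position:

```python
import numpy as np
from scipy.stats import norm

# A minimal sketch (made-up market data) of the volatility-bet portfolio.

S0 = 100.0
K1, K2 = 100.0, 100.0
T1, T2 = 0.25, 1.0
r1, r2 = 0.02, 0.02
sig1, sig2 = 0.25, 0.22                  # implied vols for T1 and T2, assumed
C01, C02 = 5.2, 9.1                      # current option prices, assumed

def d1(K, r, sig, T):
    return (np.log(S0 / K) + (r + sig**2 / 2) * T) / (sig * np.sqrt(T))

d11, d12 = d1(K1, r1, sig1, T1), d1(K2, r2, sig2, T2)

h2 = 1.0                                             # size of the long leg
h3 = h2 * norm.pdf(d11) * np.sqrt(T1) / (norm.pdf(d12) * np.sqrt(T2))
h1 = -(h2 * norm.cdf(d11) - h3 * norm.cdf(d12))      # index position
h0 = -h1 * S0 - h2 * C01 + h3 * C02                  # cash position
print(h1, h2, h3, h0)
```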

3.5 Portfolio Optimization with Elliptical Distributions

Suppose vector R of returns on a collection of risky assets can be modeled by a normal variance mixture distribution so that \(\mathbf{R}\stackrel{\mathrm{d}}{ =}\mathbf{\mu } + W\mathbf{A}\mathbf{Z}\), where Z is N d (0, I)-distributed and independent of W ≥ 0, and AA T = Σ. If R 0 is the return on a risk-free asset, then the future value of a portfolio with monetary portfolio weights w in the risky assets and w 0 in the risk-free asset can be expressed as

$$\begin{array}{rcl}{ V }_{1}& = & {w}_{0}{R}_{0} +{ \mathbf{w}}^{\mathrm{T}}\mathbf{R} \\ & \stackrel{\mathrm{d}}{ =}& {w}_{0}{R}_{0} +{ \mathbf{w}}^{\mathrm{T}}\mathbf{\mu } + {({\mathbf{w}}^{\mathrm{T}}\mathbf{\Sigma }\mathbf{w})}^{1/2}W{Z}_{ 1}.\end{array}$$
(9.11)

Suppose the variance Var(V 1) = σ2 w T Σw, where σ2 = Var(WZ 1), exists. Then the solution to the investment problem

$$\begin{array}{ll} \text{maximize} &{w}_{0}{R}_{0} +{ \mathbf{w}}^{\mathrm{T}}\mathbf{\mu } - \frac{c} {2{V }_{0}} {\sigma }^{2}{\mathbf{w}}^{\mathrm{T}}\mathbf{\Sigma }\mathbf{w} \\ \text{subject to}&{w}_{0} +{ \mathbf{w}}^{\mathrm{T}}\mathbf{1} \leq {V }_{0}\end{array}$$

follows from the solution to the trade-off investment problem (4.7) by replacing Σ in (4.7) by σ2 Σ and is given by

$$\mathbf{w} = \frac{{V }_{0}} {c} {({\sigma }^{2}\mathbf{\Sigma })}^{-1}(\mathbf{\mu } - {R}_{ 0}\mathbf{1})\quad { and}\quad {w}_{0} = {V }_{0} -{\mathbf{w}}^{\mathrm{T}}\mathbf{1}.$$
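A minimal sketch (made-up inputs) of this solution:

```python
import numpy as np

# A minimal sketch: optimal weights w = (V0/c) (sigma^2 Sigma)^{-1} (mu - R0*1)
# and w0 = V0 - sum(w). All inputs below are invented for illustration.

V0, c, R0 = 1000.0, 2.0, 1.02
mu = np.array([1.05, 1.08, 1.06])          # expected gross returns, assumed
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.06]])     # dispersion matrix of R, assumed
sigma2 = 1.5                               # Var(W * Z1) of the mixture

w = (V0 / c) * np.linalg.solve(sigma2 * Sigma, mu - R0)
w0 = V0 - w.sum()
print(w, w0)
```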

A convenient feature of having an elliptical distribution for vector R of returns is that portfolio optimization problems often reduce to the trade-off investment problem (4.7). Consider the problem of portfolio optimization in the context of a spectral risk measure.

Example 9.11 (Spectral risk measures). 

Portfolio optimization with respect to a spectral risk measure (Sect. 6.5) amounts to minimizing a spectral risk measure ρϕ(X), where X denotes a future portfolio value, under a budget constraint (and possibly additional constraints). By the stochastic representation (9.11), we can express the quantile function of V 1 as

$${F}_{{V }_{1}}^{-1}(p) = {w}_{ 0}{R}_{0} +{ \mathbf{w}}^{\mathrm{T}}\mathbf{\mu } + {({\mathbf{w}}^{\mathrm{T}}\mathbf{\Sigma }\mathbf{w})}^{1/2}{F}_{ W{Z}_{1}}^{-1}(p).$$

Therefore, the spectral risk measure

$${\rho }_{\phi }(X) = -{\int }_{0}^{1}\phi (p){F}_{ X/{R}_{0}}^{-1}(p)dp,$$

applied to \(X = {V }_{1} - {V }_{0}{R}_{0}\), can be expressed as

$$\begin{array}{rcl}{ \rho }_{\phi }({V }_{1} - {V }_{0}{R}_{0})& =& -{\int }_{0}^{1}\phi (p){F}_{{ V }_{1}/{R}_{0}}^{-1}(p)dp + {V }_{ 0} \\ & =& \frac{1} {{R}_{0}}\left (-{w}_{0}{R}_{0} -{\mathbf{w}}^{\mathrm{T}}\mathbf{\mu } - {({\mathbf{w}}^{\mathrm{T}}\mathbf{\Sigma }\mathbf{w})}^{1/2}{ \int }_{0}^{1}\phi (p){F}_{ W{Z}_{1}}^{-1}(p)dp\right ) + {V }_{ 0}.\end{array}$$

In particular, we can formulate the portfolio optimization problem

$$\begin{array}{ll} \text{minimize} &{\rho }_{\phi }({w}_{0}{R}_{0} +{ \mathbf{w}}^{\mathrm{T}}\mathbf{R} - {V }_{0}{R}_{0}) \\ \text{subject to}&{w}_{0} +{ \mathbf{w}}^{\mathrm{T}}\mathbf{1} \leq {V }_{0}\end{array}$$

as the trade-off problem

$$\begin{array}{ll} \text{maximize} &{w}_{0}{R}_{0} +{ \mathbf{w}}^{\mathrm{T}}\mathbf{\mu } - \frac{c} {2{V }_{0}} {({\mathbf{w}}^{\mathrm{T}}\mathbf{\Sigma }\mathbf{w})}^{1/2} \\ \text{subject to}&{w}_{0} +{ \mathbf{w}}^{\mathrm{T}}\mathbf{1} \leq {V }_{0}, \end{array}$$

where

$$c = -2{V }_{0}{ \int }_{0}^{1}\phi (p){F}_{ W{Z}_{1}}^{-1}(p)dp.$$

We conclude that, for an elliptical model for vector R of returns, minimizing the spectral risk measure of the future portfolio value subject to a budget constraint is equivalent to solving a trade-off problem with the trade-off parameter given above.

4 Copulas

A rather common situation arises when we search for a multivariate model for a set of random variables \({Y }_{1},\ldots, {Y }_{d}\) whose univariate distributions are rather well understood but whose joint distribution is only partially understood. A useful approach to the construction of a multivariate distribution for \(\mathbf{Y} = ({Y }_{1},\ldots, {Y }_{d})\) with specified univariate marginal distribution functions \({G}_{1},\ldots, {G}_{d}\), the distribution functions of the vector’s components, is obtained by combining the so-called probability and quantile transforms. The probability transform says that if X is a random variable with a continuous distribution function F, then F(X) is uniformly distributed on the interval (0, 1). The quantile transform says that if U is uniformly distributed and if G is any distribution function, then G − 1(U) has distribution function G. This implies that for any random vector \(\mathbf{X} = ({X}_{1},\ldots, {X}_{d})\) whose components have continuous distribution functions \({F}_{1},\ldots, {F}_{d}\), the random vector \(\mathbf{Y} = ({G}_{1}^{-1}({F}_{1}({X}_{1})),\ldots, {G}_{d}^{-1}({F}_{d}({X}_{d})))\) corresponds to a multivariate model with prespecified univariate marginal distributions. If all F k and G k are both continuous and strictly increasing, then the preceding statement is actually straightforward to verify:

$$\mathrm{P}({G}_{k}^{-1}({F}_{ k}({X}_{k})) \leq y) = \mathrm{P}({F}_{k}({X}_{k}) \leq {G}_{k}(y)) = {G}_{k}(y),$$

which shows that Y k has distribution function G k . The difficulty when it comes to constructing a good multivariate model for Y using this approach clearly lies in the choice of the distribution for vector X since the dependence between the X k will be inherited by the Y k .

Example 9.12.

Consider the two scatter plots in Fig. 9.6. The left scatter plot shows a sample of size 2,000 from a bivariate standard normal distribution with linear correlation 0. 5. The right scatter plot shows a sample of size 2,000 from a bivariate distribution with standard normal marginal distributions and a dependence structure inherited from a bivariate Student’s t distribution with one degree of freedom. The points of the right scatter plot were obtained from the points of the left scatter plot as follows. Write \({\mathbf{Z}}_{1},\ldots, {\mathbf{Z}}_{2000}\) for the independent bivariate normal random vectors whose outcomes are shown in the left plot. Let \({S}_{1},\ldots, {S}_{2000}\) be independent χ1 2-distributed random variables independent of the sample from the bivariate normal distribution. A sample of independent bivariate Student’s t 1-distributed vectors was obtained by setting \({\mathbf{X}}_{k} = {S}_{k}^{-1/2}{\mathbf{Z}}_{k}\) for \(k = 1,\ldots, 2000\). Finally, the random vectors whose outcomes are shown in the plot to the right were constructed as \({\mathbf{Y}}_{k} = {({\Phi }^{-1}({t}_{1}({X}_{k,1})),{\Phi }^{-1}({t}_{1}({X}_{k,2})))}^{\mathrm{T}}\) for \(k = 1,\ldots, 2000\).

Fig. 9.6 Samples of size 2,000 from two bivariate distributions with standard normal marginal distributions. Left plot: sample from a bivariate standard normal with linear correlation 0.5. Right plot: sample from a bivariate standard Student’s t distribution with one degree of freedom, with marginal distributions transformed to standard normal
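A minimal sketch (in Python, with an arbitrary seed) of the construction used for the right plot of Fig. 9.6:

```python
import numpy as np
from scipy.stats import norm, t

# A minimal sketch of Example 9.12: transform a correlated normal sample
# into one with the dependence of a Student's t_1 distribution but with
# standard normal marginal distributions.

rng = np.random.default_rng(1)
n, rho = 2000, 0.5
L = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]))
Z = rng.standard_normal((n, 2)) @ L.T      # bivariate normal sample
S = rng.chisquare(df=1, size=n)            # independent chi^2_1 variables
X = Z / np.sqrt(S)[:, None]                # bivariate t_1 sample
Y = norm.ppf(t.cdf(X, df=1))               # back to N(0, 1) marginals
```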

Suppose that we want to build a multivariate model corresponding to a random vector \(\mathbf{X} = ({X}_{1},\ldots, {X}_{d})\) with a nontrivial dependence between its components and certain marginal distribution functions \({F}_{1},\ldots, {F}_{d}\). Then the quantile transform says that we may start with a suitable vector \(\mathbf{U} = ({U}_{1},\ldots, {U}_{d})\) whose components are uniformly distributed on (0, 1) and specify X as

$$\mathbf{X} = ({F}_{1}^{-1}({U}_{ 1}),\ldots, {F}_{d}^{-1}({U}_{ d})).$$

The random vector X inherits the dependence among its components from vector U. The distribution function C of a random vector U whose components U k are uniformly distributed on (0, 1) is called a copula, i.e.,

$$C({u}_{1},\ldots, {u}_{d}) = \mathrm{P}({U}_{1} \leq {u}_{1},\ldots, {U}_{d} \leq {u}_{d}),\quad ({u}_{1},\ldots, {u}_{d}) \in {(0,1)}^{d}.$$

Let \(({X}_{1},\ldots, {X}_{d})\) be a random vector with distribution function \(F({x}_{1},\ldots, {x}_{d}) = \mathrm{P}({X}_{1} \leq {x}_{1},\ldots, {X}_{d} \leq {x}_{d})\) and suppose that F k (x) = P(X k x) is a continuous function for every k. The probability transform, statement (iv) of Proposition 6.1, implies that the components of the vector \(\mathbf{U} = ({U}_{1},\ldots, {U}_{d}) = ({F}_{1}({X}_{1}),\ldots, {F}_{d}({X}_{d}))\) are uniformly distributed on (0, 1). In particular, the distribution function C of U is a copula and we call it the copula of X. Using statement (i) of Proposition 6.1 we find that

$$\begin{array}{rcl} C({F}_{1}({x}_{1}),\ldots, {F}_{d}({x}_{d}))& =& \mathrm{P}({U}_{1} \leq {F}_{1}({x}_{1}),\ldots, {U}_{d} \leq {F}_{d}({x}_{d})) \\ & =& \mathrm{P}({F}_{1}^{-1}({U}_{ 1}) \leq {x}_{1},\ldots, {F}_{d}^{-1}({U}_{ d}) \leq {x}_{d}) \\ & =& F({x}_{1},\ldots, {x}_{d}).\end{array}$$

This representation of the joint distribution function F in terms of the copula C and the marginal distribution functions \({F}_{1},\ldots, {F}_{d}\) explains the name “copula”: a function that “couples” the joint distribution function to its univariate marginal distribution functions.

Example 9.13 (Gaussian and Student’s t copulas). 

The copula C R Ga of a d-dimensional standard normal distribution, with linear correlation matrix R, is the distribution function of the random vector \((\Phi ({X}_{1}),\ldots, \Phi ({X}_{d}))\), where Φ is the univariate standard normal distribution function and X is N d (0, R)-distributed. Hence,

$${C}_{\mathbf{R}}^{\text{Ga}}(\mathbf{u}) = \mathrm{P}(\Phi ({X}_{ 1}) \leq {u}_{1},\ldots, \Phi ({X}_{d}) \leq {u}_{d}) = {\Phi }_{\mathbf{R}}^{d}({\Phi }^{-1}({u}_{ 1}),\ldots, {\Phi }^{-1}({u}_{ d})),$$

where Φ R d is the distribution function of X. Copulas of the preceding form are called Gaussian copulas.

The copula C ν, R t of a d-dimensional standard Student’s t distribution with ν > 0 degrees of freedom and linear correlation matrix R is the distribution of the random vector \(({t}_{\nu }({X}_{1}),\ldots, {t}_{\nu }({X}_{d}))\), where X has a t d (0, R, ν) distribution and t ν is the univariate standard Student’s t ν distribution function. Hence,

$${C}_{\nu, \mathbf{R}}^{t}(\mathbf{u}) = \mathrm{P}({t}_{ \nu }({X}_{1}) \leq {u}_{1},\ldots, {t}_{\nu }({X}_{d}) \leq {u}_{d}) = {t}_{\nu, \mathbf{R}}^{d}({t}_{ \nu }^{-1}({u}_{ 1}),\ldots, {t}_{\nu }^{-1}({u}_{ d})),$$

where t ν, R d is the distribution function of X. Copulas of the preceding form are called Student’s t copulas.

Consider a random vector (Y 1, Y 2) with continuous strictly increasing marginal distribution functions G 1 and G 2 and the copula of a Student’s t distribution with linear correlation parameter ρ. We consider here the question of how ρ can be estimated from a sample from the distribution of (Y 1, Y 2). We may write \(({Y }_{1},{Y }_{2}) = ({G}_{1}^{-1}({F}_{1}({X}_{1})),{G}_{2}^{-1}({F}_{2}({X}_{2})))\), where (X 1, X 2) has a Student’s t distribution with linear correlation parameter ρ. In particular, the functions T 1 and T 2 given by \({T}_{k}(x) = {G}_{k}^{-1}({F}_{k}(x))\) are continuous and strictly increasing, so for an independent copy (X′ 1, X′ 2) of (X 1, X 2) it holds that

$$\begin{array}{rcl} \tau ({Y }_{1},{Y }_{2})& =& \tau ({T}_{1}({X}_{1}),{T}_{2}({X}_{2})) \\ & =& 2\mathrm{P}(({T}_{1}({X}_{1}) - {T}_{1}({X^{\prime}}_{1}))({T}_{2}({X}_{2}) - {T}_{2}({X^{\prime}}_{2})) > 0) - 1 \\ & =& 2\mathrm{P}(({X}_{1} - {X^{\prime}}_{1})({X}_{2} - {X^{\prime}}_{2}) > 0) - 1 \\ & =& \tau ({X}_{1},{X}_{2}).\end{array}$$

It follows immediately from (9.7) that \(\rho =\sin (\pi \tau ({Y }_{1},{Y }_{2})/2)\). Therefore, the estimate \(\widehat{\tau }\) of τ(Y 1, Y 2) from the sample from the distribution of (Y 1, Y 2) gives an estimate \(\widehat{\rho } =\sin (\pi \widehat{\tau }/2)\) of ρ.
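A minimal sketch of this estimation procedure on simulated data (any strictly increasing componentwise transform of the elliptical sample leaves Kendall’s tau, and hence the estimate of ρ, unchanged):

```python
import numpy as np
from scipy.stats import kendalltau

# A minimal sketch: estimate the correlation parameter rho of a Student's
# t copula from (Y1, Y2) data via rho = sin(pi * tau / 2).

rng = np.random.default_rng(2)
n, rho_true = 1000, 0.6
L = np.linalg.cholesky(np.array([[1.0, rho_true], [rho_true, 1.0]]))
Z = rng.standard_normal((n, 2)) @ L.T
X = Z / np.sqrt(rng.chisquare(4, n) / 4)[:, None]   # bivariate t_4 sample
Y = np.exp(X)                  # any strictly increasing transform of X

tau_hat, _ = kendalltau(Y[:, 0], Y[:, 1])
rho_hat = np.sin(np.pi * tau_hat / 2)
print(rho_hat)                 # close to rho_true
```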

Example 9.14 (Investments in foreign stocks). 

Consider a Swedish investor about to invest Swedish kronor (SEK) in foreign telecom stocks. The current share prices of British Telecom (BT) and Deutsche Telekom (DT) are 185. 5 British pounds (GBP) and 9. 26 euros (EUR), respectively. The current SEK/GBP exchange rate is 0. 0942 (x kronor can be exchanged for 0. 0942x pounds). The current SEK/EUR exchange rate is 0. 1098.

The investor has obtained a sample of four-dimensional vectors of share prices, in the local currencies, and exchange rates from the 249 most recent (trading) days. We assume that the investor believes that the information in the data is relevant for assessing future portfolio values, and that no additional information on which to base model selection is available. The scatter plots for the stock log-return pairs and for the exchange-rate log-return pairs are shown in Fig. 9.7.

Fig. 9.7 The left scatter plot shows log-return pairs, British Telecom in pounds on the x-axis and Deutsche Telekom in euros on the y-axis. The right scatter plot shows log-return pairs, SEK/GBP on the x-axis and SEK/EUR on the y-axis

The investor is about to invest the amounts w 1 and w 2 kronor in the two foreign telecom stocks and wants to model the portfolio value V 1 in kronor tomorrow. Let A t , B t , C t , D t denote the time t share prices (BT and DT) and exchange rates (SEK/GBP and SEK/EUR). Let \({X}_{A} =\log ({A}_{1}/{A}_{0})\) be the log return from today until tomorrow for BT in GBP and similarly for X B , X C , X D . If h 1 and h 2 are the number of shares of BT and DT bought, then

$$\frac{{A}_{0}} {{C}_{0}}{h}_{1} = {w}_{1}\quad \text{and}\quad \frac{{B}_{0}} {{D}_{0}}{h}_{2} = {w}_{2}.$$

The portfolio value in kronor tomorrow is therefore

$$\begin{array}{rcl}{ V }_{1}& =& {h}_{1} \frac{{A}_{1}} {{C}_{1}} + {h}_{2} \frac{{B}_{1}} {{D}_{1}} \\ & =& {w}_{1}\frac{{A}_{1}} {{A}_{0}}{\left (\frac{{C}_{1}} {{C}_{0}}\right )}^{-1} + {w}_{ 2}\frac{{B}_{1}} {{B}_{0}}{\left (\frac{{D}_{1}} {{D}_{0}}\right )}^{-1} \\ & =& {w}_{1}\exp \{{X}_{A} - {X}_{C}\} + {w}_{2}\exp \{{X}_{B} - {X}_{D}\}.\end{array}$$

If the investor has already decided on a particular portfolio, i.e., has chosen the portfolio weights w 1 and w 2, then the log-return data may be used to generate a sample from the distribution of V 1 by viewing V 1 as a function of (X A , X B , X C , X D ). This sample can be transformed into a sample from the distribution of the portfolio log return log(V 1 /V 0 ), where \({V }_{0} = {w}_{1} + {w}_{2}\), and a parametric model can be chosen for the portfolio log return.

Here we want to allow the investor to vary the portfolio weights in order to choose an optimal (according to some criterion left unspecified) portfolio. Therefore, instead of setting up a model for V 1 directly, we set up a model for the joint log-return distribution of (X A , X B , X C , X D ) from which the model for V 1 is easily inferred.

The Student’s t location-scale family of distributions is a natural choice of parametric family for log returns. Maximum-likelihood estimation of the parameter triple (μ, σ, ν) of the Student’s t location-scale family on the samples of daily log returns gives the following estimates:

$$\begin{array}{rrrl} (-6 \cdot 1{0}^{-4},&0.013,&3.7)&\quad \text{(British Telecom in pounds)}, \\ (2 \cdot 1{0}^{-4},&0.015,&7.7)&\quad \text{(Deutsche Telekom in euros)}, \\ (2 \cdot 1{0}^{-4},&0.006,&9.6)&\quad \text{(SEK/GBP)}, \\ (8 \cdot 1{0}^{-5},&0.004,&8.6)&\quad \text{(SEK/EUR)}.\end{array}$$
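A minimal sketch of such a maximum-likelihood fit (on simulated stand-in data rather than the actual share-price sample):

```python
import numpy as np
from scipy.stats import t

# A minimal sketch: ML fitting of the Student's t location-scale family
# to daily log returns. The data here are simulated stand-ins with
# parameters loosely resembling the first row of estimates above.

rng = np.random.default_rng(3)
log_returns = t.rvs(df=4, loc=-6e-4, scale=0.013, size=248, random_state=rng)

nu_hat, mu_hat, sigma_hat = t.fit(log_returns)   # ML estimates (nu, mu, sigma)
print(mu_hat, sigma_hat, nu_hat)
```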

There is no a priori reason for the log-return distributions to be symmetric; the polynomial normal model in Example 8.10 is also a natural model for the log returns. The estimated parameters (θ0, θ1, θ2, θ3) based on the samples of daily log returns are

$$\begin{array}{rrrrl} (3.1,&142.6,& - 1.4,&15.5) \cdot 1{0}^{-4} & \quad \text{(British Telecom in pounds)}, \\ (-8.8,&120.1,& 9.2,&22.1) \cdot 1{0}^{-4} & \quad \text{(Deutsche Telekom in euros)}, \\ (4.0,& 53.5,& - 3.7,& 3.6) \cdot 1{0}^{-4} & \quad \text{(SEK/GBP)}, \\ (1.9,& 31.1,& - 2.0,& 3.5) \cdot 1{0}^{-4} & \quad \text{(SEK/EUR)}.\end{array}$$

The conditions θ3 > 0 and 3θ1θ3 − θ2 2 > 0 ensuring that the third-degree polynomial is strictly increasing are satisfied for the estimated parameter vectors. Figure 9.8 shows the empirical quantiles of the log returns of BT and DT against those of the fitted parametric distributions. By comparing the two upper q–q plots we find that the polynomial normal model captures the asymmetry between the left and right tails in the BT log-return data, whereas the Student’s t model does not.

Fig. 9.8 Upper plots: empirical quantiles of British Telecom log-return data (y-axes) against quantiles of fitted distributions (x-axes): Student’s t model to the left and polynomial normal model to the right. Lower plots: empirical quantiles of Deutsche Telekom log-return data (y-axes) against quantiles of fitted distributions (x-axes): Student’s t model to the left and polynomial normal model to the right

We now proceed to the modeling of the dependence between the log returns. The sample correlations between log returns of the stocks and log returns of the exchange rates are approximately zero, and there are no obvious economic reasons not to assume independence between the log-return pairs (X A , X B ) and (X C , X D ) of stocks and exchange rates, respectively. We therefore assume that the log-return pairs (X A , X B ) and (X C , X D ) are independent and that the distribution functions of the two log-return pairs are of the form, with subscripts s for stocks and e for exchange rates,

$$\begin{array}{rcl} \mathrm{P}({X}_{A} \leq {x}_{A},{X}_{B} \leq {x}_{B})& =& {C}_{{\nu }_{s},{\rho }_{s}}^{t}({F}_{ A}({x}_{A}),{F}_{B}({x}_{B})), \\ \mathrm{P}({X}_{C} \leq {x}_{C},{X}_{D} \leq {x}_{D})& =& {C}_{{\nu }_{e},{\rho }_{e}}^{t}({F}_{ C}({x}_{C}),{F}_{D}({x}_{D})), \\ \end{array}$$

where F A , F B , F C , F D denote the distribution functions of X A , X B , X C , X D . Student’s t copula is a flexible parametric family for the dependence structure of the log-return pairs. Set U A = F A (X A ) and similarly for U B , U C , U D . The assumption of Student’s t copulas as models for the dependence structure for the log-return pairs requires that \(({U}_{A},{U}_{B})\stackrel{\mathrm{d}}{ =}(1 - {U}_{A},1 - {U}_{B})\) and \(({U}_{C},{U}_{D})\stackrel{\mathrm{d}}{ =}(1 - {U}_{C},1 - {U}_{D})\). Whatever choice of models for the individual log returns X A , X B , X C , X D among the sets of models given above, the log-return data give no reasons to reject the hypothesis that \(({U}_{A},{U}_{B})\stackrel{\mathrm{d}}{ =}(1 - {U}_{A},1 - {U}_{B})\) and \(({U}_{C},{U}_{D})\stackrel{\mathrm{d}}{ =}(1 - {U}_{C},1 - {U}_{D})\) (Fig. 9.9).

Fig. 9.9 Left scatter plot: the sample points in (9.12) obtained by componentwise transformation of the original log-return pairs for stocks by the fitted Student’s t location-scale distribution functions. Right scatter plot: the same points with the corresponding sample points for the componentwise transformation by the distribution functions of the fitted polynomial normal models added, marked by ×, to illustrate the effect of the componentwise transformations

We may now estimate ρ s and ρ e by \(\widehat{{\rho }}_{s} =\sin (\pi \widehat{{\tau }}_{s}/2)\) and \(\widehat{{\rho }}_{e} =\sin (\pi \widehat{{\tau }}_{e}/2)\), and the estimate of (ρ s , ρ e ) is approximately (0. 62, 0. 61). Under the assumption that the marginal distribution functions F A , F B , F C , F D of the joint log-return distribution equal the estimated marginal distribution functions \(\widehat{{F}}_{A},\widehat{{F}}_{B},\widehat{{F}}_{C},\widehat{{F}}_{D}\), we may transform the samples

$$\{({X}_{A}^{1},{X}_{ B}^{1}),\ldots, ({X}_{ A}^{248},{X}_{ B}^{248})\}\quad \text{and}\quad \{({X}_{ C}^{1},{X}_{ D}^{1}),\ldots, ({X}_{ C}^{248},{X}_{ D}^{248})\}$$

into the samples

$$\{({U}_{A}^{1},{U}_{ B}^{1}),\ldots, ({U}_{ A}^{248},{U}_{ B}^{248})\}\quad \text{and}\quad \{({U}_{ C}^{1},{U}_{ D}^{1}),\ldots, ({U}_{ C}^{248},{U}_{ D}^{248})\}$$
(9.12)

from Student’s t copulas, where \({U}_{A}^{k} =\widehat{ {F}}_{A}({X}_{A}^{k})\), and similarly for U B k, U C k, U D k. In the case of a polynomial normal model choice, dropping subscripts for notational convenience, \(\widehat{F}(x) = \Phi (\widehat{{g}}^{-1}(x))\), and \(\widehat{{g}}^{-1}(x)\) is obtained as the (here unique real) solution y to the polynomial equation \(\widehat{{\theta }}_{0} +\widehat{ {\theta }}_{1}y +\widehat{ {\theta }}_{2}{y}^{2} +\widehat{ {\theta }}_{3}{y}^{3} = x\). Under the further assumption that the linear correlation parameters ρ s , ρ e equal the estimates \(\widehat{{\rho }}_{s},\widehat{{\rho }}_{e}\), the two samples in (9.12) are samples from two Student’s t copulas whose parameters are known except for the degree-of-freedom parameters ν s and ν e . The unknown parameters can be estimated by maximum likelihood, and the bivariate density function of Student’s t copula corresponding to the pair of log returns for stocks is given by

$${ c}_{{\nu }_{s},\widehat{{\rho }}_{s}}^{t}({u}_{ 1},{u}_{2}) = \frac{{\partial }^{2}} {\partial {u}_{1}\partial {u}_{2}}{t}_{{\nu }_{s},\widehat{{\rho }}_{s}}^{2}({t}_{{ \nu }_{s}}^{-1}({u}_{ 1}),{t}_{{\nu }_{s}}^{-1}({u}_{ 2})) = \frac{{g}_{{\nu }_{s},\widehat{{\rho }}_{s}}^{2}({t}_{{\nu }_{s}}^{-1}({u}_{1}),{t}_{{\nu }_{s}}^{-1}({u}_{2}))} {{g}_{{\nu }_{s}}({t}_{{\nu }_{s}}^{-1}({u}_{1})){g}_{{\nu }_{s}}({t}_{{\nu }_{s}}^{-1}({u}_{2}))},$$

where \({t}_{{\nu }_{s},\widehat{{\rho }}_{s}}^{2}\) and \({g}_{{\nu }_{s},\widehat{{\rho }}_{s}}^{2}\) denote the distribution and density function, respectively, of the bivariate Student’s t distribution with degree-of-freedom parameter ν s and linear correlation parameter \(\widehat{{\rho }}_{s}\), and \({t}_{{\nu }_{s}}\) and \({g}_{{\nu }_{s}}\) denote the distribution and density function, respectively, of the univariate Student’s t distribution with degree-of-freedom parameter ν s . The procedure is similar for the pair of log returns for the exchange rates.
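A minimal sketch of the maximum-likelihood estimation of the degree-of-freedom parameter (on a simulated stand-in copula sample, with ρ fixed at its Kendall’s tau estimate):

```python
import numpy as np
from scipy.stats import t
from scipy.special import gammaln
from scipy.optimize import minimize_scalar

# A minimal sketch: with rho fixed, estimate nu of a bivariate t copula
# by maximizing the log of the copula density given above.

def neg_loglik(nu, U, rho):
    x1, x2 = t.ppf(U[:, 0], df=nu), t.ppf(U[:, 1], df=nu)
    q = (x1**2 - 2 * rho * x1 * x2 + x2**2) / (1 - rho**2)
    # log density of the bivariate standard Student's t distribution
    log_num = (gammaln((nu + 2) / 2) - gammaln(nu / 2)
               - np.log(nu * np.pi) - 0.5 * np.log(1 - rho**2)
               - (nu + 2) / 2 * np.log1p(q / nu))
    log_den = t.logpdf(x1, df=nu) + t.logpdf(x2, df=nu)
    return -np.sum(log_num - log_den)

rng = np.random.default_rng(4)
n, rho, nu_true = 248, 0.62, 5.0
L = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]))
Z = rng.standard_normal((n, 2)) @ L.T
X = Z / np.sqrt(rng.chisquare(nu_true, n) / nu_true)[:, None]
U = t.cdf(X, df=nu_true)                 # stand-in sample from the t copula

res = minimize_scalar(neg_loglik, bounds=(1.0, 50.0), args=(U, rho),
                      method="bounded")
print(res.x)                             # ML estimate of nu
```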

The samples in (9.12) depend on the choice of parametric models for the log returns and the corresponding parameter estimates. Therefore, we will here obtain two pairs of estimates \((\widehat{{\nu }}_{s},\widehat{{\nu }}_{e})\) of the copula parameters ν s and ν e . If the log-return distributions are assumed to be Student’s t distributions and the parameters are estimated by maximum likelihood, then we obtain the copula parameter estimates \((\widehat{{\nu }}_{s},\widehat{{\nu }}_{e}) \approx (5.1,6.8)\). If the log-return distributions are assumed to be given by the polynomial normal model, then we obtain the copula parameter estimates \((\widehat{{\nu }}_{s},\widehat{{\nu }}_{e}) \approx (3.6,5.5)\).

Now that the two models for the joint log-return distribution of the vector (X A , X B , X C , X D ) are set up and their parameters estimated, we evaluate the models in terms of how close the resulting distribution of the portfolio log return

$$\log ({V }_{1}/{V }_{0}),\quad {V }_{1} = \frac{{V }_{0}} {2} \exp \{{X}_{A} - {X}_{C}\} + \frac{{V }_{0}} {2} \exp \{{X}_{B} - {X}_{D}\}$$

is to the empirical distribution of the portfolio log return. The joint log-return models do not give closed-form expressions for the distributions of the portfolio log return. However, the portfolio log-return distributions are straightforward to simulate from. We simulate 10^5 outcomes of log(V 1 /V 0 ), according to the chosen model, by simulating outcomes (Z A , Z B ) and (Z C , Z D ) of independent Student’s t-distributed random vectors and using the formula

$$\log \left (\frac{1} {2}\exp \{\widehat{{F}}_{A}^{-1}({t}_{\widehat{{\nu }}_{s}}({Z}_{A})) -\widehat{{F}}_{C}^{-1}({t}_{\widehat{{\nu }}_{e}}({Z}_{C}))\} + \frac{1} {2}\exp \{\widehat{{F}}_{B}^{-1}({t}_{\widehat{{\nu }}_{s}}({Z}_{B})) -\widehat{{F}}_{D}^{-1}({t}_{\widehat{{\nu }}_{e}}({Z}_{D}))\}\right ),$$
where (Z A , Z B ) has a bivariate standard Student’s t distribution with degree-of-freedom parameter \(\widehat{{\nu }_{s}}\) and linear correlation parameter \(\widehat{{\rho }}_{s}\), and (Z C , Z D ) has a bivariate standard Student’s t distribution with degree-of-freedom parameter \(\widehat{{\nu }_{e}}\) and linear correlation parameter \(\widehat{{\rho }}_{e}\). Finally, we compare the empirical distributions of the simulated samples of size 10^5 to the empirical distribution based on the original log-return sample. The result is shown in Fig. 9.10. Both models give a good fit to the log-return data.

Fig. 9.10 These two q–q plots show the empirical quantiles of the portfolio log returns (y-axes) for \({w}_{1} = {w}_{2} = {V }_{0}/2\) against the quantiles of two models for log(V 1 /V 0 ) (x-axes). The plot to the left corresponds to the model for (X A , X B , X C , X D ) with Student’s t marginal distributions, and the plot to the right corresponds to the model for (X A , X B , X C , X D ) with polynomial normal marginal distributions

If \(\mathbf{X} = ({X}_{1},\ldots, {X}_{d})\) is a random vector with continuous marginal distribution functions \({F}_{1},\ldots, {F}_{d}\), and if \({G}_{1},\ldots, {G}_{d}\) are any given distribution functions, then the random vector \(\mathbf{Y} = ({G}_{1}^{-1}({F}_{1}({X}_{1})),\ldots, {G}_{d}^{-1}({F}_{d}({X}_{d})))\) has marginal distribution functions \({G}_{1},\ldots, {G}_{d}\) and inherits the dependence structure, or copula, from vector X. However, it may happen that the distribution functions \({F}_{1},\ldots, {F}_{d}\) cannot be determined explicitly. Another option is then to start from a family of models for vectors \(({U}_{1},\ldots, {U}_{d})\) whose components are uniformly distributed on (0, 1) and consider models of the form \(\mathbf{Y} = ({G}_{1}^{-1}({U}_{1}),\ldots, {G}_{d}^{-1}({U}_{d}))\).

Example 9.15 (Archimedean copulas). 

Consider a strictly positive random variable X with a density f and Laplace transform \(\Psi (t) = \mathrm{E}[{e}^{-tX}]\). A useful family of copulas called Archimedean copulas is based on the fact that \(\Psi (-\log (V )/X)\) is uniformly distributed on (0, 1) if V is uniformly distributed on (0, 1) and independent of X. To verify this claim we first note that

$$\Psi (t) ={ \int }_{0}^{\infty }{e}^{-tx}f(x)dx\quad { and}\quad \Psi ^{\prime}(t) = -{\int }_{0}^{\infty }x{e}^{-tx}f(x)dx < 0,$$

so Ψ is nonnegative, continuous, and strictly decreasing on [0, ∞). For any u ∈ (0, 1) we can now verify that

$$\begin{array}{rcl} \mathrm{P}\left (\Psi \left (\frac{-\log V } {X} \right ) \leq u\right )& =& \mathrm{E}\left [\mathrm{P}\left (\Psi \left (\frac{-\log V } {X} \right ) \leq u\mid X\right )\right ] \\ & =& \mathrm{E}\left [\mathrm{P}\left (V \leq {e}^{-{\Psi }^{-1}(u)X }\mid X\right )\right ] \\ & =& \mathrm{E}\left [{e}^{-{\Psi }^{-1}(u)X }\right ] \\ & =& \Psi ({\Psi }^{-1}(u)) = u.\end{array}$$

It follows that if \({V }_{1},\ldots, {V }_{d}\) are uniformly distributed on (0, 1) and independent of X, then the distribution function C of

$$\mathbf{U} = \left (\Psi \left (\frac{-\log {V }_{1}} {X} \right ),\ldots, \Psi \left (\frac{-\log {V }_{d}} {X} \right )\right )$$
(9.13)

is a copula. We should always aim to understand a multivariate model through its stochastic representation. Here Ψ is decreasing with Ψ(0) = 1 and lim t→∞ Ψ(t) = 0. Therefore, we observe that if X takes a small value, then the random variables \(-\log ({V }_{k})/X\), for \(k = 1,\ldots, d\), are all likely to take large values, which implies small values for the random variables \({U}_{k} = \Psi (-\log ({V }_{k})/X)\). In particular, choosing a random variable X that has a relatively high probability of taking very small values is likely to lead to asymptotic dependence, in the sense that small values for one component are likely to imply small values for other components, for a model with the stochastic representation \(({G}_{1}^{-1}({U}_{1}),\ldots, {G}_{d}^{-1}({U}_{d}))\). Simulation from an Archimedean copula C as above is straightforward: simulate independent standard uniform variates \({V }_{1},\ldots, {V }_{d}\) and an independent X, and set U according to (9.13); see the sketch after (9.14) below. Note that the copula can be expressed explicitly as

$$\begin{array}{rcl} C({u}_{1},\ldots, {u}_{d})& & =\mathrm{P}({U}_{1}\leq {u}_{1},\ldots, {U}_{d}\leq {u}_{d}) \\ & & =\mathrm{E}\left [\mathrm{P}\left ({V }_{1} \leq {e}^{-{\Psi }^{-1}({u}_{ 1})X},\ldots, {V }_{d}\leq {e}^{-{\Psi }^{-1}({u}_{ d})X}\mid X\right )\right ] \\ & & = \mathrm{E}\left [{e}^{-({\Psi }^{-1}({u}_{ 1})+\cdots +{\Psi }^{-1}({u}_{ d}))X}\right ] \\ & & = \Psi ({\Psi }^{-1}({u}_{ 1}) + \cdots + {\Psi }^{-1}({u}_{ d})). \end{array}$$
(9.14)
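A minimal sketch of the simulation recipe (9.13); the particular choice of X, a Gamma(1/θ, 1) variable whose Laplace transform is Ψ(t) = (1 + t)^{−1/θ}, anticipates the Clayton copula of Example 9.16 below:

```python
import numpy as np

# A minimal sketch of (9.13): draw X > 0 with Laplace transform Psi,
# independent uniforms V_1, ..., V_d, and set U_k = Psi(-log(V_k) / X).

rng = np.random.default_rng(5)
n, d, theta = 2000, 2, 1.0

def psi(t):
    return (1.0 + t) ** (-1.0 / theta)   # Laplace transform of Gamma(1/theta, 1)

X = rng.gamma(shape=1.0 / theta, scale=1.0, size=n)
V = rng.uniform(size=(n, d))
U = psi(-np.log(V) / X[:, None])         # sample from the Archimedean copula
```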

Example 9.16 (Clayton copula). 

If X has a Gamma(1/θ, 1) distribution, then X has density function \(f(x) = {x}^{1/\theta -1}{e}^{-x}/\Gamma (1/\theta )\) and Laplace transform

$$\Psi (t) = \mathrm{E}[{e}^{-tX}] ={ \int }_{0}^{\infty }{e}^{-tx} \frac{1} {\Gamma (1/\theta )}{x}^{1/\theta -1}{e}^{-x}\mathrm{d}x = {(t + 1)}^{-1/\theta }.$$

This choice of Ψ gives the Clayton copula. Solving \(\Psi ({\Psi }^{-1}(u)) = u\) for Ψ − 1(u) gives \({\Psi }^{-1}(u) = {u}^{-\theta } - 1\). Therefore, the copula expression (9.14) takes the form

$${C}_{\theta }^{\text{Cl}}(\mathbf{u}) = {({u}_{ 1}^{-\theta } + \cdots + {u}_{ d}^{-\theta } - d + 1)}^{-1/\theta }.$$

Applying l’Hôpital’s rule shows that the Clayton copula has lower tail dependence in the sense that

$$ \begin{array}{rcl}\lim \limits_{u\rightarrow 0}\mathrm{P}({U}_{k} \leq u\mid {U}_{j} \leq u)& =& \lim \limits_{u\rightarrow 0}\frac{{(2{u}^{-\theta } - 1)}^{-1/\theta }} {u} \\ & =& \lim \limits_{u\rightarrow 0}\frac{ \frac{d} {du}{(2{u}^{-\theta } - 1)}^{-1/\theta }} { \frac{d} {du}u} \\ & =& \lim \limits_{u\rightarrow 0}2{u}^{-\theta -1}{(2{u}^{-\theta } - 1)}^{-1/\theta -1} \\ & =& {2}^{-1/\theta }.\end{array}$$

If θ = 1, then X and the random variables − log V k are all standard exponentially distributed. In particular, we may write

$$({U}_{1},\ldots, {U}_{d})\stackrel{\mathrm{d}}{ =}\left ( \frac{{E}_{0}} {{E}_{0} + {E}_{1}},\ldots, \frac{{E}_{0}} {{E}_{0} + {E}_{d}}\right ),$$

where \({E}_{0},{E}_{1},\ldots, {E}_{d}\) are independent and standard exponentially distributed. We see that for all the U k to take small values, we need E 0 to take a small value. However, for all the U k to take large values, we need E 0 to take a large value and all of \({E}_{1},\ldots, {E}_{d}\) to take small values. The latter is less likely, and therefore a reasonable guess is that the Clayton copula does not have upper tail dependence: lim u → 1 P(U k > u ∣ U j > u) = 0. An application of l’Hôpital’s rule verifies this claim. Samples from the Clayton copula are illustrated graphically in Fig. 9.11.

Fig. 9.11 The upper two scatter plots show samples of size 2,000 from two bivariate distributions with standard normal marginal distributions. The left plot shows a sample from the bivariate standard normal distribution with linear correlation coefficient 0.5, and the right plot shows a sample from a bivariate Clayton copula with parameter θ = 1, componentwise transformed to standard normal marginal distributions. The two lower scatter plots show samples of size 2,000 from two bivariate distributions with Gamma(3, 1) marginal distributions. The left plot corresponds to the copula of a bivariate standard normal distribution with linear correlation 0.5, and the right plot corresponds to the copula of the vector (U 1, U 2) such that \((1 - {U}_{1},1 - {U}_{2})\) has a bivariate Clayton copula with parameter θ = 1
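A minimal sketch checking the θ = 1 representation above empirically: the lower tail-dependence estimate should be close to 2^{−1/θ} = 0.5 and the upper one close to 0:

```python
import numpy as np

# A minimal sketch: with E0, E1, E2 independent standard exponentials,
# U_k = E0 / (E0 + E_k) has a bivariate Clayton copula with theta = 1.

rng = np.random.default_rng(6)
n = 10**6
E0, E1, E2 = rng.exponential(size=(3, n))
U1, U2 = E0 / (E0 + E1), E0 / (E0 + E2)

u = 0.01
lower = np.mean((U1 <= u) & (U2 <= u)) / np.mean(U1 <= u)
upper = np.mean((U1 >= 1 - u) & (U2 >= 1 - u)) / np.mean(U1 >= 1 - u)
print(lower, upper)      # roughly 0.5 and roughly 0
```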

4.1 Misconceptions of Correlation and Dependence

Now we turn to common misconceptions of linear correlation. We have seen that given any two univariate distribution functions F 1 and F 2 and copula function C, F(x 1, x 2) = C(F 1(x 1), F 2(x 2)) is a bivariate distribution function with marginal distribution functions F 1 and F 2. It is typically hard to know which copula C to choose, and it is therefore tempting to ask for a bivariate distribution with given marginal distribution functions F 1 and F 2 and a given linear correlation coefficient ρ. However, we will see that this question is ill-posed in the sense that the set of bivariate distributions fulfilling the requirement may be empty.

To this end we first consider an integral representation of the covariance between two random variables in terms of their joint distribution function and their marginal distribution functions.

Proposition 9.8.

If (X 1 ,X 2 ) has distribution function F and marginal distribution functions F 1 and F 2 and the covariance Cov (X 1 ,X 2 ) exists finitely, then

$$\mathrm{Cov}({X}_{1},{X}_{2}) ={ \int }_{-\infty }^{\infty }{\int }_{-\infty }^{\infty }(F({x}_{ 1},{x}_{2}) - {F}_{1}({x}_{1}){F}_{2}({x}_{2}))d{x}_{1}d{x}_{2}.$$

Proof. Let (Y 1, Y 2) be an independent copy of (X 1, X 2), and note that

$$\mathrm{E}[({X}_{1}-{Y }_{1})({X}_{2}-{Y }_{2})] = \mathrm{E}[{X}_{1}{X}_{2}]-\mathrm{E}[{X}_{1}{Y }_{2}]+\mathrm{E}[{Y }_{1}{Y }_{2}]-\mathrm{E}[{Y }_{1}{X}_{2}] = 2\mathrm{Cov}({X}_{1},{X}_{2}).$$

Writing

$$({X}_{1} - {Y }_{1}) ={ \int }_{-\infty }^{\infty }(I\{{Y }_{ 1} \leq {x}_{1}\} - I\{{X}_{1} \leq {x}_{1}\})d{x}_{1},$$

and similarly for (X 2Y 2), we find that

$$\begin{array}{rcl} & & \mathrm{E}[({X}_{1}-{Y }_{1})({X}_{2}-{Y }_{2})] \\ & & \quad = \mathrm{E}\left [{\int }_{-\infty }^{\infty }(I\{{Y }_{ 1}\leq {x}_{1}\}-I\{{X}_{1}\leq {x}_{1}\})d{x}_{1}{ \int }_{-\infty }^{\infty }(I\{{Y }_{ 2}\leq {x}_{2}\}-I\{{X}_{2}\leq {x}_{2}\})d{x}_{2}\right ] \\ & & \quad ={\int }_{-\infty }^{\infty }{\int }_{-\infty }^{\infty }\mathrm{E}[(I\{{Y }_{ 1}\leq {x}_{1}\}-I\{{X}_{1}\leq {x}_{1}\})(I\{{Y }_{2}\leq {x}_{2}\}-I\{{X}_{2}\leq {x}_{2}\})]d{x}_{1}d{x}_{2} \\ & & \quad = 2{\int }_{-\infty }^{\infty }{\int }_{-\infty }^{\infty }(F({x}_{ 1},{x}_{2}) - {F}_{1}({x}_{1}){F}_{2}({x}_{2}))d{x}_{1}d{x}_{2}, \\ \end{array}$$

where the last equality holds since (Y 1, Y 2) is an independent copy of (X 1, X 2): expanding the product of the indicator differences and taking expectations gives 2(F(x 1, x 2) − F 1(x 1)F 2(x 2)). From this the conclusion follows. □

To determine which joint distribution function gives the minimal and maximal covariance (and therefore also linear correlation), we need to determine sharp upper and lower bounds on F in terms of F 1 and F 2. Note that

$$\begin{array}{rcl} \min (\mathrm{P}({X}_{1} \leq {x}_{1}),\mathrm{P}({X}_{2} \leq {x}_{2}))& \geq & \mathrm{P}({X}_{1} \leq {x}_{1},{X}_{2} \leq {x}_{2}) \\ & =& 1 -\mathrm{P}({X}_{1} > {x}_{1}\text{ or }{X}_{2} > {x}_{2}) \\ & \geq & 1 -\left (\mathrm{P}({X}_{1} > {x}_{1}) + \mathrm{P}({X}_{2} > {x}_{2})\right ) \\ & =& \mathrm{P}({X}_{1} \leq {x}_{1}) + \mathrm{P}({X}_{2} \leq {x}_{2}) - 1, \\ \end{array}$$

so

$$\max ({F}_{1}({x}_{1}) + {F}_{2}({x}_{2}) - 1,0) \leq F({x}_{1},{x}_{2}) \leq \min ({F}_{1}({x}_{1}),{F}_{2}({x}_{2})).$$
(9.15)

If \(({X}_{1},{X}_{2}) = ({F}_{1}^{-1}(U),{F}_{2}^{-1}(U))\), then statement (i) of Proposition 6.1 implies that

$$\begin{array}{rcl} F({x}_{1},{x}_{2})& =& \mathrm{P}({F}_{1}^{-1}(U) \leq {x}_{ 1},{F}_{2}^{-1}(U) \leq {x}_{ 2}) \\ & =& \mathrm{P}(U \leq {F}_{1}({x}_{1}),U \leq {F}_{2}({x}_{2})) \\ & =& \min ({F}_{1}({x}_{1}),{F}_{2}({x}_{2})), \\ \end{array}$$

so the upper bound is attained. In this case, X 1 and X 2 are said to be comonotonic. If \(({X}_{1},{X}_{2}) = ({F}_{1}^{-1}(U),{F}_{2}^{-1}(1 - U))\), then statement (i) of Proposition 6.1 implies that

$$\begin{array}{rcl} F({x}_{1},{x}_{2})& =& \mathrm{P}({F}_{1}^{-1}(U) \leq {x}_{ 1},{F}_{2}^{-1}(1 - U) \leq {x}_{ 2}) \\ & =& \mathrm{P}(U \leq {F}_{1}({x}_{1}),1 - U \leq {F}_{2}({x}_{2})) \\ & =& \max ({F}_{1}({x}_{1}) + {F}_{2}({x}_{2}) - 1,0), \\ \end{array}$$

so the lower bound is also attained. In this case, X 1 and X 2 are said to be countermonotonic.

Proposition 9.9.

Let F 1 and F 2 be distribution functions for random variables with nonzero finite variances. The set of linear correlation coefficients ρ(F) for the set of bivariate distribution functions F with marginal distribution functions F 1 and F 2 forms a closed interval [ρ min , ρ max ] with 0 ∈ (ρ min , ρ max ) such that ρ(F) = ρ min if and only if \(F({x}_{1},{x}_{2}) =\max ({F}_{1}({x}_{1}) + {F}_{2}({x}_{2}) - 1,0)\) and ρ(F) = ρ max if and only if F(x 1 ,x 2 ) = min (F 1 (x 1 ),F 2 (x 2 )).

Proof. The existence of attainable minimum and maximum linear correlation values ρmin, ρmax follows immediately from Proposition 9.8 and the bounds in (9.15). Taking F(x 1, x 2) = F 1(x 1)F 2(x 2) shows that 0 ∈ [ρmin, ρmax]. By Proposition 9.8, ρmax = 0 would imply that min(F 1(x 1), F 2(x 2)) = F 1(x 1)F 2(x 2) for all x 1, x 2, which in turn implies that either F 1 or F 2 takes only the values 0 and 1. Such distribution functions correspond to constant random variables for which the variance is zero. We conclude that ρmax > 0. A similar argument shows that ρmin < 0. It remains to show that any value in [ρmin, ρmax] is attainable. For λ ∈ [0, 1] the function

$${F}_{\lambda }({x}_{1},{x}_{2}) = \lambda \max ({F}_{1}({x}_{1}) + {F}_{2}({x}_{2}) - 1,0) + (1 - \lambda )\min ({F}_{1}({x}_{1}),{F}_{2}({x}_{2}))$$

is a distribution function since it is the distribution function of the random vector

$$I({F}_{1}^{-1}(U),{F}_{ 2}^{-1}(1 - U)) + (1 - I)({F}_{ 1}^{-1}(U),{F}_{ 2}^{-1}(U)),$$

where I and U are independent, I takes the value 1 with probability λ and the value 0 otherwise, and U is uniformly distributed on (0, 1). Moreover, F λ(x 1, ∞) = F 1(x 1) and F λ(∞, x 2) = F 2(x 2). Varying λ ∈ [0, 1] shows that all values in the interval [ρmin, ρmax] are attainable correlation values. □

Example 9.17 (A bad stress test). 

Consider potential aggregate losses X and Y in two lines of business for an insurance company. Suppose that X is Exp(α)-distributed and that Y is Pa(α)-distributed with an unspecified dependence structure. To perform a stress test, the chief risk officer asks an actuary to assign a high linear correlation to the pair (X, Y ) and study the effect on the quantile values for the sum X + Y. This problem is ill-posed. The correlation coefficient does not exist for α ≤ 2 since

$$\mathrm{E}[{Y }^{2}] =\lim\limits_{x\rightarrow \infty }{\int }_{1}^{x}{y}^{2}\alpha {y}^{-\alpha -1}dy =\lim\limits_{x\rightarrow \infty } \frac{\alpha } {2 - \alpha }({x}^{2-\alpha } - 1)$$

does not exist finitely for α ≤ 2. Moreover, for α > 2 not all correlation values are possible for the pair (X, Y ), and for each attainable correlation value there are infinitely many possible joint distributions for (X, Y ) that may produce very different distributions for X + Y.

To compute the upper bound ρmax of the attainable correlation values, we note that \(Y \stackrel{\mathrm{d}}{ =}{e}^{X}\) and that X and Y are comonotonic if Y = e X. In particular, ρmax = Cor(X, e X). The means and variances of the Exp(α) and Pa(α) distributions are given by

$$\mathrm{E}[X] = \frac{1} {\alpha },\quad \mathrm{Var}(X) = \frac{1} {{\alpha }^{2}},\quad \mathrm{E}[{e}^{X}] = \frac{\alpha } {\alpha - 1},\quad \mathrm{Var}({e}^{X}) = \frac{\alpha } {{(\alpha - 1)}^{2}(\alpha - 2)},$$
and integration by parts can be used to compute the covariance

$$\mathrm{Cov}(X,{e}^{X}) = \mathrm{E}[X{e}^{X}]-\mathrm{E}[X]\mathrm{E}[{e}^{X}] ={ \int }_{0}^{\infty }x\alpha {e}^{(1-\alpha )x}dx- \frac{1} {\alpha - 1} = \frac{1} {{(\alpha - 1)}^{2}}.$$

Fig. 9.12 The upper two plots show histograms of the distribution of X + Y, based on samples of size 10^6 and with the values corresponding to very high quantiles omitted, where X is Exp(α)-distributed and Y is Pa(α)-distributed with α = 2.1. The left histogram corresponds to Y = e X, and the right histogram corresponds to X and Y independent. Lower left plot: empirical quantiles of X + e X divided by empirical quantiles of X + Y with X and Y independent. Lower right plot: ρmax as a function of α

We find that

$${\rho }_{\max } = \frac{\mathrm{Cov}(X,{e}^{X})} {\mathrm{Var}{(X)}^{1/2}\mathrm{Var}{({e}^{X})}^{1/2}} = \frac{{({\alpha }^{2} - 2\alpha )}^{1/2}} {\alpha - 1}.$$

The lower right plot in Fig. 9.12 shows ρmax as a function of α. For instance, α = 2.1 gives ρmax ≈ 0.4, which may suggest weak dependence, although it corresponds to comonotonicity. The histograms in Fig. 9.12 show the distribution of X + Y for α = 2.1 in the case of comonotonicity (left plot) and independence (right plot).
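A minimal sketch reproducing the main point of the example numerically:

```python
import numpy as np

# A minimal sketch of Example 9.17: for X ~ Exp(alpha) and Y ~ Pa(alpha),
# the maximal attainable correlation rho_max = sqrt(alpha^2 - 2*alpha)/(alpha - 1)
# can be far below 1, and comonotonicity (Y = e^X) gives a much heavier
# tail for X + Y than independence does.

alpha = 2.1
rho_max = np.sqrt(alpha**2 - 2 * alpha) / (alpha - 1)
print(rho_max)                                   # approximately 0.4

rng = np.random.default_rng(7)
n = 10**6
X = rng.exponential(scale=1 / alpha, size=n)
S_com = X + np.exp(X)                            # comonotonic case, Y = e^X
S_ind = X + rng.pareto(alpha, size=n) + 1        # independent Pa(alpha) Y
print(np.quantile(S_com, 0.999), np.quantile(S_ind, 0.999))
```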

Example 9.18 (Correlation and causality). 

If we analyze quarterly data of changes in the 3-month, zero-coupon bond rate for government bonds and quarterly data of log returns of the country’s stock market index, then it is likely that a bivariate autoregressive model of order 1, AR(1), gives a rather good fit. With X t 1 and X t 2 denoting the change in the 3-month rate and the index log return, respectively, from quarter t − 1 to t, consider the model

$$\left (\begin{array}{c} {X}_{t}^{1} \\ {X}_{t}^{2} \end{array} \right ) = \left (\begin{array}{rr} 0.45&0.02\\ - 9.2 &0.35 \end{array} \right )\left (\begin{array}{c} {X}_{t-1}^{1} \\ {X}_{t-1}^{2} \end{array} \right )+\left (\begin{array}{c} {Z}_{t}^{1} \\ {Z}_{t}^{2} \end{array} \right ),$$

or in matrix form \({\mathbf{X}}_{t} = \mathbf{A}{\mathbf{X}}_{t-1} +{ \mathbf{Z}}_{t}\), where the Z k are independent and identically distributed and

$$\mathrm{Cov}({\mathbf{Z}}_{t}) = \left (\begin{array}{lr} 2 \cdot 1{0}^{-5} & 0 \\ 0 &1{0}^{-2} \end{array} \right ).$$

We find that

$${ \mathbf{X}}_{t} = \mathbf{A}{\mathbf{X}}_{t-1} +{ \mathbf{Z}}_{t} = \mathbf{A}(\mathbf{A}{\mathbf{X}}_{t-2} +{ \mathbf{Z}}_{t-1}) +{ \mathbf{Z}}_{t} = \cdots =\sum\limits_{k=0}^{\infty }{\mathbf{A}}^{k}{\mathbf{Z}}_{ t-k}.$$

In particular,

$$\mathrm{Cov}({\mathbf{X}}_{t}) =\sum\limits_{k=0}^{\infty }{\mathbf{A}}^{k}\mathrm{Cov}({\mathbf{Z}}_{ t}){({\mathbf{A}}^{k})}^{\mathrm{T}} \approx \left (\begin{array}{rr} 3.18 \cdot 1{0}^{-5} & - 2.82 \cdot 1{0}^{-5} \\ - 2.82 \cdot 1{0}^{-5} & 1.47 \cdot 1{0}^{-2} \end{array} \right ),$$

which corresponds to a linear correlation coefficient Cor(X t 1, X t 2) ≈ −0.04. However,

$$\begin{array}{rcl} \mathrm{Cov}({X}_{t}^{2},{X}_{ t-1}^{1})& =& \mathrm{Cov}(-9.2{X}_{ t-1}^{1} + 0.35{X}_{ t-1}^{2} + {Z}_{ t}^{2},{X}_{ t-1}^{1}) \\ & =& -9.2\mathrm{Var}({X}_{t}^{1}) + 0.35\mathrm{Cov}({X}_{ t}^{1},{X}_{ t}^{2}), \\ \end{array}$$

which gives

$$\mathrm{Cor}({X}_{t}^{2},{X}_{ t-1}^{1}) = -9.2{\left (\frac{\mathrm{Var}({X}_{t}^{1})} {\mathrm{Var}({X}_{t}^{2})}\right )}^{1/2} + 0.35\mathrm{Cor}({X}_{ t}^{1},{X}_{ t}^{2}) \approx -0.44$$

reflecting the fact that the stock market typically reacts negatively to increasing interest rates (the present value of future dividends decreases) and positively to decreasing interest rates. Similarly, Cor(X t 1, X t − 1 2) ≈ 0.41, which may reflect the fact that central banks raise interest rates to cool down an overheated economy and lower them to boost a struggling economy. The main point is that the linear correlation coefficient Cor(X t 1, X t 2) ≈ −0.04 that could be estimated on the pairs of interest rate changes and index log returns says very little about the dependence between interest rate changes and index log returns. Here we have two rather strong causal dependencies that essentially net out when we consider only the dependence among the components of the random vector (X t 1, X t 2).
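The lead-lag effect is easy to verify by simulation. A minimal sketch, assuming NumPy, with sample correlations replacing the exact stationary ones:

```python
import numpy as np

A = np.array([[0.45, 0.02],
              [-9.2, 0.35]])
sd = np.array([np.sqrt(2e-5), np.sqrt(1e-2)])  # standard deviations of Z_t^1, Z_t^2
rng = np.random.default_rng(seed=1)

T = 10**5
X = np.zeros((T, 2))
for t in range(1, T):
    X[t] = A @ X[t - 1] + sd * rng.standard_normal(2)

# Contemporaneous correlation is small ...
print(np.corrcoef(X[:, 0], X[:, 1])[0, 1])      # approximately -0.04
# ... while the lead-lag correlations are strong
print(np.corrcoef(X[:-1, 0], X[1:, 1])[0, 1])   # Cor(X_{t-1}^1, X_t^2), approx. -0.44
print(np.corrcoef(X[:-1, 1], X[1:, 0])[0, 1])   # Cor(X_{t-1}^2, X_t^1), approx. 0.41
```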

Example 9.19 (Asymptotic dependence). 

We know from Proposition 9.5 that the components of a bivariate standard normally distributed vector (X 1, X 2) with linear correlation ρ < 1 are asymptotically independent in the sense that \(\lim \limits_{x\rightarrow -\infty }\mathrm{P}({X}_{2} \leq x\mid {X}_{1} \leq x) = 0\). In this case, an extreme value for one component is not likely to make the other component take an extreme value. Combining Proposition 9.6 and Example 8.2 implies that the components of a bivariate standard Student’s t ν-distributed vector (Y 1, Y 2) with linear correlation ρ ∈ (0, 1) are asymptotically dependent in the sense that \(\lim \limits_{x\rightarrow -\infty }\mathrm{P}({Y }_{2} \leq x\mid {Y }_{1} \leq x) = \lambda > 0\). In this case, an extreme value for one component makes it likely that the other component will take an extreme value.

Fig. 9.13

q–q plots of simulated samples of size 2,000 against a normal distribution with zero mean and variance 55. The first distribution (left) is the sum of the components of a ten-dimensional standard normally distributed vector with pairwise linear correlation 0.5. The second distribution (right) is the sum of the components of a ten-dimensional random vector with standard normal univariate marginal distributions and the dependence structure of a ten-dimensional Student's t distribution with one degree of freedom and pairwise linear correlation 0.5

Consider the random vector (U 1, U 2) = (Φ(X 1), Φ(X 2)), whose distribution function is called a Gaussian copula, and the random vector (V 1, V 2) = (t ν(Y 1), t ν(Y 2)), whose distribution function is called a t ν copula. If G is a distribution function and p ∈ (0, 1) is small, then the probability that both components of the vector \(({Z}_{1},{Z}_{2}) = ({G}^{-1}({V }_{1}),{G}^{-1}({V }_{2}))\) take values smaller than G − 1(p) is approximately

$$\mathrm{P}({Z}_{1} \leq {G}^{-1}(p),{Z}_{ 2} \leq {G}^{-1}(p)) = \mathrm{P}({V }_{ 1} \leq p)\mathrm{P}({V }_{2} \leq p\mid {V }_{1} \leq p) \approx \lambda p,$$

whereas the corresponding probability of joint extremes for the vector \(({W}_{1},{W}_{2}) = ({G}^{-1}({U}_{1}),{G}^{-1}({U}_{2}))\) is of the order p 2. As a consequence, the left tail of Z 1 + Z 2 is heavier than that of W 1 + W 2. The influence of the asymptotic dependence of the t ν copula (and of the lack of it for the Gaussian copula) on the tail behavior of the sum of the components of a random vector is valid in arbitrary dimension. Figure 9.13 illustrates this effect graphically in terms of q–q plots for ten-dimensional random vectors Z and W, where G = Φ is the standard normal distribution function and the underlying multivariate standard normally distributed X and Student's t 1-distributed Y both have pairwise linear correlation parameter ρ = 0.5. With R denoting the linear correlation matrix with off-diagonal entries 0.5,

$${W}_{1} + \cdots + {W}_{10}\stackrel{\mathrm{d}}{ =}{({\mathbf{1}}^{\mathrm{T}}R\mathbf{1})}^{1/2}{W}_{1}$$

is N(0, 55)-distributed, since W has a multivariate normal distribution and \({\mathbf{1}}^{\mathrm{T}}R\mathbf{1} = 10 + 90 \cdot 0.5 = 55\).
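The construction behind Fig. 9.13 is easy to simulate. A minimal sketch, assuming NumPy and SciPy; the Cholesky construction and all variable names are ours. Both vectors get standard normal marginal distributions; only the copula differs.

```python
import numpy as np
from scipy import stats

d, n, rho, nu = 10, 2000, 0.5, 1
R = rho * np.ones((d, d)) + (1 - rho) * np.eye(d)   # equicorrelation matrix
L = np.linalg.cholesky(R)
rng = np.random.default_rng(seed=1)

# Gaussian copula with N(0,1) marginals: simply a multivariate normal vector
W = rng.standard_normal((n, d)) @ L.T

# t_1 copula with N(0,1) marginals: simulate multivariate t_1, map the
# marginals to uniforms with t.cdf, then apply the normal quantile transform
S = rng.chisquare(nu, size=(n, 1)) / nu
Y = (rng.standard_normal((n, d)) @ L.T) / np.sqrt(S)
Z = stats.norm.ppf(stats.t.cdf(Y, df=nu))

# The Gaussian-copula sum is exactly N(0, 55); the t-copula sum has far heavier tails
print(np.quantile(W.sum(axis=1), [0.001, 0.999]))
print(np.quantile(Z.sum(axis=1), [0.001, 0.999]))
```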

Example 9.20 (Default risk). 

Consider a portfolio of corporate loans of a retail bank. Suppose there are n loans and let, for \(k = 1,\ldots, n\), X k be an indicator that takes the value 1 if the kth obligor has defaulted on its loan at the end of the year, and 0 otherwise. Suppose also that the default probabilities \({p}_{k} = \mathrm{P}({X}_{k} = 1)\) can be accurately estimated and may be considered as known. A common estimation approach is to divide the obligors into m homogeneous groups so that all obligors belonging to the same group have the same default probability. The estimates of default probabilities can then be based on the relative frequencies of defaults over the years for the different groups.

The random variable \(N = {X}_{1} + \cdots + {X}_{n}\), representing the total number of defaults within the current year, is likely to be of interest to the bank. However, the default probabilities only determine the marginal distributions and not the full multivariate distribution of the random vector \(({X}_{1},\ldots, {X}_{n})\). To specify a multivariate model for the default indicators, it is common to consider a vector \(({Y }_{1},\ldots, {Y }_{n})\) of so-called latent variables. The latent variable Y k may represent the difference between the values of the assets and liabilities of the kth obligor at the end of the year, and a threshold d k is determined so that \({Y }_{k} \leq {d}_{k}\) corresponds to default for obligor k. Assuming that the Y k have continuous distribution functions, we may express the probability that the first k among the n loans all default as

$$\begin{array}{rcl}{ p}_{1\ldots k}& =& \mathrm{P}({Y }_{1} \leq {d}_{1},\ldots, {Y }_{k} \leq {d}_{k}) \\ & =& C(\mathrm{P}({Y }_{1} \leq {d}_{1}),\ldots, \mathrm{P}({Y }_{k} \leq {d}_{k}),1,\ldots, 1) \\ & =& C({p}_{1},\ldots, {p}_{k},1,\ldots, 1), \\ \end{array}$$

where C denotes the copula of \(({Y }_{1},\ldots, {Y }_{n})\). Joint default probabilities of this type will depend heavily on the choice of copula C. To illustrate this point, we consider a numerical example.

Consider a loan portfolio with n = 1,000 obligors, and suppose that the default probability of each obligor is equal to p = 0.05, i.e., p k = 0.05 for each k. We consider four different copula models for the latent variable vector: (a) C is a Gaussian copula with pairwise correlation parameter ρ = 0, (b) C is a Gaussian copula with pairwise correlation parameter ρ = 0.1, (c) C is a Student's t 3 copula with pairwise correlation parameter ρ = 0, and (d) C is a Student's t 3 copula with pairwise correlation parameter ρ = 0.1.

For each model we generate a sample of size \(1{0}^{5}\) from the resulting model for N, the total number of defaults, and illustrate the distribution of N in terms of the histograms shown in Fig. 9.14. The histograms show clearly that zero correlation for the underlying Student's t distribution is far from independence; for the Gaussian copula, zero correlation is equivalent to independence. The histograms also show the impact on the distribution of N of the small change in the correlation parameter ρ from 0 to 0.1.
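The four latent variable models can be simulated via a one-factor representation of the equicorrelated Gaussian and Student's t distributions. A minimal sketch, assuming NumPy and SciPy; the function names and the reduced sample size \(1{0}^{4}\) are ours.

```python
import numpy as np
from scipy import stats

n, p, n_sim = 1000, 0.05, 10**4
rng = np.random.default_rng(seed=1)

def n_defaults_gaussian(rho):
    # Equicorrelated Gaussian latent variables via one common factor
    Y = (np.sqrt(rho) * rng.standard_normal((n_sim, 1))
         + np.sqrt(1 - rho) * rng.standard_normal((n_sim, n)))
    return (Y <= stats.norm.ppf(p)).sum(axis=1)

def n_defaults_t(rho, nu=3):
    # Student's t latent variables: a Gaussian one-factor vector divided by a
    # common sqrt(chi^2_nu / nu); for rho = 0 the components are uncorrelated
    # but not independent, because of the shared mixing variable
    S = rng.chisquare(nu, size=(n_sim, 1)) / nu
    Y = (np.sqrt(rho) * rng.standard_normal((n_sim, 1))
         + np.sqrt(1 - rho) * rng.standard_normal((n_sim, n))) / np.sqrt(S)
    return (Y <= stats.t.ppf(p, df=nu)).sum(axis=1)

for N in (n_defaults_gaussian(0.0), n_defaults_gaussian(0.1),
          n_defaults_t(0.0), n_defaults_t(0.1)):
    print(N.mean(), np.quantile(N, [0.95, 0.99]))
```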

Fig. 9.14

The distribution of the number of defaults is illustrated in histograms based on samples of size \(1{0}^{5}\) for the sum of \(1{0}^{3}\) default indicators. The histograms correspond to the following latent variable models: Gaussian with ρ = 0 (upper left), Gaussian with ρ = 0.1 (upper right), Student's t 3 with ρ = 0 (middle and lower left), Student's t 3 with ρ = 0.1 (middle and lower right)

5 Models for Large Portfolios

In this section we investigate models for the aggregated loss \({S}_{n} = {X}_{1} + \cdots + {X}_{n}\) of a large homogeneous portfolio over a specified time period. Here X k represents the loss from an investment in the kth asset. As an example we consider the aggregate loss of a bank's portfolio of loans to small and medium-sized firms due to failure of borrowers to honor their contracted obligations to the lender (the bank). The number of assets, n, is thought of as very large, and we do not have enough information to accurately specify an n-dimensional distribution for \(({X}_{1},\ldots, {X}_{n})\). Instead we present a cruder approach based on conditional independence.

In many cases, it is not reasonable to assume that the X k are independent because the losses may depend on the state of the economy. However, it may be reasonable to assume that the X k are conditionally independent, given the values of a set of economic indicators (e.g., current and future values of interest rates for different maturities, capacity utilization in the industry, GDP growth). Let the components of the random vector Z represent the future values of the economic indicators, and let \({f}_{n}(\mathbf{Z}) = \mathrm{E}[{S}_{n}/n\mid \mathbf{Z}]\) be the expected average loss conditional on the economic indicators. When n is large, it seems plausible that diversification makes the idiosyncratic risks small and that the main risk drivers are captured by the vector Z. This motivates the approximation \({S}_{n} \approx n{f}_{n}(\mathbf{Z})\), and a mathematical justification of it is given in the following result.

Proposition 9.10.

Let \({X}_{1},\ldots, {X}_{n}\) be random variables that are conditionally independent given random vector  Z. Write \({S}_{n} = {X}_{1} + \cdots + {X}_{n}\) and \({f}_{n}(\mathbf{Z}) = \mathrm{E}[{S}_{n}/n\mid \mathbf{Z}]\). Then

$$\mathrm{P}(\vert {S}_{n}/n - {f}_{n}(\mathbf{Z})\vert > \epsilon ) \leq \frac{\sum\limits_{k=1}^{n}\mathrm{E}[\mathrm{Var}({X}_{k}\mid \mathbf{Z})]} {{(n\epsilon )}^{2}}, \quad \epsilon > 0.$$

If, in addition, the Xk are identically distributed, then f = fn does not depend on n and

$$\mathrm{P}(\vert {S}_{n}/n - f(\mathbf{Z})\vert > \epsilon ) \leq \frac{\mathrm{E}[{X}_{1}^{2}] -\mathrm{E}[f{(\mathbf{Z})}^{2}]} {n{\epsilon }^{2}}, \quad \epsilon > 0.$$

If, further, the Xk take values in {0,1}, then

$$\mathrm{P}(\vert {S}_{n}/n - f(\mathbf{Z})\vert > \epsilon ) \leq \frac{\mathrm{E}[f(\mathbf{Z})] -\mathrm{E}[f{(\mathbf{Z})}^{2}]} {n{\epsilon }^{2}}.$$

Proof. An application of Chebyshev’s inequality gives

$$\begin{array}{rcl} \mathrm{P}(\vert {S}_{n}/n - {f}_{n}(\mathbf{Z})\vert > \epsilon )& =& \mathrm{E}[\mathrm{P}(\vert {S}_{n} - n{f}_{n}(\mathbf{Z})\vert > n\epsilon \mid \mathbf{Z})] \\ & \leq & \frac{\mathrm{E}[\mathrm{Var}({S}_{n}\mid \mathbf{Z})]} {{n}^{2}{\epsilon }^{2}}.\end{array}$$

Because the X k are conditionally independent given Z, it follows that

$$\mathrm{E}[\mathrm{Var}({S}_{n}\mid \mathbf{Z})] =\sum\limits_{k=1}^{n}\mathrm{E}[\mathrm{Var}({X}_{ k}\mid \mathbf{Z})],$$

which proves the first claim. The second claim follows from the first claim because

$$\mathrm{E}[\mathrm{Var}({X}_{k}\mid \mathbf{Z})]=\mathrm{E}[\mathrm{Var}({X}_{1}\mid \mathbf{Z})]=\mathrm{E}[\mathrm{E}[{X}_{1}^{2}\mid \mathbf{Z}]-{(\mathrm{E}[{X}_{ 1}\mid \mathbf{Z}])}^{2}]=\mathrm{E}[{X}_{ 1}^{2}]-\mathrm{E}[f{(\mathbf{Z})}^{2}].$$

Moreover, if X 1 takes a value in {0, 1}, then \(\mathrm{E}[{X}_{1}^{2}\mid \mathbf{Z}] = \mathrm{E}[{X}_{1}\mid \mathbf{Z}] = f(\mathbf{Z})\). This completes the proof. □

Proposition 9.10 not only motivates the approximation \({S}_{n} \approx n{f}_{n}(\mathbf{Z})\); it also provides an upper bound for tail probabilities of the aggregated loss S n . For instance, combining Proposition 9.10 with the inequality

$$\begin{array}{rcl} \mathrm{P}({S}_{n} > s)& =& \mathrm{P}({S}_{n} > s,\vert {S}_{n} - n{f}_{n}(\mathbf{Z})\vert \leq \epsilon n) + \mathrm{P}({S}_{n} > s,\vert {S}_{n} - n{f}_{n}(\mathbf{Z})\vert > \epsilon n) \\ & \leq & \mathrm{P}(n{f}_{n}(\mathbf{Z}) > s - \epsilon n) + \mathrm{P}(\vert {S}_{n} - n{f}_{n}(\mathbf{Z})\vert > \epsilon n),\quad \epsilon > 0, \\ \end{array}$$

gives an upper bound for P(S n > s). The upper bound for the tail probability gives an upper bound for the quantile. If the X k are identically distributed and conditionally independent given Z, then, with \(C = \mathrm{E}[{X}_{1}^{2}] -\mathrm{E}[f{(\mathbf{Z})}^{2}]\),

$$\begin{array}{rcl}{ F}_{{S}_{n}}^{-1}(q)& =& \min \{s : {F}_{{ S}_{n}}(s) \geq q\} \\ & =& \min \{s : \mathrm{P}({S}_{n} > s) \leq 1 - q\} \\ & \leq & \min \{s : \mathrm{P}(nf(\mathbf{Z}) > s - \epsilon n) + C/({\epsilon }^{2}n) \leq 1 - q\} \\ & =& n(\epsilon + {F}_{f(\mathbf{Z})}^{-1}(q + C/({\epsilon }^{2}n))),\quad \epsilon > 0.\end{array}$$

In particular,

$${F}_{{S}_{n}}^{-1}(q) \leq {n\min }_{ \epsilon >0}\left (\epsilon + {F}_{f(\mathbf{Z})}^{-1}(q + C/({\epsilon }^{2}n))\right ),\quad C = \mathrm{E}[{X}_{ 1}^{2}] -\mathrm{E}[f{(\mathbf{Z})}^{2}].$$
(9.16)

The upper bound (9.16) for the quantile can be used to derive upper bounds for risk measures such as VaR and ES.
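As a numerical illustration of (9.16), not taken from the text, suppose that the X k are conditionally independent and identically distributed default indicators and that f(Z) is Beta(a, b)-distributed, anticipating the mixture model of Sect. 5.1. A minimal sketch, assuming NumPy and SciPy, that minimizes the bound over a grid of ε values:

```python
import numpy as np
from scipy import stats

# Hypothetical setup: conditionally i.i.d. indicators X_k with f(Z) ~ Beta(a, b);
# p and c parameterized as in the beta mixture model of Sect. 5.1
n, q = 10**4, 0.99
p, c = 0.05, 0.01
a, b = (1 - c) / c * p, (1 - c) / c * (1 - p)

EX2 = p                                        # E[X_1^2] = E[X_1] = p for indicators
Ef2 = a * (a + 1) / ((a + b) * (a + b + 1))    # E[f(Z)^2] for a Beta(a, b) variable
C = EX2 - Ef2

# Feasible eps must satisfy q + C/(eps^2 n) < 1; minimize the bound on a grid
eps = np.linspace(np.sqrt(C / ((1 - q) * n)) * 1.001, 1.0, 10**4)
bound = n * (eps + stats.beta.ppf(q + C / (eps**2 * n), a, b))
i = np.argmin(bound)
print(eps[i], bound[i])    # minimizing eps and the bound on F_{S_n}^{-1}(q)
```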

Example 9.21 (A large homogeneous loan portfolio). 

Consider a large portfolio of loans to small and medium-sized firms and suppose that we want to analyze the distribution of aggregate losses from now until 1 year from now due to defaults. Write X k for the loss on the kth loan and \({S}_{n} = {X}_{1} + \cdots + {X}_{n}\) for the aggregated loss. In this case, X k can be written as X k = L k I k , where I k is the default indicator that takes the value 1 if the kth obligor defaults and 0 otherwise, and L k is the amount of money lost if the kth obligor defaults. If the default probabilities P(I k = 1) are of similar size and the loss-given-default variables L k are statistically similar, then the loan portfolio can be considered homogeneous.

A particularly nice situation is where the L k are identical and deterministic, L k = l, for each \(k = 1,\ldots, n\). In this case

$${f}_{n}(\mathbf{Z}) = \mathrm{E}\left [\frac{{S}_{n}} {n} \mid \mathbf{Z}\right ] = \mathrm{E}\left [ \frac{1} {n}\sum\limits_{k=1}^{n}{L}_{ k}{I}_{k}\mid \mathbf{Z}\right ] = l\mathrm{E}\left [\frac{{N}_{n}} {n} \mid \mathbf{Z}\right ],$$

where \({N}_{n} = {I}_{1} + \cdots + {I}_{n}\) is the number of defaults. The expected fraction of defaults, given the economic indicators, is written as \({p}_{n}(\mathbf{Z}) = \mathrm{E}[{N}_{n}/n\mid \mathbf{Z}]\). That is, f n (Z) = lp n (Z), and the aggregated loss can be approximated by \({S}_{n} \approx nl{p}_{n}(\mathbf{Z})\). If, in addition, the default indicators are identically distributed, then p n (Z) = p(Z) does not depend on n, and the last statement of Proposition 9.10 leads to

$$\mathrm{P}(\vert {N}_{n}/n - p(\mathbf{Z})\vert > \epsilon ) \leq \frac{\mathrm{E}[p(\mathbf{Z})(1 - p(\mathbf{Z}))]} {n{\epsilon }^{2}}.$$

5.1 Beta Mixture Model

In this section, we illustrate the modeling approach of the previous example for a specific choice of model for the number of defaults N = N n and the fraction of defaults. Write \(N = {I}_{1} + \cdots + {I}_{n}\), where, conditional on the random variable Z = p(Z), the I k are independent and Be(Z)-distributed, and where Z is taken to be Beta(a, b)-distributed. We do not give any economic interpretation of the Beta(a, b)-distributed Z and choose this model only because it is particularly simple to work with in terms of both analytical and numerical computations.

The assumption that Z is Beta(a, b)-distributed implies that Z has the density function

$$g(z) = \frac{1} {\beta (a,b)}{z}^{a-1}{(1 - z)}^{b-1},\quad a,b > 0,z \in (0,1),$$

where β(a, b) can be expressed in terms of the Gamma function as

$$\beta (a,b) ={ \int }_{0}^{1}{z}^{a-1}{(1 - z)}^{b-1}\mathrm{d}z = \frac{\Gamma (a)\Gamma (b)} {\Gamma (a + b)}.$$

Using the property \(\Gamma (z + 1) = z\Gamma (z)\) of the Gamma function we find that

$$\begin{array}{rcl} \mathrm{E}[Z]& =& \frac{1} {\beta (a,b)}{\int }_{0}^{1}{z}^{a}{(1 - z)}^{b-1}\mathrm{d}z = \frac{\beta (a + 1,b)} {\beta (a,b)} = \frac{a} {a + b}, \\ \mathrm{E}[{Z}^{2}]& =& \frac{\beta (a + 2,b)} {\beta (a,b)} = \frac{a(a + 1)} {(a + b)(a + b + 1)}.\end{array}$$

Conditional on Z, the number of defaults N has a Bin(n, Z) distribution, and therefore the distribution of N is given by

$$\begin{array}{rcl} \mathrm{P}(N = k)& =& \left({n}\atop{k}\right){\int }_{0}^{1}{z}^{k}{(1 - z)}^{n-k}g(z)\mathrm{d}z \\ & =& \left({n}\atop{k}\right) \frac{1} {\beta (a,b)}{\int }_{0}^{1}{z}^{a+k-1}{(1 - z)}^{n-k+b-1}\mathrm{d}z \\ & =& \left({n}\atop{k}\right)\frac{\beta (a + k,b + n - k)} {\beta (a,b)}, \\ \end{array}$$

which is called the beta-binomial distribution. The distribution function of the beta-binomial distribution is illustrated in Fig. 9.15. The expected number of defaults is easily computed:

$$\mathrm{E}[N] = \mathrm{E}[\mathrm{E}[N\mid Z]] = \mathrm{E}[nZ] = n \frac{a} {a + b}.$$

In addition, the individual default probability is \(\mathrm{P}({I}_{1} = 1) = \mathrm{E}[\mathrm{E}[{I}_{1}\mid Z]] = \mathrm{E}[Z]\), the pairwise default probability is \(\mathrm{P}({I}_{1} = {I}_{2} = 1) = \mathrm{E}[{Z}^{2}]\), and the default correlation is

$$\mathrm{Cor}({I}_{1},{I}_{2}) = \frac{\mathrm{E}[{I}_{1}{I}_{2}] -\mathrm{E}{[{I}_{1}]}^{2}} {\mathrm{E}[{I}_{1}^{2}] -\mathrm{E}{[{I}_{1}]}^{2}} = \frac{\mathrm{E}[{Z}^{2}] -\mathrm{E}{[Z]}^{2}} {\mathrm{E}[Z] -\mathrm{E}{[Z]}^{2}} = \frac{1} {a + b + 1}.$$

To analyze the model, we fix the common individual default probability at \(p = \mathrm{P}({I}_{1} = 1)\). This implies that we allow only parameter pairs (a, b) for which \(p = a/(a + b)\), i.e., pairs (a, b) satisfying

$$(a,b) = \frac{1 - c} {c} (p,1 - p),\quad c \in (0,1),$$

where c are the possible values of the default correlation Cor(I 1, I 2). We can now study the beta-binomial distribution and compare the quantile \({F}_{N}^{-1}(q)\) with its approximation \(n{F}_{Z}^{-1}(q)\). We find that, for q ∈ [0.9, 0.99], the values of \({F}_{N}^{-1}(q)/{F}_{nZ}^{-1}(q)\) lie in the intervals

$${ F}_{N}^{-1}(q)/{F}_{ nZ}^{-1}(q) \in \left \{\begin{array}{ll} (1.006675,1.015625)&\text{for } c = 0.001, \\ (1.001286,1.003138)&\text{for } c = 0.01, \\ (1.000199,1.000966)&\text{for } c = 0.05, \\ (1.000023,1.000610)&\text{for } c = 0.1. \end{array} \right.$$
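The interval endpoints above can be reproduced numerically. A minimal sketch, assuming SciPy, whose stats.betabinom implements the beta-binomial distribution; note that \({F}_{nZ}^{-1}(q) = n{F}_{Z}^{-1}(q)\).

```python
import numpy as np
from scipy import stats

n, p = 10**4, 0.05
q = np.linspace(0.9, 0.99, 91)
for c in (0.001, 0.01, 0.05, 0.1):
    a, b = (1 - c) / c * p, (1 - c) / c * (1 - p)
    # Quantile of the beta-binomial N divided by the quantile of nZ
    ratio = stats.betabinom.ppf(q, n, a, b) / (n * stats.beta.ppf(q, a, b))
    print(c, ratio.min(), ratio.max())
```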

In particular, the approximation \(N \approx nZ\) is very accurate. We also find that a small change in the common default correlation coefficient of the I k has a huge effect on the distribution of \(N = {I}_{1} + \cdots + {I}_{n}\). This is seen in Fig. 9.15, which shows distribution functions and quantile functions for beta-binomial models with p = 0.05, n = \(1{0}^{4}\), and different correlation coefficients. Figure 9.15 illustrates clearly that specifying only the individual default probability p says very little about the distribution of N. Every choice of \((a,b) = ((1 - c)/c)(p,1 - p)\), c ∈ (0,1), gives default probability p. Let Z c, p be Beta-distributed with the parameters a, b above. Then, for every ε > 0,

$$\mathrm{P}(\vert {Z}_{c,p} - p\vert > \epsilon ) \leq \frac{\mathrm{Var}({Z}_{c,p})} {{\epsilon }^{2}} = \frac{p(1 - p)c} {{\epsilon }^{2}}.$$

In particular, if N c, p is beta-binomially distributed with mixture variable Z c, p , then

$$\begin{array}{rcl} \mathrm{P}({N}_{c,p} = k)& =& \mathrm{E}\left [\left({n}\atop{k}\right){Z}_{c,p}^{k}{(1 - {Z}_{ c,p})}^{n-k}\right ] \\ & =& \mathrm{E}\left [\left({n}\atop{k}\right){Z}_{c,p}^{k}{(1 - {Z}_{ c,p})}^{n-k};\vert {Z}_{ c,p} - p\vert \leq {c}^{1/3}\right ] \\ & & +\mathrm{E}\left [\left({n}\atop{k}\right){Z}_{c,p}^{k}{(1 - {Z}_{ c,p})}^{n-k};\vert {Z}_{ c,p} - p\vert > {c}^{1/3}\right ] \\ & \leq & {\max }_{\vert t\vert \leq {c}^{1/3}}\left({n}\atop{k}\right){(p + t)}^{k}{(1 - (p + t))}^{n-k} + p(1 - p){c}^{1/3} \\ & \rightarrow & \left({n}\atop{k}\right){p}^{k}{(1 - p)}^{n-k}\quad \text{as } c \rightarrow 0. \end{array}$$

The lower bound is constructed similarly. We conclude that N c, p converges in distribution to Bin(n, p) as c → 0. This is also seen in Fig. 9.15.

Fig. 9.15

Distribution functions (left) and quantile functions (right) for beta-binomial distributions with n = \(1{0}^{4}\), p = 0.05, and \((a,b) = ((1 - c)/c)(p,1 - p)\) for c = 0, 0.001, 0.01, 0.05, 0.1 (c = 0 gives the Bin(n, p) distribution)

6 Notes and Comments

Much more material on elliptical distributions can be found in the book [16] by Kai-Tai Fang, Samuel Kotz, and Kai Wang Ng.

For further material on multivariate elliptical and copula-based models, dependence concepts, and applications in financial risk management we refer the reader to the book [31] by Alexander McNeil, Rüdiger Frey, and Paul Embrechts. Much material on models and methods for portfolio credit risk, which we have only touched upon here, can be found in [31]. Moreover, techniques for parameter estimation for copula models, a topic we have not considered at all, are presented and illustrated in [31].

A statement equivalent to Proposition 9.6 appears in the book chapter [12] by Paul Embrechts, Alexander McNeil, and Daniel Straumann. It can be proved by considering the conditional density of one component of a bivariate Student's t-distributed vector given a value of its other component. However, the asymptotic dependence (or tail dependence) property of the Student's t distribution is a consequence of a more general fact: pairs of components of an elliptically distributed random vector are asymptotically dependent if the distribution functions of its components are regularly varying. A proof of this more general fact, which also applies to Proposition 9.6, can be found in the article [22] by Henrik Hult and Filip Lindskog. The statement in Proposition 9.7 appears in the book chapter [25] by Filip Lindskog, Alexander McNeil, and Uwe Schmock and in the article [15] by Hong-Bin Fang, Kai-Tai Fang, and Samuel Kotz.

The reader seeking more information about copulas in general is encouraged to consult the books [23] by Harry Joe and [35] by Roger Nelsen.

7 Exercises

In the exercises below, it is assumed, whenever applicable, that you can take positions corresponding to fractions of assets.

Exercise 9.1 (Risk minimization). 

Consider the value L of a liability and values \({X}_{1},\ldots, {X}_{d}\) of assets at time T > 0 that may be used to hedge the liability. Suppose that L and the X k have finite variances, and let ρ be a translation-invariant and positively homogeneous risk measure.

  1. (a)

    Show that if \(({X}_{1},\ldots, {X}_{d},L)\) has an elliptical distribution, then the portfolio weights \({h}_{0},{h}_{1},\ldots, {h}_{d}\) minimizing

    $$\mathrm{E}[{({h}_{0} + {h}_{1}{X}_{1} + \cdots + {h}_{d}{X}_{d} - L)}^{2}],$$

    i.e., the optimal quadratic hedge, minimize \(\rho ({h}_{0} + {h}_{1}{X}_{1} + \cdots + {h}_{d}{X}_{d} - L)\).

  2. (b)

    Show, by an explicit example, that the conclusion in (a) does not hold in general when \(({X}_{1},\ldots, {X}_{d},L)\) does not have an elliptical distribution.

Exercise 9.2 (Allocation invariance). 

Let \(\mathbf{X} = {({X}_{1},\ldots, {X}_{d})}^{\mathrm{T}}\) and \(\mathbf{Y} = {({Y }_{1},\ldots, {Y }_{d})}^{\mathrm{T}}\) be random vectors having normal variance mixture distributions with identical dispersion matrices and identical location vectors R 0 1, where R 0 is the return on a risk-free asset. Vectors X and Y represent returns on 2d risky assets. Let V X (w) and V Y (w) denote the values at the end of the investment horizon for an investment of the capital V 0 in positions in the risk-free asset and in the assets with return vectors X and Y, respectively, where w is a vector of monetary portfolio weights corresponding to the positions in the risky assets.

  1. (a)

    Show that if ρ is a translation-invariant and positively homogeneous risk measure, then

    $$\frac{\rho ({V }_{\mathbf{X}}(\mathbf{w}) - {V }_{0}{R}_{0})} {\rho ({V }_{\mathbf{Y}}(\mathbf{w}) - {V }_{0}{R}_{0})}$$
    (9.17)

    does not depend on the allocation of the initial capital or on the common dispersion matrix of the return vectors.

  2. (b)

    Suppose that X has a Student's t distribution with four degrees of freedom, that Y has a normal distribution, and that ρ = VaR p , and compute the expression in (9.17) as a function of p for p ≤ 0.05.

Exercise 9.3 (Asymptotic dependence). 

Consider a random vector (X 1, X 2) whose components are equally distributed, and use Propositions 9.5 and 9.6 to compute \(\lim \limits_{x\rightarrow \infty }\mathrm{P}({X}_{2} > x\mid {X}_{1} > x)\) in the following two cases:

  1. (a)

    X 1 and X 2 are Student's t-distributed with four degrees of freedom, and (X 1, X 2) has a Gaussian copula with linear correlation parameter 0.5.

  2. (b)

    X 1 and X 2 are Student's t-distributed with four degrees of freedom, and (X 1, X 2) has a Student's t copula with linear correlation parameter 0.5 and degrees of freedom parameter 6.

Exercise 9.4 (Comonotonic additive risk). 

Show that if X 1 and X 2 are comonotonic random variables, then \({\mathrm{VaR}}_{p}({X}_{1} + {X}_{2}) ={ \mathrm{VaR}}_{p}({X}_{1}) +{ \mathrm{VaR}}_{p}({X}_{2})\) and \({\rho }_{\phi }({X}_{1} + {X}_{2}) = {\rho }_{\phi }({X}_{1}) + {\rho }_{\phi }({X}_{2})\) for any spectral risk measure ρ ϕ defined in (6.18).

Exercise 9.5 (Kendall’s tau). 

Let Ψ be the Laplace transform of a strictly positive random variable, and consider the random pair (U 1, U 2) whose distribution function is the copula \(C({u}_{1},{u}_{2}) = \Psi ({\Psi }^{-1}({u}_{1}) + {\Psi }^{-1}({u}_{2}))\).

  1. (a)

    Show that \(\tau ({U}_{1},{U}_{2}) = 4\mathrm{E}[C({U}_{1},{U}_{2})] - 1\).

  2. (b)

    It can be shown that \(\mathrm{P}(C({U}_{1},{U}_{2}) \leq v) = v - {\Psi }^{-1}(v)/({\Psi }^{-1})^{\prime}(v)\) for v in (0, 1). Use this relation to show that

    $$\tau ({U}_{1},{U}_{2}) = 1 + 4{\int }_{0}^{1} \frac{{\Psi }^{-1}(v)} {({\Psi }^{-1})^{\prime}(v)}dv.$$
  3. (c)

    Compute τ(U 1, U 2) when \(C = {C}_{\theta }^{\mathrm{Cl}}\) is a Clayton copula.

Exercise 9.6 (Credit rating migration). 

Consider the two corporate bonds in Exercise 4.6. Let the credit ratings be numbered from 1 to 4 and correspond to the ratings Excellent, Good, Poor, and Default in Exercise 4.6. Let (X 1, X 2) denote the pair of credit ratings of the two issuers after 1 year with the distribution given in Table 4.1.

  1. (a)

    Find a copula C such that \(\mathrm{P}({X}_{1} \leq {x}_{1},{X}_{2} \leq {x}_{2}) = C(\mathrm{P}({X}_{1} \leq {x}_{1}),\mathrm{P}({X}_{2} \leq {x}_{2}))\) for all (x 1, x 2).

  2. (b)

    The copula C of (X 1, X 2) in (a) can be well approximated by a Gaussian copula. Investigate numerically what value of the correlation parameter in the Gaussian copula gives a good approximation of the copula of (X 1, X 2) in (a).

Exercise 9.7 (Portfolio default risk). 

Consider a latent variable model for a homogeneous portfolio of n risky loans. Let p be the default probability for each loan, let \(Y,{Y }_{1},\ldots, {Y }_{n}\) be independent and standard normally distributed, and let ρ ∈ (0, 1) be a parameter. The default indicators are modeled as

$${ X}_{k} = \left \{\begin{array}{ll} 1&\text{if } \sqrt{\rho }Y + \sqrt{1 - \rho }{Y }_{k} \leq {\Phi }^{-1}(p), \\ 0&\text{otherwise}, \end{array} \right.$$
(9.18)

where Φ denotes the standard normal distribution function.
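The model (9.18) is straightforward to simulate. The following minimal sketch, assuming NumPy and SciPy and using the parameter values from part (c) below, generates one scenario of the default indicators; it is only a starting point and does not solve the exercise.

```python
import numpy as np
from scipy import stats

n, p, rho = 1000, 0.03, 0.2
rng = np.random.default_rng(seed=1)

Y = rng.standard_normal()        # common factor Y
Yk = rng.standard_normal(n)      # idiosyncratic factors Y_1, ..., Y_n
# Default indicators according to (9.18)
X = (np.sqrt(rho) * Y + np.sqrt(1 - rho) * Yk <= stats.norm.ppf(p)).astype(int)
print(X.sum())                   # number of defaults in this scenario
```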

  1. (a)

    Determine the random variable Θ = g(Y ) such that the default indicators are conditionally independent and Be(θ)-distributed given Θ = θ.

  2. (b)

    Show that the following formula holds for the q-quantile of Θ:

    $${F}_{\Theta }^{-1}(q) = \Phi \left ({\Phi }^{-1}(q) \frac{\sqrt{\rho }} {\sqrt{1 - \rho }} + {\Phi }^{-1}(p) \frac{1} {\sqrt{1 - \rho }}\right ).$$
  3. (c)

    Consider a loan portfolio of a bank consisting of one thousand loans, each of size one million dollars. Suppose that, for each of the loans, the probability of default within 1 year is 3%, and in case of default the bank makes a loss equal to 25% of the size of the loan. Suppose further that the bank makes a profit of $10,000 per year from interest payments on each loan that does not default and nothing on those that do. The bank decides to set aside an amount of buffer capital that equals its estimate of \({\mathrm{ES}}_{0.01}(S)\), where S is the profit from interest income minus the loss from defaults over a 1-year period. Estimate the size of the buffer capital under the assumption that the default indicators are given by (9.18) with ρ = 0.2 and that the bank may invest in a risk-free, 1-year, zero-coupon bond with a zero rate of 3%.

Exercise 9.8 (Potential death spiral). 

Consider a life insurance company with a liability cash flow of long duration. The value of the liability 1 year from now is denoted by L and increases when interest rates decline. The premium received for insuring the liability is V 0 = 1.1 E[L]. The insurer invests its capital in a fixed-income portfolio with 1-year return R 1 and in a stock market portfolio with 1-year return R 2. The vector (R 1, R 2, L) is, for simplicity, assumed to have a multivariate Student's t distribution with four degrees of freedom. Its mean vector and correlation matrix are given by

$$\mathrm{E}\left (\begin{array}{c} {R}_{1} \\ {R}_{2} \\ L\end{array} \right ) = \left (\begin{array}{l} 1.02\\ 1.10 \\ 1.2 \cdot 1{0}^{7} \end{array} \right )\quad { and}\quad \mathrm{Cor}\left (\begin{array}{c} {R}_{1} \\ {R}_{2} \\ L\end{array} \right ) = \left (\begin{array}{lrr} 1 &0.3&0.9\\ 0.3 & 1 &0.2 \\ 0.9&0.2& 1 \end{array} \right ).$$

The standard deviations of R 1, R 2, and L are given by 0.005, 0.05, and \(1.2 \cdot 1{0}^{5}\), respectively.

Let w 1, w 2 be the amounts invested in the fixed-income portfolio and the stock market portfolio, respectively. The insurer invests the initial capital V 0 in the two portfolios so that its asset portfolio has an expected return of 1.06.

  1. (a)

    Determine w 1 and w 2.

  2. (b)

    Is the insurer solvent in the sense that \({\mathrm{VaR}}_{0.005}(A - L) \leq 0\), where A denotes the value of the asset portfolio in 1 year?

  3. (c)

    Suppose there is an instantaneous decline of 15% in the value of the stock market portfolio. Does the insurer remain solvent? If not, determine how the insurer must adjust the asset portfolio weights w 1 and w 2 simply to become solvent in the sense that \({\mathrm{VaR}}_{0.005}(A - L) = 0\).

  4. (d)

    Compute the expected return of the insurer’s adjusted asset portfolio determined in (c).

Comment: A simultaneous decline in the value of stocks and in interest rates is particularly dangerous to an insurer with a liability of long duration. The reduction in the value of the insurer's capital forces the insurer to shift its asset allocation away from stocks toward less risky fixed-income instruments in order to remain solvent. The adjusted allocation has a lower expected return, which makes it difficult for the insurer to make up for the suffered losses. Moreover, insurance companies often have large amounts of capital invested in the stock market, and a forced sale of large positions in stocks together with an increased demand for safe bonds could depress stock prices and interest rates even further. This phenomenon, sometimes referred to as a death spiral, leaves the insurer stuck in a near-insolvent state with an asset portfolio that is unlikely to generate good returns.

Project 1 (Scenario-based risk analysis). 

Consider a stylized model of a life insurer. The insurer faces a liability cash flow of 100 each year for the next 30 years. The current zero rates are given in Table 9.1, from which the current value of the liability can be computed. In the market there is a short supply of bonds with maturities longer than 10 years. Therefore, the insurer has purchased a bond portfolio with payments only within the next 10 years. The bond portfolio has the cash flow given in Table 9.1. The insurer has also invested in a stock portfolio. The initial capital of the insurer is 30% more than the current value of the liability. The insurer invests 70% of the initial capital in the bond portfolio and 30% of the initial capital in the stock portfolio. The objective in this project is to identify the most dangerous extreme scenario.

Table 9.1 Annual cash flow of bond portfolio and current zero rates

Suppose that there are two risk factors in the model, the log return Y 1 of the stock portfolio and the size Y 2 of a parallel shift of the zero-rate curve. The risk factors are assumed to have a bivariate normal distribution, means μ1, μ2, standard deviations σ1, σ2, and linear correlation coefficient ρ given by

$${\mu }_{1} = 0.08,\quad {\mu }_{2} = 0,\quad {\sigma }_{1} = 0.2,\quad {\sigma }_{2} = 0.01,\quad \rho = 0.1.$$

Consider equally likely extreme scenarios in the following sense. The risk factors can be represented via two independent standard normally distributed random variables Z 1 and Z 2 as

$$\begin{array}{rcl}{ Y }_{1}& =& {\mu }_{1} + {\sigma }_{1}{Z}_{1}, \\ {Y }_{2}& =& {\sigma }_{2}\left (\rho {Z}_{1} + \sqrt{1 - {\rho }^{2}}{Z}_{2}\right ).\end{array}$$

All scenarios with \(\sqrt{{Z}_{1 }^{2 } + {Z}_{2 }^{2}} = 3\) can be viewed as equally likely extreme scenarios corresponding to three-standard-deviation movements. The extreme scenarios for Z 1, Z 2 translate into extreme scenarios for the risk factors Y 1, Y 2 by the relation above.
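The equally likely extreme scenarios can be generated by parameterizing the circle of radius 3. A minimal sketch, assuming NumPy; the grid size is an arbitrary choice of ours.

```python
import numpy as np

mu1, sigma1, sigma2, rho = 0.08, 0.2, 0.01, 0.1
theta = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)
Z1, Z2 = 3.0 * np.cos(theta), 3.0 * np.sin(theta)   # points with Z1^2 + Z2^2 = 9

Y1 = mu1 + sigma1 * Z1                                # stock portfolio log return
Y2 = sigma2 * (rho * Z1 + np.sqrt(1 - rho**2) * Z2)   # parallel shift of zero rates
# Each pair (Y1[i], Y2[i]) is an equally likely three-standard-deviation
# scenario in which assets minus liabilities can be revalued.
```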

  1. (a)

    Plot the value of the insurer’s portfolio, assets minus liabilities, in 1 year for all the equally likely extreme scenarios.

  2. (b)

    Identify which scenario for Y 1, Y 2 leads to the worst outcome for the value of the insurer’s assets minus that of the liabilities in 1 year.

  3. (c)

    Repeat the analysis outlined above and find the most dangerous scenario when (Y 1, Y 2) has another bivariate elliptical distribution.

Project 2 (Tail dependence in large portfolios). 

Let \({Z}_{1},\ldots, {Z}_{50}\) represent log returns from today until tomorrow for 50 hypothetical financial assets. Suppose that Z k has a Student's t distribution with three degrees of freedom and standard deviation 0.01 for each k and that τ(Z j , Z k ) = 0.4 for j ≠ k.

Consider an investment of $20,000 in long positions in each of the assets. Let V 0 and V 1 be the portfolio value today and tomorrow, respectively. Investigate the effect of tail dependence on the distribution of the portfolio value V 1 tomorrow and on the distribution of the portfolio log return \(\log ({V }_{1}/{V }_{0})\) by simulating from the distribution of V 1 under the assumption that

  1. (a)

    \(({Z}_{1},\ldots, {Z}_{50})\) has a Gaussian copula.

  2. (b)

    \(({Z}_{1},\ldots, {Z}_{50})\) has a t 4-copula.

  3. (c)

    \(({Z}_{1},\ldots, {Z}_{50})\) has a Clayton copula.

  4. (d)

    How large a sample size is needed to get stable estimates of \({\mathrm{VaR}}_{0.01}({V }_{1} - {V }_{0})\) and \({\mathrm{ES}}_{0.01}({V }_{1} - {V }_{0})\)? Explain the differences between the estimates in the three cases (a)–(c).

  5. (e)

    Compare the results in (a)–(d) to the results when $1 million is invested in only one of the assets.

  6. (f)

    Suppose that the Z k are equally distributed and have a left-skewed polynomial normal distribution with zero mean and standard deviation 0.01. Study and explain the effect of the log-return distribution of the Z k on the distribution of V 1 and on the portfolio risk by simulating from the distribution of V 1 under assumptions (a)–(c).