1 Introduction

Credit risk in the banking and trading books is one of the largest financial risk exposures for many financial institutions. As such, quantifying the risk of a credit portfolio is essential in credit risk management. One of the key challenges in credit risk management is the accurate modelling of the dependence between obligors, particularly in the tails. This is attributed to two phenomena. First, many financial institutions’ exposure to credit risk is not confined to a single obligor but extends to a large portfolio of multiple obligors. Second, simultaneous defaults have been observed empirically in large credit portfolios, as financial institutions are affected by common macroeconomic or systemic factors. This suggests that obligors tend to exhibit stronger dependence in a stressed market and hence simultaneous defaults tend to be more likely. For this reason, the model for the dependence structure of the default events has a direct impact on the tail of the loss for a large portfolio. In light of these issues and challenges, the first objective of this paper is to analyze the large credit portfolio loss when the obligors are modeled to be strongly dependent. Building on the asymptotic analysis, the second objective of the paper is to propose two variance reduction simulation algorithms that provide efficient estimation of the risk of a credit portfolio.

To accomplish the above two objectives, we model the credit risk based on the so-called threshold models, which are widely used to capture the event of default for an individual obligor within the portfolio. A default in a threshold model occurs if some critical random variable, usually called a latent variable, exceeds (or falls below) a pre-specified threshold. The dependence among defaults then stems from the dependence among the latent variables. It has been found that the copula representation is a useful tool for studying the dependence structure. Specifically, the copula of the latent variables determines the link between marginal default probabilities for individual obligors and joint default probabilities for groups of obligors. Most threshold models used in the industry are based explicitly or implicitly on the Gaussian copula; see, for example, CreditMetrics (Gupton et al., 1997) and Moody’s KMV system (Kealhofer & Bohn, 2001). While Gaussian copula models can accommodate a wide range of correlation structures, they are inadequate for modeling extremal dependence between the latent variables, as these models are known to exhibit weaker dependence in the tails of the underlying random variables (see, for example, Section 7.2.4 of McNeil et al. (2015) for further discussion on tail dependence). This limitation raises considerable concern since, as noted earlier, when the market condition worsens, simultaneous defaults can occur with nonnegligible probability in large credit portfolios. To better reflect the empirical evidence, copulas that can capture “stronger” tail dependence of obligors, such as the t-copula and its generalizations, have been proposed (see Bassamboo et al., 2008; Chan & Kroese, 2010; Tang et al., 2019). As pointed out in Section 11.1.4 of McNeil et al. (2015), the Archimedean copula is another plausible class of dependence models for obligors. The Archimedean copula offers great flexibility in modeling dependence as it is capable of covering dependence structures ranging from independence to comonotonicity (perfect dependence). Typical examples of Archimedean copulas include the Clayton, Gumbel and Frank copulas. Because of their flexibility in dependence modeling, Archimedean copulas have been applied in credit risk (Cherubini et al., 2004; Hofert, 2010; Hofert & Scherer, 2011; Naifar, 2011), insurance (Frees & Valdez, 1998; Embrechts et al., 2001; Denuit et al., 2004; Albrecher et al., 2011; Cossette et al., 2018), and many other areas of application; see, for example, Genest and Favre (2007), Zhang and Singh (2007) and Wang (2003). See also Charpentier and Segers (2009), Hofert et al. (2013), Okhrin et al. (2013) and Zhu et al. (2016) for higher dimensional applications of Archimedean copulas and their generalizations (such as the hierarchical Archimedean copulas) in finance and risk management. Motivated by their dependence modeling flexibility and their wide applications, this paper similarly uses the Archimedean copula to model the dependence of obligors in order to account for the market phenomenon of simultaneous defaults. Moreover, as discussed in Sect. 3.1, when obligors are modeled with Archimedean copulas, the threshold model can be understood as a one-factor Bernoulli mixture model, which greatly facilitates the asymptotic analysis and simulation of large portfolio losses in the later sections.

In terms of quantifying portfolio credit risk, the most popular measure is the probability of a large portfolio loss over a fixed time horizon, say, a year (see Glasserman, 2004; Glasserman et al., 2007; Tang et al., 2019). The expected shortfall of the large portfolio loss, which has been found to be very useful in risk management and in the pricing of credit instruments, is another important measure of credit risk. A general discussion on quantifying portfolio credit risk can be found in Hong et al. (2014). With the extremal dependence modelled by the Archimedean copula, there are no analytic expressions for the above two measures, which implies that we need to rely on numerical methods to evaluate them. While the Monte Carlo (MC) simulation method is a popular alternative numerical tool, naive application of the method to evaluate these measures is very inefficient, since the default of high-quality obligors is a rare event and naive MC is notoriously inefficient for rare-event applications. Hence, variance reduction techniques such as those based on importance sampling (Glasserman & Li, 2005; Bassamboo et al., 2008; Glasserman et al., 2008) and the conditional Monte Carlo method (Chan & Kroese, 2010) have been proposed to increase the efficiency of MC methods for estimating these measures.

We now summarize the key contributions of the paper in studying the large portfolio loss from the following two aspects. By exploiting the threshold model and using an Archimedean copula to capture the dependence among latent variables, the first key contribution is to derive sharp asymptotics for the two performance measures: the probability of large portfolio loss and the expected shortfall. These results quantify the asymptotic behavior of the two measures when the portfolio size is large and provide a better understanding of how dependence affects the large portfolio loss. While the effectiveness of estimating the portfolio loss based on the asymptotic expansions may deteriorate if the portfolio size is not sufficiently large, the expansions are still very useful as they provide a good foundation for the MC based algorithms that we develop subsequently. In particular, the second key contribution of the paper is to exploit these asymptotic results and propose two efficient MC based methods for estimating the above two performance measures. More specifically, the first one is a two-step full importance sampling algorithm that provides efficient estimators for both measures. Furthermore, we show that the proposed estimator of the probability of large portfolio loss is logarithmically efficient. The second algorithm, which is based on the conditional Monte Carlo method, is shown to have bounded relative error. Relative to the importance sampling method, the conditional Monte Carlo algorithm has the advantage of simplicity, though it can only be used to estimate the probability of large portfolio loss. Simulation studies also show that the second algorithm performs better than the first. Overall, both of them generate significant variance reductions when compared to naive MC simulations.

The rest of the paper is organized as follows. We formulate our problem in Sect. 2 and describe Archimedean copulas and regular variation in Sect. 3. The main results are presented in Sects. 4, 5 and 6: Sect. 4 derives the sharp asymptotics, while the latter two sections present our proposed efficient Monte Carlo algorithms and analyze their performance. Through an extensive simulation study, Sect. 7 provides a further comparative analysis of the relative effectiveness of our proposed algorithms. Proofs are relegated to the Appendix.

2 Problem formulation

Consider a large credit portfolio of n obligors. Similar to Bassamboo et al. (2008), we employ a static structural model for the portfolio loss by introducing latent variables \(\{X_{1},\ldots ,X_{n}\}\) so that obligor i defaults if its latent variable \(X_{i}\) exceeds some pre-specified threshold \(x_{i}\), for \(i=1,2,\ldots ,n\). Denoting by \(c_{i}>0\) the risk exposure at default that corresponds to obligor i, for \(i=1,2,\ldots ,n\), the portfolio loss incurred from defaults is given by

$$\begin{aligned} L_{n}=\sum _{i=1}^{n}c_{i}1_{\{X_{i}>x_{i}\}}, \end{aligned}$$
(2.1)

where \(1_{A}\) is the indicator function of an event A. Such a threshold model traces back to Merton (1974). Let \(F_{i}\) and \(\overline{F}_{i}\) be, respectively, the marginal distribution function and marginal survival function of \(X_{i}\). Then \(F_{i}=1-\overline{F}_{i}\), for \(i=1,2,\ldots ,n\). In the threshold model, \(\overline{F}_{i}(x_i)\) can be interpreted as the marginal default probability of obligor i and we denote it by \(p_{i}\).

As pointed out in the last section, the dependence among the latent variables has a direct impact on the tail of the loss for a large portfolio, and this dependence structure is conveniently modelled via copulas. This is also highlighted in Lemma 11.2 of McNeil et al. (2015): in a threshold model the copula of the latent variables determines the link between marginal default probabilities and joint default probabilities. To see this, let \(U_{i}=F_{i}(X_{i})\) and \(p_i= \overline{F}_{i} (x_i)\) for \(i=1,\ldots ,n\). It follows immediately from Lemma 11.2 of McNeil et al. (2015) that \((X_{i},x_{i})_{1\le i\le n}\) and \((U_{i},p_{i})_{1\le i\le n}\) are two equivalent threshold models. Hence the portfolio loss is driven by the dependence among the latent variables rather than by the marginal distribution of each latent variable. This is also the reason why we conduct our analysis by focusing on the dependence structure of the obligors, under the assumption that the dependence of \((U_{1},U_{2},\ldots ,U_{n})\) is adequately captured by an Archimedean copula.

Recall that the main focus of the paper is to study a credit portfolio that consists of a large number of obligors, each with a low default probability. While default events are rare, the potential loss is significant once they are triggered, and especially so with simultaneous defaults. By the transformation \(U_{i}=F_{i}(X_{i})\) for \(i=1,\ldots ,n\), the default event of obligor i, \(\{X_{i}>x_{i}\}\), is equivalent to \(\{U_{i}>1-p_{i}\}\). From the theory of diversification, the probability of a large portfolio loss should diminish as n increases. To capture this feature, the individual default probability is expressed as \(p_{i}=l_{i}f_{n}\) for \(i=1,\ldots ,n\), where \(f_{n}\) is a positive deterministic function converging to 0 as \(n\rightarrow \infty \) and \(\{l_{1},\ldots ,l_{n}\}\) are strictly positive constants accounting for the variation across obligors. We emphasize that, on one hand, the assumption that \(f_{n}\) converges to 0 reflects the diversification effect in a large portfolio, namely that the individual default probability diminishes as n increases. On the other hand, it provides mathematical convenience for deriving sharp asymptotics for the large portfolio loss (see the discussion after Theorem 4.1) and for proving the algorithms’ efficiency (Theorems 5.1 and 6.1). Such a condition is also assumed in Gordy (2003), Bassamboo et al. (2008), Chan and Kroese (2010) and Tang et al. (2019), for example. More detailed explanations on the assumption on \(f_n\) are provided in Sect. 4.1. With this representation, we can rewrite the overall portfolio loss (2.1) as

$$\begin{aligned} L_{n}=\sum _{i=1}^{n}c_{i}1_{\{U_{i}>1-l_{i}f_{n}\}}. \end{aligned}$$
(2.2)

In the remainder of the paper, we use (2.2) to analyze the large portfolio loss. To characterize the potential heterogeneity among obligors, we further impose some restrictions on the sequence \(\{(c_{i},l_{i}):i\ge 1\}\), as in Bassamboo et al. (2008).

Assumption 2.1

Let the positive sequence \(((c_{i},l_{i}):i\ge 1)\) take values in a finite set \(\mathcal {W}\). Denoting by \(n_{j}\) the number of obligors in the portfolio with characteristics \((c_{j},l_{j})\in \mathcal {W}\), we further assume that \(n_{j}/n\) converges to \(w_{j}>0\), for each \(j\le |\mathcal {W}|\), as \(n\rightarrow \infty \).

In practice, Assumption 2.1 can be interpreted as a heterogeneous credit portfolio that comprises a finite number of homogeneous sub-portfolios based on risk types and exposure sizes. We note that it is easy to relax this assumption to the case where \(c_{i}\) and \(l_{i}\) are random variables; see Tong et al. (2016) and Tang et al. (2019) for recent discussions.

3 Preliminaries

3.1 Archimedean copulas

Archimedean copulas have a simple closed form and can be represented by a generator function \(\phi \) as follows:

$$\begin{aligned} C(u_{1},\ldots ,u_{n})=\phi ^{-1}(\phi (u_{1})+\cdots +\phi (u_{n})), \end{aligned}$$
(3.1)

where \(C:[0,1]^{n}\rightarrow [0,1]\) is a copula function. The generator function \(\phi :[0,1]\rightarrow [0,\infty ]\) is continuous, decreasing and convex such that \(\phi (1)=0\) and \(\phi (0)=\infty \), and \(\phi ^{-1}\) is the inverse of \(\phi \). We further assume that \(\phi ^{-1}\) is completely monotonic, i.e. \((-1)^{i}\left( \phi ^{-1}\right) ^{(i)}\ge 0\) for all \(i\in \mathbb {N}\), which ensures that \(\phi ^{-1}\) is the Laplace-Stieltjes (LS) transform of a distribution function G on \([0,\infty ]\) such that \(G(0)=0\). Let V be a random variable with distribution function G on \([0,\infty ]\). The LS transform of V (or G) is defined as

$$\begin{aligned} \phi ^{-1}(s)=\mathcal {L}_{V}(s)=\int _{0}^{\infty }e^{-sv}\mathrm {d} G(v)=\mathbb {E}\left[ e^{-sV}\right] ,\qquad s\ge 0. \end{aligned}$$

Archimedean copulas that are generated from LS transforms of distribution functions are referred to as LT-Archimedean copulas, as formally defined below:

Definition 3.1

An LT-Archimedean copula is a copula of the form (3.1), where \(\phi ^{-1}\) is the Laplace-Stieltjes transform of a distribution function G on \([0,\infty ]\) such that \(G(0)=0\).

For many popular Archimedean copulas, the random variable V has a known distribution. For example, V is Gamma distributed for the Clayton copula, while V is a one-sided stable random variable for the Gumbel copula. A detailed specification of V can be found in Table 1 of Hofert (2008).

The following stochastic representation holds for \(\mathbf {U} =(U_{1},\ldots ,U_{n})\) when \(\mathbf {U}\) follows an LT-Archimedean copula:

$$\begin{aligned} \mathbf {U}=\left( \phi ^{-1}\left( \frac{R_{1}}{V}\right) ,\ldots ,\phi ^{-1}\left( \frac{R_{n}}{V}\right) \right) . \end{aligned}$$
(3.2)

Here V is a positive random variable with LS transform \(\phi ^{-1}\) and \(\{R_{1},\ldots ,R_{n}\}\) is a sequence of independent and identically distributed (i.i.d.) standard exponential random variables independent of V. This representation was first recognized by Marshall and Olkin (1988) and later formally proved in Proposition 7.51 of McNeil et al. (2015).

The construction (3.2) is especially useful in the field of credit risk. To see this, consider the threshold model defined in (2.2) and suppose \(\mathbf {U}\) has an LT-Archimedean copula with generator \(\phi \) as defined in (3.2). Then the random variable V can be considered as a proxy for systematic risk. Conditional on \(V=v\), the random variables \(U_{1},\ldots ,U_{n}\) are independent with conditional distribution function \(\mathbb {P}(U_{i}\le u|V=v)=\exp (-v\phi (u))\) for \(u\in [0,1]\), so that the conditional default probability of obligor i is \(p_{i}(v)=1-\exp (-v\phi (1-p_{i}))\). By such a construction, the threshold model (2.2) can be represented succinctly as a one-factor Bernoulli mixture model with mixing variable V and mixing probabilities \(p_{i}(v),i=1,\ldots ,n\). This property is important in two respects. First, it facilitates the asymptotic analysis of the large portfolio loss. By viewing the model as a one-factor Bernoulli mixture model, we will show in Sect. 4 that the large portfolio loss is essentially determined by the mixing distribution of V or its LS transform \(\phi ^{-1}\). It is also observed in Gordy (2003) that asymptotic analysis provides a simple yet quite accurate way of evaluating the large portfolio loss. In the current paper, we push one step further by deriving sharp asymptotics for the large portfolio loss in a more explicit way and under the Archimedean copula model. Second, Bernoulli mixture models lend themselves to practical implementation of MC simulations (see McNeil et al., 2015; Basoğlu et al., 2018). To be more specific, a Bernoulli mixture model can be simulated by first generating a realization v of V and then conducting independent Bernoulli experiments with conditional default probabilities \(p_{i}(v)\), \(i=1,\ldots ,n\). This generation scheme is explicitly exploited in Sect. 5 as the starting point of our proposed importance sampling simulations.
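To make the two-stage generation concrete, the following minimal Python sketch (the function names and the sampler sample_V for the mixing variable V are illustrative placeholders) draws one realization of the portfolio loss, either through the representation (3.2) or through the equivalent Bernoulli mixture.

```python
import numpy as np

def loss_via_copula(phi_inv, sample_V, p, c, rng):
    """One draw of L_n using the Marshall-Olkin representation (3.2)."""
    v = sample_V(rng)                     # mixing variable V
    r = rng.exponential(size=len(p))      # i.i.d. standard exponentials R_i
    u = phi_inv(r / v)                    # latent uniforms U_i
    return np.sum(c * (u > 1.0 - p))      # L_n = sum_i c_i 1{U_i > 1 - p_i}

def loss_via_mixture(phi, sample_V, p, c, rng):
    """One draw of L_n via the equivalent one-factor Bernoulli mixture:
    conditional on V = v, defaults are independent Bernoulli trials with
    probabilities p_i(v) = 1 - exp(-v * phi(1 - p_i))."""
    v = sample_V(rng)
    p_cond = 1.0 - np.exp(-v * phi(1.0 - p))
    return np.sum(c * (rng.random(len(p)) < p_cond))
```

For a particular copula, one would plug in the corresponding generator and a sampler for V, e.g. as specified in Table 1 of Hofert (2008).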

3.2 Regular variation

Regular variation is an important notion in our modeling. Intuitively, a function f is regularly varying at infinity if it behaves like a power law function near infinity. Interested readers may refer to Bingham et al. (1989) and Resnick (2013) for textbook treatments. In our model, we assume the generator function \(\phi \) of the LT-Archimedean copula is regularly varying in order to capture the upper tail dependence.

Definition 3.2

A positive Lebesgue measurable function f on \((0,\infty )\) is said to be regularly varying at \(\infty \) with index \(\alpha \in \mathbb {R}\), written as \(f\in \mathrm {RV}_{\alpha }\), if for \(x>0\),

$$\begin{aligned} \lim _{t\rightarrow \infty }\frac{f(tx)}{f(t)}=x^{\alpha }. \end{aligned}$$

Similarly, f is said to be regularly varying at 0 if \(f(\frac{1}{\cdot })\in \mathrm {RV}_{\alpha }\), and f is said to be regularly varying at \(a>0\) if \(f(a-\frac{1}{\cdot })\in \mathrm {RV}_{\alpha }\).

It turns out that many LT-Archimedean copulas commonly used in practice have generators that are regularly varying at 1. For example, the Gumbel copula has generator \(\phi (t)=(-\ln (t))^{\alpha }\) for \(\alpha \in [1,\infty )\), and it follows that \(\phi ^{-1}\) is completely monotonic and \(\phi (1-\frac{1}{\cdot })\in \mathrm {RV}_{-\alpha }\).
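As an illustration of this last claim, the regular variation of the Gumbel generator at 1 can be verified directly from the definition:

$$\begin{aligned} \phi \left( 1-\frac{1}{t}\right) =\left( -\ln \left( 1-\frac{1}{t}\right) \right) ^{\alpha }\sim \left( \frac{1}{t}\right) ^{\alpha }=t^{-\alpha },\qquad t\rightarrow \infty , \end{aligned}$$

since \(-\ln (1-x)\sim x\) as \(x\downarrow 0\), so that \(\phi (1-\frac{1}{\cdot })\in \mathrm {RV}_{-\alpha }\).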

4 Asymptotic analysis for large portfolio loss

In this section, we conduct an asymptotic analysis in a regime where the number of obligors is large and each individual obligor has an excellent credit rating (i.e. a small default probability). Our asymptotic analysis focuses on large portfolio losses, with Sect. 4.1 analyzing the tail probabilities of the losses and Sect. 4.2 tackling the expected shortfall of the losses.

4.1 Asymptotics for probabilities of large portfolio loss

Considering the portfolio loss model (2.2), this subsection analyzes the asymptotic behavior of the probability \(\mathbb {P}(L_{n}>nb)\) as \(n\rightarrow \infty \), where b is an arbitrarily fixed number. We restrict our analysis to LT-Archimedean copulas for modeling the dependence among the latent variables in order to fully take advantage of the Bernoulli mixture structure (as explained in Sect. 3.1). Recall that the random variable V in the representation (3.2) can be interpreted as the systematic risk or common shock factor. The dependence of obligors is mainly induced by V. Note that \(\phi ^{-1}\) is a decreasing function. When V takes on large values, all \(U_{i}\)’s are likely to be large (close to 1), which leads to many obligors defaulting simultaneously. To incorporate strong dependence among obligors, we assume V to be heavy tailed. In particular, we assume \(\overline{F}_{V} \in \mathrm {RV}_{-1/\alpha }\), where \(\overline{F}_{V}(\cdot )\) denotes the survival function of V and \(\alpha >1\). Here \(\alpha \) measures the heavy tailedness of V: the larger \(\alpha \) is, the heavier the tail of V and the more dependent the obligors are, which means simultaneous defaults are more likely to occur. By Karamata’s Tauberian theorem, this is equivalent to assuming that \(\phi (1-\frac{1}{\cdot })\in \mathrm {RV}_{-\alpha }\) with \(\alpha >1\), where the convexity of the generator \(\phi \) already guarantees that \(\alpha \ge 1\). Such a heavy-tailed assumption on the systematic risk factor (or common shock) can also be found in Bassamboo et al. (2008), Chan and Kroese (2010) and Tang et al. (2019). This is formalized in the following assumption.

Assumption 4.1

Assume \(\mathbf {U}=(U_{1},\ldots ,U_{n})\) follows an LT-Archimedean copula with generator \(\phi \) satisfying \(\phi (1-\frac{1}{\cdot } )\in \mathrm {RV}_{-\alpha }\) with \(\alpha >1\). Let V be the random variable associated with the Laplace-Stieltjes transform \(\phi ^{-1}\). Assume that V has an eventually monotone density function.

Before presenting the main result of this section, it is useful to note that by conditioning on \(V=\dfrac{v}{\phi (1-f_{n})}\), we have

$$\begin{aligned} p(v,i)&:=\mathbb {P}\left( U_{i}>1-l_{i}f_{n}\Big \vert V=\frac{v}{\phi (1-f_{n})}\right) \nonumber \\&=1-\exp \left( -v\frac{\phi (1-l_{i}f_{n})}{\phi (1-f_{n})}\right) . \end{aligned}$$
(4.1)

Under Assumption 4.1 that \(\phi (1-\frac{1}{\cdot })\in \mathrm {RV} _{-\alpha }\), we immediately obtain

$$\begin{aligned} \lim _{n\rightarrow \infty }p(v,i)=1-\exp \left( -vl_{i}^{\alpha }\right) :=\tilde{p}(v,i). \end{aligned}$$

Conditional on \(V=\dfrac{v}{\phi (1-f_{n})}\), Kolmogorov's strong law of large numbers yields that, almost surely,

$$\begin{aligned} \frac{L_{n}|V=\frac{v}{\phi (1-f_{n})}}{n}\rightarrow r(v):=\sum _{j\le |\mathcal {W}|}c_{j}w_{j}\tilde{p}(v,j),\qquad \text {as }n\rightarrow \infty . \end{aligned}$$
(4.2)

Recall that \(w_j\) is formally defined in Assumption 2.1. Note that r(v) is strictly increasing in v and approaches its upper bound \(\bar{c}=\sum _{j\le |\mathcal {W}|}c_{j}w_{j}\) as \(v\rightarrow \infty \), where \(\bar{c}\) can be interpreted as the limiting average loss when all obligors default. Thus, for each \(b\in (0,\bar{c})\), we let \(v^{*}\) denote the unique solution to

$$\begin{aligned} r(v)=b. \end{aligned}$$
(4.3)

Essentially, \(v^{*}\) represents the threshold value such that for \(V\in (0,v^{*}/\phi (1-f_{n}))\) the limiting average portfolio loss is less than b, while for \(V\in (v^{*}/\phi (1-f_{n}),\infty )\) the limiting average portfolio loss exceeds b.
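In practice \(v^{*}\) rarely admits a closed form outside special cases, but since r is strictly increasing it is easily obtained numerically. The following Python sketch (using SciPy; the argument names mirror the notation of Assumption 2.1) solves (4.3) by bracketing and root-finding.

```python
import numpy as np
from scipy.optimize import brentq

def v_star(c, w, l, alpha, b):
    """Solve r(v) = b in (4.3), with r(v) = sum_j c_j w_j (1 - exp(-v l_j^alpha))."""
    c, w, l = map(np.asarray, (c, w, l))
    c_bar = np.sum(c * w)                   # limiting average loss \bar{c}
    if not 0.0 < b < c_bar:
        raise ValueError("b must lie in (0, c_bar)")
    r = lambda v: np.sum(c * w * (1.0 - np.exp(-v * l ** alpha))) - b
    hi = 1.0
    while r(hi) < 0.0:                      # bracket the root; r is increasing in v
        hi *= 2.0
    return brentq(r, 0.0, hi)
```

For a fully homogeneous portfolio this should reproduce the closed-form solution \(v^{*}=l^{-\alpha }\ln \frac{c}{c-b}\) derived in Example 4.1 below.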

Now we are ready to present the main theorem of this section, which gives a sharp asymptotic for the probability of large portfolio losses. The proof is relegated to the Appendix.

Theorem 4.1

Consider the portfolio loss defined in (2.2). Suppose that Assumptions 2.1 and 4.1 hold and further that \(\exp (-n\beta )=o(f_{n})\) for any \(\beta >0\). Then for any fixed \(b\in (0,\bar{c})\), as \(n\rightarrow \infty \),

$$\begin{aligned} \mathbb {P}(L_{n}>nb)\sim f_{n}\frac{(v^{*})^{-1/\alpha }}{\Gamma (1-1/\alpha )}, \end{aligned}$$
(4.4)

where \(v^{*}\) is the unique solution that solves (4.3).

Remark 4.1

We emphasize that the asymptotic behavior of the portfolio loss is mostly dictated by \(\alpha \) and \(f_{n}\). Recall that \(\alpha \) is the index of regular variation of the generator function \(\phi \). It controls the dependence among the latent variables: the larger \(\alpha \), the more likely obligors are to default simultaneously. Once \(\alpha \) is fixed, Theorem 4.1 shows that the probability of large portfolio loss diminishes to zero at the same rate as \(f_{n}\). This result is sharp and more explicit than the results in Gordy (2003). As explained in Sect. 3.1, the threshold model (2.2) under the Archimedean copula reduces to a Bernoulli mixture model, while Gordy (2003) studies a risk-factor model, which is essentially a Bernoulli mixture model. In that paper, the author shows that the capital of the fine-grained portfolio asymptotically converges to the capital of the systematic risk factor. In other words, the asymptotics are presented between the portfolio loss and the systematic risk factor (equivalent to the random variable V here). Moreover, through numerical experiments, the author shows that the approximation of the large portfolio loss by the systematic risk factor is quite accurate and simple. In the current paper, we go one step further: the probability of large portfolio loss is given in a more explicit form, being linearly proportional to the individual default probability scale \(f_{n}\). Thus, to approximate the large portfolio loss, we do not rely on the full information of the systematic risk factor. At the same time, the asymptotics derived in our paper should share the same advantages of accuracy and simplicity as discussed in Gordy (2003).

Now we discuss the assumption on \(f_n\) in greater detail. As mentioned earlier, due to the effect of diversification, the individual default probability diminishes in a large portfolio as n increases. On the technical side, letting \(f_{n}\) converge to 0 ensures that a large portfolio loss occurs primarily when V takes large values, whereas the \(R_{i}\), \(i=1,\ldots ,n\), generally play no role in the occurrence of the large portfolio loss. To better understand this requirement, we consider the case where \(f_{n}\equiv f\) is a constant. Then calculations similar to (4.1) lead to

$$\begin{aligned} p_{0}(v,i):=\mathbb {P}\left( U_{i}>1-l_{i}f|V=v\right) =1-\exp (-v\phi (1-l_{i}f)). \end{aligned}$$

Note that \(p_{0}(v,i)\) is strictly increasing in v. Conditional on \(V=v\), Kolmogorov's strong law of large numbers gives, almost surely,

$$\begin{aligned} \frac{L_{n}|V=v}{n}\rightarrow r_{0}(v):=\sum _{j\le |\mathcal {W}|}c_{j} w_{j}p_{0}(v,j),\qquad \text {as }n\rightarrow \infty , \end{aligned}$$

where the limit follows from Assumption 2.1. Clearly, \(r_{0}(v)\) is also strictly increasing in v. Define \(v_{0}^{*}\) as the unique solution to

$$\begin{aligned} r_{0}(v)=b. \end{aligned}$$

It then follows that, for portfolio size n large enough, \(\mathbb {P} (L_{n}>nb|V=v)=0\) for \(v\le v_{0}^{*}\) and \(\mathbb {P}(L_{n}>nb|V=v)=1\) for \(v>v_{0}^{*}\). Thus, for any \(b\in (0,\bar{c})\) and large enough n, we have

$$\begin{aligned} \mathbb {P}(L_{n}>nb)=\mathbb {E}[\mathbb {P}(L_{n}>nb|V)]=\overline{F}_{V} (v_{0}^{*}). \end{aligned}$$

This leads to a mathematically trivial result. Moreover, it is counter-intuitive in the sense that as the size of the portfolio increases, the probability of a large portfolio loss remains significant (i.e. it does not converge to 0, which contradicts portfolio diversification). This illustration exemplifies the importance of the assumption that \(f_{n}\) diminishes to 0 as \(n\rightarrow \infty \) to account for the rarity of a large loss. The assumption \(\exp (-n\beta )=o(f_{n})\) essentially requires that \(f_{n}\) decay to 0 more slowly than any exponential function. By choosing different \(f_{n}\), portfolios will have different credit rating classes. For example, if \(f_{n}\) decays at a faster rate such as 1/n, then the portfolio contains higher quality obligors, whereas if \(f_{n}\) decays at a slower rate such as \(1/\ln n\), then the portfolio consists of more risky obligors. There are also many similar discussions in the literature on the requirement that the individual default probability diminishes in a large portfolio (equivalent to our \(f_{n}\rightarrow 0\)), all rooted in the effect of diversification of a large portfolio. For example, in Gordy (2003), assumption (A-2) guarantees that the share of the largest single exposure in total portfolio exposure vanishes to zero as the number of exposures in the portfolio increases. Tang et al. (2019) explain it more explicitly in their Sect. 2 as follows. As the size of the portfolio increases, each latent variable \(X_{i}\) should be modified to \(\frac{X_{i}}{\iota _{i}g_{n}}\), where \(g_{n}\) is a positive function diverging to \(\infty \) to reflect an overall improvement in credit quality, and \(\iota _{i}\) is a positive random variable to reflect a minor variation in portfolio effect on obligor i. With the endogenously determined default threshold of obligor i fixed at \(a_{i}>0\), the individual default occurs as \(\frac{X_{i}}{\iota _{i}g_{n}}>a_{i}\) if and only if \(X_{i}>\iota _{i}a_{i}g_{n}\), which is equivalent to \(U_{i}>1-l_{i}f_{n}\) with \(f_{n}\) decreasing to 0 in our context.

Next we use an example involving a fully homogeneous portfolio to further illustrate our results.

Example 4.1

Assume a fully homogeneous portfolio, that is, \(l_{i}\equiv l\) and \(c_{i}\equiv c\). Under this assumption, (4.2) simplifies to

$$\begin{aligned} r(v)=c\left( 1-\exp \left( -vl^{\alpha }\right) \right) . \end{aligned}$$

Thus, \(v^{*}=l^{-\alpha }\ln \dfrac{c}{c-b}\) is the unique solution to \(r(v)=b\). It immediately follows from relation (4.4) that, for \(b\in (0,c)\), we have

$$\begin{aligned} \mathbb {P}(L_{n}>nb)\sim lf_{n}\frac{\left( \ln \frac{c}{c-b}\right) ^{-1/\alpha }}{\Gamma (1-1/\alpha )}. \end{aligned}$$
(4.5)

Direct calculation further shows that the right-hand side of (4.5) is an increasing function of \(\alpha \) if \(\ln \frac{c}{c-b}\ge \exp (-\gamma )\), i.e., \(b/c\ge 1-e^{-e^{-\gamma }}\), where \(\gamma \) denotes Euler's constant. This monotonicity can be interpreted in an intuitive way. Recall that \(\alpha \) is the index of regular variation of the generator function \(\phi \). A larger \(\alpha \) corresponds to stronger upper tail dependence, and therefore a joint default of obligors is more likely to occur. However, the monotonicity fails if b is not large. In this case, the mean portfolio loss (\(L_{n}/n\)) is compared to a lower level b and such an event may occur due to the default of a single obligor. Thus, both the upper tail dependence and the level of the mean portfolio loss affect the probability of a large portfolio loss.
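For concreteness, the right-hand side of (4.5) and the monotonicity threshold on b/c are straightforward to evaluate numerically; the short Python sketch below is an illustration (the numerical value of Euler's constant is hard-coded).

```python
from math import exp, gamma, log

def asymptotic_tail_prob(alpha, b, c, l, f_n):
    """Right-hand side of the sharp asymptotic (4.5) for a homogeneous portfolio."""
    return l * f_n * log(c / (c - b)) ** (-1.0 / alpha) / gamma(1.0 - 1.0 / alpha)

euler_gamma = 0.5772156649015329
bc_threshold = 1.0 - exp(-exp(-euler_gamma))   # approx 0.43: (4.5) increases in alpha when b/c >= this
```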

4.2 Asymptotics for expected shortfall of large portfolio loss

The asymptotic expansions for the tail probabilities of the large portfolio loss provide the foundation for the analysis of the expected shortfall. To see this, note that the expected shortfall can be rewritten as

$$\begin{aligned} \mathbb {E}\left[ L_{n}|L_{n}>nb\right] =nb+n\frac{\int _{b}^{\infty }\mathbb {P}\left( L_{n}>nx\right) \mathrm {d}x}{\mathbb {P}\left( L_{n}>nb\right) }. \end{aligned}$$
(4.6)

Theorem 4.1 becomes the key to establishing an asymptotic for the expected shortfall, as formally stated in the following theorem.

Theorem 4.2

Under the same assumptions as in Theorem 4.1, the following relation

$$\begin{aligned} \mathbb {E}\left[ L_{n}|L_{n}>nb\right] \sim n\psi (\alpha ,b) \end{aligned}$$
(4.7)

holds for any fixed \(b\in (0,\bar{c})\), where

$$\begin{aligned} \psi (\alpha ,b):=b+\frac{\int _{v^{*}}^{\infty }r^{\prime }(v)v^{-1/\alpha }\mathrm {d}v}{(v^{*})^{-1/\alpha }}. \end{aligned}$$

The above theorem states that the expected shortfall grows almost linearly with the size of the portfolio n.
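When \(\psi (\alpha ,b)\) is not available in closed form, it can be evaluated by numerical quadrature. The following Python sketch (using SciPy and the mixture form of r(v) in (4.2); the names are illustrative) combines the root-finding step for \(v^{*}\) with a one-dimensional integral.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

def psi(alpha, b, c, w, l):
    """Numerical evaluation of psi(alpha, b) in Theorem 4.2."""
    c, w, l = map(np.asarray, (c, w, l))
    r = lambda v: np.sum(c * w * (1.0 - np.exp(-v * l ** alpha)))
    r_prime = lambda v: np.sum(c * w * l ** alpha * np.exp(-v * l ** alpha))
    hi = 1.0
    while r(hi) < b:                        # bracket v* solving r(v) = b, see (4.3)
        hi *= 2.0
    v_star = brentq(lambda v: r(v) - b, 0.0, hi)
    integral, _ = quad(lambda v: r_prime(v) * v ** (-1.0 / alpha), v_star, np.inf)
    return b + integral / v_star ** (-1.0 / alpha)
```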

5 Importance sampling (IS) simulations for large portfolio loss

The asymptotic results established in the last section (Theorems 4.1 and 4.2) characterize the behavior of large portfolio losses. These results, however, may not be applicable in practice unless the portfolio size is large. In practice, the tail probability and the expected shortfall of the portfolio loss are typically estimated via MC simulation methods due to their intractability. Naive application of the MC method to this type of rare-event problem, on the other hand, is notoriously inefficient. For this reason, variance reduction methods are often used to enhance the underlying MC methods. We similarly follow this line of inquiry and propose two variance reduction algorithms. In particular, an IS algorithm based on hazard rate twisting is presented in this section, while a second algorithm based on conditional Monte Carlo simulation will be discussed in the next section. The asymptotic analysis in Sect. 4 plays an important role in proving the efficiency of both algorithms.

5.1 Preliminary of importance sampling

We are interested in estimating \(\mathbb {P}\left( L_{n}>nb\right) \), where \(L_{n}\) can be considered as a linear combination of conditionally independent Bernoulli random variables \(\{1_{\{U_{i}>1-l_{i}f_{n}\}},i=1,\ldots ,n\}\). For each Bernoulli variable, the associated probability is denoted by \(p_{j}\) for \(j\le |\mathcal {W}|\), which is a function of the common factor V. Following the analysis in Sect. 4, \(p_{j}\) is explicitly given by p(v,j) as shown in (4.1). The simulation of \(\mathbb {P}\left( L_{n}>nb\right) \) is then conducted in two steps. In step 1, the common factor V is simulated from its density function \(f_{V}(\cdot )\), and in step 2, the corresponding Bernoulli random variables are generated. When the portfolio size is very large, the event \(\{L_{n}>nb\}\) occurs only when V takes large values, and for typical values of V the default probability \(p_{j}\) of each Bernoulli variable is small. Thus, both steps in the simulation of \(\mathbb {P}\left( L_{n}>nb\right) \) are rare-event simulations. Estimation by naive MC simulation becomes impractical due to the large number of samples needed, and therefore one has to resort to variance reduction techniques. IS is a widely used variance reduction technique that places greater probability mass on the rare event of interest and then appropriately normalizes the resulting output. Next, we briefly discuss how we apply IS in the two-step simulation.

In the first step, note that the tail behavior of the large portfolio loss depends heavily on the tail distribution of V; that is, the key to the occurrence of the large loss event is V taking a large value. A good importance sampling distribution for the random variable V should therefore be more heavy-tailed than its original distribution, so that a larger probability is assigned to the event that the average portfolio loss conditional on V exceeds the level b. Such an importance sampling distribution can be obtained via hazard rate twisting of V. Let \(\tilde{f}_{V}(\cdot )\) denote the importance sampling density function for V after the application of IS. In the second step, we improve the efficiency of calculating the conditional probabilities by replacing each Bernoulli success probability \(p_{j}\) with some probability \(\tilde{p}_{j}\), for \(j\le |\mathcal {W}|\). In this case, exponential twisting is a fairly well-established approach for Bernoulli random variables; see, e.g., Glasserman and Li (2005). Let \(\tilde{\mathbb {P}}\) denote the corresponding IS probability measure and \(\tilde{\mathbb {E}}\) the expectation under the measure \(\tilde{\mathbb {P}}\). Then the following identity holds:

$$\begin{aligned} \mathbb {P}\left( L_{n}>nb\right) =\mathbb {E}\left[ 1_{\{L_{n}>nb\}}\right] =\tilde{\mathbb {E}}\left[ 1_{\{L_{n}>nb\}}\tilde{L}\right] , \end{aligned}$$
(5.1)

where \(\tilde{L}=\dfrac{d\mathbb {P}}{d\tilde{\mathbb {P}}}\) is the Radon-Nikodym derivative of \(\mathbb {P}\) with respect to \(\tilde{\mathbb {P}}\) and equals

$$\begin{aligned} \frac{f_{V}(V)}{\tilde{f}_{V}(V)}\prod _{j\le |\mathcal {W}|}\left( \frac{p_{j}}{\tilde{p}_{j}}\right) ^{n_{j}Y_{j}}\left( \frac{1-p_{j}}{1-\tilde{p}_{j}}\right) ^{n_{j}(1-Y_{j})}. \end{aligned}$$

In the above expression, \(Y_{j}=1_{\{U_{j}>1-l_{j}f_{n}\}}\) and \(n_{j}Y_{j}\) denotes the number of defaults in sub-portfolio j. We refer to \(\tilde{L}\) as the unbiasing likelihood ratio. The key observation from (5.1) is that calculating the tail probability \(\mathbb {P}\left( L_{n}>nb\right) \) is equivalent to evaluating either the expectation \(\mathbb {E}\left[ 1_{\{L_{n}>nb\}}\right] \) or \(\tilde{\mathbb {E}}\left[ 1_{\{L_{n}>nb\}}\tilde{L}\right] \). We refer to the estimator based on the latter expectation as the IS estimator; its efficiency crucially depends on the choice of the IS density function \(\tilde{f}_{V}(\cdot )\).

We now discuss two criteria that characterize the performance of the proposed IS estimator. The stronger of the two is the bounded relative error property (see Asmussen & Kroese, 2006; McLeish, 2010). We say a sequence of estimators \((1_{\{L_{n} >nb\}}\tilde{L}:n\ge 1)\) under probability measure \(\tilde{\mathbb {P}}\) has bounded relative error if

$$\begin{aligned} \limsup _{n\rightarrow \infty }\frac{\sqrt{\tilde{\mathbb {E}}\left[ 1_{\{L_{n}>nb\}}\tilde{L}^{2}\right] }}{\mathbb {P}\left( L_{n}>nb\right) }<\infty . \end{aligned}$$

A slightly weaker criterion, called asymptotic optimality, is also widely used (see Glasserman & Li, 2005; Glasserman et al., 2007, 2008); an estimator is asymptotically optimal if the following condition holds:

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{\log \tilde{\mathbb {E}}\left[ 1_{\{L_{n}>nb\}}\tilde{L}^{2}\right] }{\log \mathbb {P}\left( L_{n}>nb\right) }=2. \end{aligned}$$

This condition is equivalent to saying that \(\lim \limits _{n\rightarrow \infty }\tilde{\mathbb {E}}\left[ 1_{\{L_{n}>nb\}}\tilde{L}^{2}\right] /\mathbb {P}\left( L_{n}>nb\right) ^{2-\varepsilon }=0\), for every \(\varepsilon >0\). It is readily checked that bounded relative error implies asymptotic optimality.

5.2 Two-step importance sampling for tail probabilities

5.2.1 First step: twisting V

As a first step in developing our IS algorithm for LT-Archimedean copulas, we apply IS to the distribution of the random variable V. In Assumption 4.1, we assume the generator satisfies \(\phi (1-\frac{1}{\cdot })\in \mathrm {RV}_{-\alpha }\), where \(\phi ^{-1}\) is the LS transform of the random variable V. Then, by Karamata's Tauberian Theorem (see Feller, 1971, pp. 442–446), V is heavy-tailed with tail index \(1/\alpha \). As noted in Asmussen et al. (2000), the traditional exponential twisting approach does not work directly for heavy-tailed distributions, since the cumulant generating function in (5.7) is infinite for any positive twisting parameter. So an alternative method must be used. In this subsection we describe an IS algorithm that assigns a larger probability to the event \(\left\{ V>\frac{v^{*}}{\phi (1-f_{n})}\right\} \) by hazard rate twisting the original distribution of V; see Juneja and Shahabuddin (2002) for an introduction to hazard rate twisting. We prove that this leads to an estimator that is asymptotically optimal.

Let us define the hazard rate function associated to the random variable V as

$$\begin{aligned} \mathcal {H}(x)=-\log (\overline{F}_{V}(x)). \end{aligned}$$

By changing \(\mathcal {H}(x)\) to \((1-\theta )\mathcal {H}(x)\) for some \(0<\theta <1\), the tail distribution changes to

$$\begin{aligned} \overline{F}_{V,\theta }(x)=(\overline{F}_{V}(x))^{1-\theta }=\exp ((\theta -1)\mathcal {H}(x)), \end{aligned}$$
(5.2)

and the density function becomes

$$\begin{aligned} f_{V,\theta }(x)=(1-\theta )(\overline{F}_{V}(x))^{-\theta }f_{V}(x)=(1-\theta )\exp (\theta \mathcal {H}(x))f_{V}(x). \end{aligned}$$
(5.3)

Note that we have imposed an additional subscript \(\theta \) on both \(\overline{F}_{V,\theta }(x)\) and \(f_{V,\theta }(x)\) to emphasize that these are the functions corresponding to the twisted hazard rate function \((1-\theta )\mathcal {H}(x)\). The prescribed transformation is similar to exponential twisting, except that the twisting rate is \(\theta \mathcal {H}(x)\) rather than \(\theta x\). By (5.2), one can also note that the tail of the random variable V becomes heavier after twisting.

The key, then, is finding the best parameter \(\theta \). By (5.3), the corresponding likelihood ratio \(f_{V}(x)/f_{V,\theta }(x)\) is \(\frac{1}{1-\theta }\exp (-\theta \mathcal {H}(x))\), and this is upper bounded by

$$\begin{aligned} \frac{1}{1-\theta }\exp \left( -\theta \mathcal {H}\left( \frac{v^{*}}{\phi (1-f_{n})}\right) \right) \end{aligned}$$
(5.4)

on the set \(\left\{ V>\frac{v^{*}}{\phi (1-f_{n})}\right\} \). It is standard practice in IS to search for \(\tilde{\theta }\) by minimizing the upper bound on the likelihood ratio, since this also minimizes the upper bound on the second moment of the estimator \(1_{\{L_{n}>nb\}}\frac{f_{V}(V)}{f_{V,\theta }(V)}\). By taking the derivative of the upper bound (5.4) with respect to \(\theta \) and setting it to zero, we obtain

$$\begin{aligned} \tilde{\theta }=1-\frac{1}{\mathcal {H}\left( \frac{v^{*}}{\phi (1-f_{n} )}\right) }. \end{aligned}$$

Then, the tail distribution in (5.2) corresponding to hazard rate twisting by \(\tilde{\theta }\) equals

$$\begin{aligned} \overline{F}_{V,\tilde{\theta }}(x)=\exp \left( -\frac{\mathcal {H} (x)}{\mathcal {H}\left( \frac{v^{*}}{\phi (1-f_{n})}\right) }\right) . \end{aligned}$$
(5.5)

An explicit form of (5.5) is usually difficult to derive, because the tail distribution of the random variable V is only specified in a semiparametric way. Alternatively, we can replace the hazard function \(\mathcal {H}(x)\) by \(\tilde{\mathcal {H}}(x)\), where \(\mathcal {H}(x)\sim \tilde{\mathcal {H}}(x)\) and \(\tilde{\mathcal {H}}(x)\) is available in closed form. Juneja et al. (2007) prove that estimators derived by such an “asymptotic” hazard rate twisting method can achieve asymptotic optimality.

By Proposition B.1.9(1) of de Haan and Ferreira (2007), \(\overline{F}_{V} \in \mathrm {RV}_{-1/\alpha }\) implies \(\mathcal {H}(x)\sim \frac{1}{\alpha } \log (x)\) as \(x\rightarrow \infty \). This, along with (5.5), suggests that the tail distribution \(\overline{F}_{V,\tilde{\theta }}\) should be close to

$$\begin{aligned} \overline{F}_{V,\tilde{\theta }}(x)\approx x^{-1/\left( \log v^{*}-\log \phi (1-f_{n})\right) }. \end{aligned}$$

For sufficiently large n, we can even ignore the term \(\log (v^{*})\) to achieve further simplification. Hence, the corresponding density function can be taken as

$$\begin{aligned} \frac{1}{-\log \phi (1-f_{n})}x^{\frac{1}{\log \phi (1-f_{n})}-1}, \end{aligned}$$

which is a Pareto distribution with shape parameter \(-1/\log \phi (1-f_{n})\). Now we define

$$\begin{aligned} f_{V}^{*}(x)=\left\{ \begin{array}{lc} f_{V}(x), &{} x<x_{0},\\ \overline{F}_{V}(x_{0})x_{0}^{-1/\log \phi (1-f_{n})}\frac{1}{-\log \phi (1-f_{n})}x^{\frac{1}{\log \phi (1-f_{n})}-1} &{} x\ge x_{0}, \end{array} \right. \end{aligned}$$
(5.6)

where \(x_{0}\) is chosen so that the ratio \(f_{V}(x)/f_{V}^{*}(x)\) remains bounded above by a constant for all x. Thus, the tail of the random variable V becomes heavier after twisting, but the probability assigned to small values remains unchanged.

Remark 5.1

The role of \(x_{0}\) is crucial for showing the asymptotic optimality of the algorithm, as will be seen in the proof of Lemma 5.1. Theoretically, its value relies on the explicit expression of the density function \(f_{V}(x)\). Practically, our numerical results are not sensitive to \(x_{0}\), and hence for ease of implementation one may fix \(x_{0}\) at an arbitrary constant.
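To illustrate, sampling from \(f_{V}^{*}\) in (5.6) amounts to mixing the original distribution below \(x_{0}\) with a Pareto tail above \(x_{0}\). The following Python sketch assumes the cdf \(F_{V}\), its inverse and the density \(f_{V}\) of V are available (these, together with the function names, are illustrative placeholders), and also returns the corresponding likelihood ratio \(f_{V}(x)/f_{V}^{*}(x)\).

```python
import numpy as np

def sample_V_twisted(x0, a, F_V, F_V_inv, rng):
    """One draw from the hazard-rate-twisted density f_V^* in (5.6),
    where a = -1 / log(phi(1 - f_n)) is the Pareto shape parameter."""
    if rng.random() < F_V(x0):
        # keep the original distribution, conditioned to (0, x0), via inverse cdf
        return F_V_inv(rng.random() * F_V(x0))
    # Pareto tail on [x0, infinity) carrying the original mass 1 - F_V(x0)
    return x0 * rng.random() ** (-1.0 / a)

def likelihood_ratio_V(x, x0, a, f_V, F_V):
    """f_V(x) / f_V^*(x) for the twisted density in (5.6)."""
    if x < x0:
        return 1.0
    pareto_density = (1.0 - F_V(x0)) * a * x0 ** a * x ** (-a - 1.0)
    return f_V(x) / pareto_density
```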

5.2.2 Second step: twisting to Bernoulli random variables

We now proceed to apply exponential twisting to the Bernoulli random variables \(\{1_{\{U_{i}>1-l_{i}f_{n}\}},i=1,\ldots ,n\}\) conditional on the common factor V. A measure \(\tilde{\mathbb {P}}\) is said to be an exponentially twisted measure of \(\mathbb {P}\) by parameter \(\theta \), for some random variable X, if

$$\begin{aligned} \dfrac{d\tilde{\mathbb {P}}}{d\mathbb {P}}=\exp (\theta X-\Lambda _{X}(\theta )), \end{aligned}$$
(5.7)

where \(\Lambda _{X}(\theta )=\log \mathbb {E}[\exp (\theta X)]\) is the cumulant generating function. If the random variable X has density function \(f_{X}(x)\), then the exponentially twisted density takes the form \(\exp (\theta x-\Lambda _{X}(\theta ))f_{X}(x)\).

Now we deal with the Bernoulli success probability \(p_{j}\), which is essentially p(v,j) as defined in (4.1) by conditioning on \(V=\dfrac{v}{\phi (1-f_{n})}\). In order to increase the conditional default probabilities, following the idea in Glasserman and Li (2005), we apply an exponential twist by choosing a parameter \(\theta \) and taking

$$\begin{aligned} p_{j}^{\theta }=\frac{p_{j}e^{\theta c_{j}}}{1+p_{j}\left( e^{\theta c_{j} }-1\right) }, \end{aligned}$$

where \(p_{j}^{\theta }\) denotes the \(\theta \)-twisted probability conditional on \(V=\dfrac{v}{\phi (1-f_{n})}\). Note that \(p_{j}^{\theta }\) is strictly increasing in \(\theta \), so that \(p_{j}^{\theta }>p_{j}\) for \(\theta >0\). With this new choice of conditional default probabilities \(\left\{ p_{j}^{\theta }:j\le |\mathcal {W}|\right\} \), a straightforward calculation shows that the likelihood ratio conditional on V simplifies to

$$\begin{aligned} \prod _{j\le |\mathcal {W}|}\left( \frac{p_{j}}{p_{j}^{\theta }}\right) ^{n_{j}Y_{j}}\left( \frac{1-p_{j}}{1-p_{j}^{\theta }}\right) ^{n_{j} (1-Y_{j})}=\exp \left( -\theta L_{n}|V+\Lambda _{L_{n}|V}(\theta )\right) , \end{aligned}$$
(5.8)

where

$$\begin{aligned} \Lambda _{L_{n}|V}(\theta )=\log \mathbb {E}\left[ e^{\theta L_{n}}\left| V=\frac{v}{\phi (1-f_{n})}\right. \right] =\sum _{j\le |\mathcal {W}|}n_{j} \log \left( 1+p_{j}\left( e^{\theta c_{j}}-1\right) \right) \end{aligned}$$

is the cumulant generating function of \(L_{n}\) conditional on V. For any \(\theta \), the estimator

$$\begin{aligned} 1_{\{L_{n}>nb|V\}}e^{-\theta L_{n}|V+\Lambda _{L_{n}|V}(\theta )} \end{aligned}$$

is unbiased for \(\mathbb {P}\left( L_{n}>nb\left| V=\frac{v}{\phi (1-f_{n})}\right. \right) \) if probabilities \(\left\{ p_{j}^{\theta } :j\le |\mathcal {W}|\right\} \) are used to generate \(L_{n}\). Equation (5.8) formally establishes that applying an exponential twist on the probabilities is equivalent to applying an exponential twist to \(L_{n}|V\) itself.

It remains to choose the parameter \(\theta \). A standard practice in IS is to select a parameter \(\theta \) that minimizes the upper bound of the second moment of the estimator to reduce the variance. As we can see,

$$\begin{aligned} \mathbb {E}_{\theta }\left[ 1_{\{L_{n}>nb\}}e^{-2\theta L_{n}+2\Lambda _{L_{n} }(\theta )}\left| V=\frac{v}{\phi (1-f_{n})}\right. \right] \le e^{-2nb\theta +2\Lambda _{L_{n}|V}(\theta )}, \end{aligned}$$

where \(\mathbb {E}_{\theta }\) denotes expectation using the \(\theta \)-twisted probabilities. The problem is then identical to finding a parameter \(\theta \) that maximizes \(nb\theta -\Lambda _{L_{n}|V}(\theta )\). Straightforward calculation shows that

$$\begin{aligned} \Lambda _{L_{n}|V}^{\prime }(\theta )=\sum _{j\le |\mathcal {W}|}n_{j}c_{j} p_{j}^{\theta }=\mathbb {E}_{\theta }\left[ L_{n}\left| V=\frac{v}{\phi (1-f_{n})}\right. \right] . \end{aligned}$$
(5.9)

By the strictly increasing property of \(\Lambda _{L_{n}|V}^{\prime }(\theta )\), the maximum is attained at

$$\begin{aligned} \theta ^{*}=\left\{ \begin{array}{lc} \text {unique solution to }\Lambda _{L_{n}|V}^{\prime }(\theta )=nb, &{}\quad nb>\Lambda _{L_{n}|V}^{\prime }(0),\\ 0, &{}\quad nb\le \Lambda _{L_{n}|V}^{\prime }(0). \end{array} \right. \end{aligned}$$
(5.10)

By (5.9), the two cases in (5.10) are distinguished by comparing nb with \(\mathbb {E}\left[ L_{n}\left| V=\frac{v}{\phi (1-f_{n})}\right. \right] =\sum _{j\le |\mathcal {W}|}n_{j}c_{j}p_{j} \). In the former case, our choice of twisting parameter \(\theta ^{*}\) shifts the conditional distribution of \(L_{n}\) so that the average portfolio loss equals b; in the latter case, the event \(\{L_{n}>nb\}\) is not rare, so we use the original probabilities.
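A minimal Python sketch of this second twisting step is given below (SciPy root-finding is used to solve \(\Lambda _{L_{n}|V}^{\prime }(\theta )=nb\); the argument names are illustrative, and nb is assumed to be smaller than the total exposure \(\sum _{j}n_{j}c_{j}\) so that a root exists).

```python
import numpy as np
from scipy.optimize import brentq

def twisted_probabilities(p, c, n_sub, n, b):
    """Exponential twisting of the conditional Bernoulli probabilities.

    p, c, n_sub : arrays of conditional default probabilities p_j, exposures c_j
                  and sizes n_j of the sub-portfolios
    Returns (theta*, p_j^{theta*}) according to (5.10).
    """
    p, c, n_sub = map(np.asarray, (p, c, n_sub))
    twist = lambda th: p * np.exp(th * c) / (1.0 + p * (np.exp(th * c) - 1.0))
    mean_loss = lambda th: np.sum(n_sub * c * twist(th))   # Lambda'_{L_n|V}(theta), see (5.9)
    if n * b <= mean_loss(0.0):        # event not rare: keep the original probabilities
        return 0.0, p
    hi = 1.0
    while mean_loss(hi) < n * b:       # bracket the root (requires n*b < sum_j n_j c_j)
        hi *= 2.0
    theta = brentq(lambda th: mean_loss(th) - n * b, 0.0, hi)
    return theta, twist(theta)
```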

5.2.3 Algorithm

Now we are ready to present the algorithm. It consists of three stages. First, a sample of V is generated using hazard rate twisting. Depending on the value of V, samples of the Bernoulli variables \(1_{\{U_{i}>1-l_{i}f_{n}\}}\) are generated in the second step, using either naive simulation (original probabilities) or importance sampling. The details on how to adjust conditional default probabilities have already been discussed in the previous subsections. Finally we compute the portfolio loss \(L_{n}\) and return the estimator after incorporating the likelihood ratio.

The following algorithm is for each replication.

(Algorithm: two-step importance sampling procedure for one replication, returning the IS estimator \(1_{\{L_{n}>nb\}}L^{*}\).)
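The sketch below assembles the pieces in Python, reusing sample_V_twisted, likelihood_ratio_V and twisted_probabilities from the earlier sketches; all names and arguments are illustrative, and the sub-portfolio structure of Assumption 2.1 is assumed.

```python
import numpy as np

def is_replication(b, c, l, n_sub, f_n, phi, x0, a, F_V, F_V_inv, f_V, rng):
    """One replication of the two-step IS algorithm of Sect. 5.2.3.

    c, l, n_sub : arrays of exposures c_j, scalings l_j and sizes n_j per sub-portfolio
    a           : Pareto shape parameter -1/log(phi(1 - f_n)) used in (5.6)
    Returns the contribution 1{L_n > nb} * L* of this replication.
    """
    c, l, n_sub = map(np.asarray, (c, l, n_sub))
    n = int(np.sum(n_sub))
    # Step 1: draw the common factor V from the hazard-rate-twisted density (5.6).
    v = sample_V_twisted(x0, a, F_V, F_V_inv, rng)
    lr = likelihood_ratio_V(v, x0, a, f_V, F_V)
    # Step 2: conditional default probabilities given V = v, then exponential twisting.
    p = 1.0 - np.exp(-v * phi(1.0 - l * f_n))
    theta, p_twist = twisted_probabilities(p, c, n_sub, n, b)
    defaults = rng.binomial(n_sub, p_twist)         # number of defaults per sub-portfolio
    loss = np.sum(c * defaults)
    # Unbiasing likelihood ratio L*: product over sub-portfolios.
    lr *= np.prod((p / p_twist) ** defaults
                  * ((1.0 - p) / (1.0 - p_twist)) ** (n_sub - defaults))
    return float(loss > n * b) * lr
```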

Let \(\mathbb {P}^{*}\) and \(\mathbb {E}^{*}\) denote the IS probability measure and expectation corresponding to this algorithm. The likelihood ratio is given by

$$\begin{aligned} L^{*}=\frac{f_{V}(V)}{f_{V}^{*}(V)}\prod _{j\le |\mathcal {W}|}\left( \frac{p_{j}}{p_{j}^{*}}\right) ^{n_{j}Y_{j}}\left( \frac{1-p_{j}}{1-p_{j}^{*}}\right) ^{n_{j}(1-Y_{j})}. \end{aligned}$$

The following lemma is important in demonstrating the efficiency of our IS algorithm.

Lemma 5.1

Under the same assumptions as in Theorem 4.1, we have

$$\begin{aligned} \frac{\log \mathbb {E}^{*}\left[ 1_{\{L_{n}>nb\}}L^{*^{2}}\right] }{\log f_{n}}\rightarrow 2,\quad \text {as }n\rightarrow \infty . \end{aligned}$$

In view of Theorem 4.1, which provides the asymptotic estimate of the tail probability \(\mathbb {P}\left( L_{n}>nb\right) \), we conclude in the following theorem that our proposed algorithm is asymptotically optimal.

Theorem 5.1

Under the same assumptions as in Theorem 4.1, we have

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{\log \mathbb {E}^{*}\left[ 1_{\{L_{n}>nb\}}L^{*^{2}}\right] }{\log \mathbb {P}\left( L_{n}>nb\right) }=2. \end{aligned}$$

Thus, the IS estimator (5.11) achieves asymptotic zero variance on the logarithmic scale.

5.3 Importance sampling for expected shortfall

In risk management, one is usually interested in estimating the expected shortfall at a confidence level close to 1, which is again a rare event simulation. In this subsection, we discuss how to apply our proposed IS algorithm to estimate the expected shortfall.

First, note that the expected shortfall can be understood as follows,

$$\begin{aligned} \mathbb {E}\left[ L_{n}|L_{n}>nb\right] =nb+\frac{\mathbb {E}\left[ \left( L_{n}-nb\right) _{+}\right] }{\mathbb {P}\left( L_{n}>nb\right) }. \end{aligned}$$
(5.12)

By involving the unbiasing likelihood ratio \(L^{*}\), (5.12) is equivalent to

$$\begin{aligned} nb+\frac{\mathbb {E}^{*}\left[ \left( L_{n}-nb\right) _{+}L^{*}\right] }{\mathbb {E}^{*}\left[ 1_{\{L_{n}>nb\}}L^{*}\right] }, \end{aligned}$$

where \(\mathbb {E}^{*}\) is the expectation corresponding to the IS algorithm in Sect. 5.2.3. Suppose m i.i.d. samples \((L_{n}^{1},\ldots ,L_{n}^{m})\) are generated under measure \(\mathbb {P}^{*}\). Let \(L_{i} ^{*}\) denote the corresponding likelihood ratio for each sample i. Then the IS estimator of the expected shortfall is given as

$$\begin{aligned} nb+\frac{\sum _{i=1}^{m}(L_{n}^{i}-nb)_{+}L_{i}^{*}}{\sum _{i=1} ^{m}1_{\{L_{n}^{i}>nb\}}L_{i}^{*}}. \end{aligned}$$
(5.13)

Note that the samples generated to estimate the numerator in (5.13) take positive values only when large losses occur. Therefore, one can expect that the IS algorithm that works for estimating the probability of the event \(\{L_{n}>nb\}\) should also work well for estimating \(\mathbb {E}[(L_{n}-nb)_{+}]\). This is later confirmed by our numerical results.
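Computing (5.13) from the IS output is a simple ratio estimator; a small Python sketch is given below, where losses and lrs are the arrays of portfolio losses and likelihood ratios produced by the replications of the algorithm in Sect. 5.2.3.

```python
import numpy as np

def expected_shortfall_is(losses, lrs, n, b):
    """IS estimator (5.13) of E[L_n | L_n > nb] from m i.i.d. replications.

    losses : simulated portfolio losses L_n^i under the IS measure
    lrs    : corresponding likelihood ratios L_i^*
    """
    losses, lrs = np.asarray(losses), np.asarray(lrs)
    excess = np.maximum(losses - n * b, 0.0)
    return n * b + np.sum(excess * lrs) / np.sum((losses > n * b) * lrs)
```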

6 Conditional Monte Carlo simulations for large portfolio loss

In this section, we propose another estimation method based on the conditional Monte Carlo approach, which is another variance reduction technique; see, e.g., Asmussen and Kroese (2006) and Asmussen (2018). Our proposed algorithm is motivated by Chan and Kroese (2010), in which the authors derived simple simulation algorithms to estimate the probability of large portfolio losses under the t-copula.

By the stochastic representation (3.2) for LT-Archimedean copulas and the asymptotic expansions in Theorem 4.1, the rare event \(\{L_{n}>nb\}\) occurs primarily when the random variable V takes large values, while \(\mathbf {R}=(R_{1},\ldots ,R_{n})\) generally has little influence on the occurrence of the rare event. This suggests that integrating out V analytically could lead to a substantial variance reduction.

To proceed, it is useful to define

$$\begin{aligned} O_{i}=\frac{R_{i}}{\phi (1-l_{i}f_{n})},i=1,\ldots ,n. \end{aligned}$$
(6.1)

An individual obligor defaults if \(U_{i}>1-l_{i}f_{n}\), which is equivalent to \(V>O_{i}\). Thus, the portfolio loss in (2.2) can be rewritten as

$$\begin{aligned} L_{n}=\sum _{i=1}^{n}c_{i}1_{\{V>O_{i}\}}. \end{aligned}$$

We rank \(O_{1},\ldots ,O_{n}\) as \(O_{(1)}\le O_{(2)}\le \cdots \le O_{(n)}\), and let \(c_{(i)}\) denote the exposure at default associated with \(O_{(i)}\). Then one can check that the event \(\{L_{n}>nb\}\) happens if and only if \(V>O_{(k)}\), where \(k=\min \{l:\sum _{i=1}^{l}c_{(i)}>nb\}\). In particular, if \(c_{i}\equiv c\) for all \(i=1,\ldots ,n\), then \(k=\lceil nb/c\rceil \). Now, conditional on \(\mathbf {R}\), we have

$$\begin{aligned} \mathbb {P}\left( L_{n}>nb|\mathbf {R}\right) =\mathbb {P}\left( V>O_{(k)}|\mathbf {R}\right) :=S(\mathbf {R}). \end{aligned}$$
(6.2)

We summarize our proposed conditional Monte Carlo algorithm, which is labelled as CondMC, in the following algorithm.

(Algorithm CondMC: generate \(\mathbf {R}\), compute the thresholds \(O_{i}\) in (6.1), and return the conditional probability \(S(\mathbf {R})\) in (6.2) as the estimator.)
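A minimal Python sketch of one CondMC replication is given below (sf_V denotes the survival function \(\overline{F}_{V}\) of V, assumed available in closed or numerical form; the function names are illustrative).

```python
import numpy as np

def condmc_replication(b, c, l, f_n, phi, sf_V, rng):
    """One replication of the CondMC estimator S(R) in (6.2)."""
    c, l = np.asarray(c, dtype=float), np.asarray(l, dtype=float)
    n = len(c)
    r = rng.exponential(size=n)                  # i.i.d. standard exponentials R_i
    o = r / phi(1.0 - l * f_n)                   # thresholds O_i in (6.1)
    order = np.argsort(o)                        # O_(1) <= ... <= O_(n)
    cum_loss = np.cumsum(c[order])
    k = np.searchsorted(cum_loss, n * b, side="right")   # first index with partial sum > nb
    if k >= n:                                   # even all defaults cannot exceed nb
        return 0.0
    return sf_V(o[order][k])                     # S(R) = P(V > O_(k) | R)
```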

We now show that the conditional Monte Carlo estimator has bounded relative error, a stronger efficiency notion than the asymptotic optimality established for the IS estimator (5.11) in Theorem 5.1.

Lemma 6.1

Under the same assumptions as in Theorem 4.1, but with the condition on \(f_n\) strengthened to \(\frac{1}{n}=O(f_{n})\), we have

$$\begin{aligned} \limsup _{n\rightarrow \infty }\frac{\mathbb {E}\left[ S^{2}(\mathbf {R})\right] }{f_{n}^{2}}<\infty . \end{aligned}$$

In view of Theorem 4.1, we immediately obtain the following theorem concerning the algorithm efficiency.

Theorem 6.1

Under the same assumptions as in Lemma 6.1, we have

$$\begin{aligned} \limsup _{n\rightarrow \infty }\frac{\sqrt{\mathbb {E}\left[ S^{2}(\mathbf {R} )\right] }}{\mathbb {P}\left( L_{n}>nb\right) }<\infty . \end{aligned}$$

In other words, the conditional Monte Carlo estimator (6.2) has bounded relative error.

7 Numerical results

In this section, we assess the relative performance of our proposed algorithms via simulation, and investigate their sensitivity to \(\alpha \) (the heavy tailedness of the systematic risk factor V), n (the size of the portfolio) and b (a pre-fixed level that controls the proportion of obligors that default). The numerical results indicate that our proposed algorithms, especially the CondMC algorithm, provide considerable variance reduction when compared to naive MC simulations. This supports our theoretical result that both proposed algorithms are asymptotically optimal.

In line with the assumption that \(\phi (1-\frac{1}{\cdot })\in \mathrm {RV}_{-\alpha }\) with \(\alpha >1\), we consider the Gumbel copula in our numerical experiments. The generator function of the Gumbel copula is \(\phi (t)=(-\ln (t))^{\alpha }\) with \(\alpha >1\). By varying \(\alpha \), the Gumbel copula covers the range from independence (\(\alpha \rightarrow 1\)) to comonotonicity (\(\alpha \rightarrow \infty \)).

In all the experiments below, only homogeneous portfolios are considered. However, it should be emphasized that the performance of our algorithms is not affected for inhomogeneous portfolios. This is asserted by Theorems 5.1 and 6.1, which have been proved under a general setting covering both homogeneous and inhomogeneous portfolios. To evaluate the accuracy of the estimators, for each set of specified parameters we generate 50,000 samples for our proposed algorithms, estimate the probability of large portfolio loss, and report the relative error (in \(\%\)), which is defined as the ratio of the estimator's standard deviation to the estimator. More precisely, if \(\hat{p}\) is an unbiased estimator of \(\mathbb {P}\left( L_{n}>nb\right) \), its relative error is defined as \(\sqrt{\mathrm {Var} (\hat{p})}/\hat{p}\). We also report the variance reduction achieved by our proposed algorithms compared with naive simulation. For naive simulation, it is highly likely that the rare event would not be observed in any sample path with only 50,000 samples. Therefore, the variance under naive simulation is estimated indirectly by exploiting the fact that the variance of a Bernoulli(p) random variable equals \(p(1-p)\).

Table 1 provides a first comparison of our IS algorithm and CondMC algorithm with naive simulation as \(\alpha \) changes. The chosen model parameter values are \(n=500\), \(f_{n}=1/n\), \(b=0.8\), \(l_{i}=0.5\) and \(c_{i}=1\) for each i. As can be concluded from Table 1, both algorithms outperform naive simulation, especially when \(\alpha \) is small, in which case obligors have weaker dependence and the probability of large portfolio losses is smaller. Relative to the naive MC method, the variance reduction attained by the IS estimator is in the order of hundreds to thousands, while that attained by the CondMC estimator is in the order of millions. This demonstrates that the CondMC estimator significantly outperforms the IS estimator.

Table 1 Performance of the proposed algorithms for Gumbel copula under different values of \(\alpha \)

In Table 2, we perform the same comparison by varying b while keeping \(\alpha \) fixed at 1.5. Under the setting that \(c=1\), the parameter b controls the level of the proportion of obligors that default. As is clear from the table, when b increases, the estimated probability decreases and the variance reduction becomes larger.

Table 2 Performance of the proposed algorithms for Gumbel copula under different values of b

Table 3 provides the relative error and variance reduction of our algorithms compared with naive simulation as the number of obligors changes. All other parameters are identical to those in the previous experiments, with \(\alpha =1.5\) and \(b=0.8\). In the last column, we also report the sharp asymptotic for the desired probability of large portfolio loss based on the expression in (4.4). Note that as n increases, both the accuracy of the sharp asymptotic and the reduction in variance improve.

Table 3 Performance of the proposed algorithms for Gumbel copula together with the sharp asymptotic derived in Theorem 4.1 under different values of n

In Table 4, we study the accuracy of the sharp asymptotic for expected shortfall as the number of obligors increases. Model parameters are taken to be \(f_{n}=1/n\), \(\alpha =1.5\), \(b=0.8\), \(l_{i}=0.5\) and \(c=1\) for each i. For estimating expected shortfall, we simply use all the 50,000 sample paths generated under the proposed IS measure, and then consider those with portfolio loss exceeding nb. As shown in Table 4, the accuracy is quite high even for small values of n. This is mainly due to the fact that the hazard rate density is chosen based on the asymptotic result in Theorem 4.1. Discrepancy here is measured as the percentage difference between the ES estimated via importance sampling and the sharp asymptotic in (4.7).

Table 4 The expected shortfall and its sharp asymptotic derived in Theorem 4.2 under different values of n

To conclude the section, we note again that, to the best of our knowledge, this is the first paper that adopts the Archimedean copula in the analysis of large credit portfolio losses and proposes the corresponding importance sampling and conditional Monte Carlo estimators. In contrast, the importance sampling estimators of Bassamboo et al. (2008) and the conditional Monte Carlo estimators of Chan and Kroese (2010) assume that the dependence structure of obligors is modeled with a t-copula. Because of the difference in the underlying assumed dependence structure, the estimators considered in this paper are not directly comparable to the corresponding estimators in those two papers. Nevertheless, by comparing our simulation results to theirs, it is reassuring that even under a very different dependence structure, significant variance reduction, especially for the estimator based on the conditional Monte Carlo method, can be expected. Furthermore, regardless of the assumed dependence structure, all estimators exhibit consistent behavior in the sense that they perform better for weaker dependence structures and larger portfolio sizes.

8 Conclusion

In this paper, we consider an Archimedean copula-based model for measuring portfolio credit risk. Analytic expressions for the probability that such a portfolio incurs large losses are not available, and directly applying naive MC simulation to these rare events is also inefficient. We first derive sharp asymptotic expansions for the probability of large portfolio losses and for the expected shortfall of the losses. Using these as a stepping stone, we develop two efficient algorithms to estimate the risk of a credit portfolio via simulation. The first one is a two-step full IS algorithm, which can be used to estimate both the probability and the expected shortfall of the portfolio loss. We show that the proposed estimator is logarithmically efficient. The second algorithm is based on conditional Monte Carlo simulation and can be used to estimate the probability of portfolio loss. This estimator is shown to have bounded relative error. Through extensive simulation studies, both algorithms, especially the second one, show significant variance reductions when compared to naive MC simulations.