1 Introduction

The essence of control in the financial world relies on how well the models in use mimic reality and on the precision of the computational methods employed. Monte Carlo simulation is often one of the best alternatives, as it yields confidence intervals rather than mere point estimates. The subject of this paper is to develop a simulation methodology to quantify credit risk, i.e., the risk related to obligors' defaults. Two credit risk models that are used intensively in the industry for the quantification of credit risk are CreditMetrics of JP Morgan and CreditRisk\(^{+}\) of Credit Suisse Financial Products (1997).

Although credit risk simulations seem quite easy, since in each repetition we simply sum the exposures of the defaulted obligors, rare-event settings are problematic: the number of repetitions must increase enormously to obtain meaningful estimates of the risk. This is undesirable as it degrades the efficiency of the simulation. Efficient Monte Carlo methods for the Gaussian copula model of Gupton et al. (1997) (see, e.g., Glasserman and Li 2005; Sak and Hörmann 2012), for the t-copula model (see Bassamboo et al. 2008; Chan and Kroese 2010; Sak 2010), and for the mixed Poisson model (see Glasserman and Li 2003) of credit risk have been proposed in the literature. In this paper, we propose a new method to estimate the tail loss probability and the conditional excess for the Bernoulli mixture model of credit risk.

Our main contribution is a new algorithm to measure credit risk under general Bernoulli mixture models. These models are credited as more convenient for statistical fitting purposes in McNeil et al. (2005), and all credit risk models proposed in the literature can be represented as Bernoulli mixture models. We could not find any simulation method for general Bernoulli mixture models in the literature, and it is not obvious how to find an equivalent latent variable model for a given Bernoulli mixture model.

In our general algorithm, we implement a combination of stratification and importance sampling for the simulation of the random variables that introduce dependence across obligors. We use an importance sampling strategy based on the cross-entropy method to increase the probability of rare defaults. The subsequent stratification enhances the variance reduction through optimal sample allocation across strata. The remaining source of variance is the simulation of the obligors' defaults, for which we employ inner replications using the geometric shortcut method (see Sak and Hörmann 2012). The new algorithm yields reasonable variance reduction over the existing methods designed for specific risk models.

Another important contribution of this paper is the efficient estimation of the conditional excess of the total loss of a credit portfolio. The conditional excess is estimated by a ratio estimator, i.e., the ratio of two different estimators. Therefore, when stratification is implemented, the optimal sample allocation sizes that minimize the variance of the conditional excess must be derived differently from those that minimize the variance of a regular estimator (see Başoğlu and Hörmann 2014). To our knowledge, we are the first to use a stratification algorithm for the estimation of conditional excess for credit portfolios. Most papers in the literature only consider importance sampling for this problem, and the proposed importance sampling density is usually the same as the one used for tail loss probability estimation (see, e.g., Glasserman 2005). In our algorithm, stratification allows us to derive formulas for sample allocations that minimize the variance of the conditional excess estimate. These formulas are useful for any simulation applying stratified ratio estimators.

The paper is organized as follows: Sect. 2 gives an overview of the Bernoulli mixture models. In Sect. 3, we explain the implementation details of inner replications of geometric shortcut, importance sampling based on cross-entropy, and stratification, and how these methods can be combined in simulating tail loss probabilities. Section 4 extends the use of the same methodology to conditional excess simulation and explains the stratification of the ratio estimator. Finally, in Sect. 5, we present our numerical results for some credit portfolio examples whereas Sect. 6 provides final comments.

Note that, in this paper, vectors and matrices are set in bold to enhance readability.

2 Bernoulli mixture model

Before discussing the model, we introduce the notation used throughout the paper. There are J obligors in the portfolio and \(Y_j\) denotes the Bernoulli random variable for the jth obligor (1 if the jth obligor defaults, 0 otherwise). The marginal probability that the jth obligor defaults is \(p_j\), and \(c_j\) denotes the loss resulting from the default of the jth obligor. The total loss of the portfolio is calculated as \(L=\sum _{j=1}^{J}c_{j}Y_{j}\). We consider only a fixed horizon, over which we are interested in the values of the tail loss probability \(y=P\left( L> \tau \right) = E\left[ \mathbbm {1}_{\left\{ L>\tau \right\} }\right] \) and the conditional excess \(r=E[L|L > \tau ]\) for a fixed threshold value \(\tau \).

Following McNeil et al. (2005), given the D-dimensional (\(D<J\)) random vector \({\varvec{\Psi }} = \left( \varPsi _1,\ldots ,\varPsi _D\right) '\), the random vector \(\mathbf {Y}=\left( Y_1,\ldots ,Y_J\right) '\) follows a Bernoulli mixture model if there are functions \(p_j: \mathbb {R} ^D\rightarrow \left[ 0,1\right] \), \(j=1,\ldots ,J\), such that, conditional on \({\varvec{\Psi }}\), the components of \(\mathbf {Y}\) are independent Bernoulli random variables. We define \(p_j\left( {\varvec{\Psi }}\right) \) as the conditional default probability for a given \({\varvec{\Psi }}\) vector: \(p_j\left( {\varvec{\Psi }}\right) = P\left( Y_j = 1| {\varvec{\Psi }}\right) \). For \(\left( \varepsilon _1,\ldots ,\varepsilon _J\right) '\) in \(\left\{ 0,1\right\} ^J\),

$$\begin{aligned} P\left( Y_1 = \varepsilon _1,\ldots ,Y_J = \varepsilon _J|{\varvec{\Psi }}\right) = \prod \nolimits _{j=1}^J p_{j}\left( {\varvec{\Psi }}\right) ^{\varepsilon _j} \left( 1-p_j\left( {\varvec{\Psi }}\right) \right) ^{\left( 1-\varepsilon _j\right) }. \end{aligned}$$
(1)

The unconditional distribution of \(\mathbf {Y}\) is found by integrating (1) over the distribution of \({\varvec{\Psi }}\).

We consider three specific examples of this general model to test our method in Sect. 5. The first one is the CreditRisk\(^+\) model, which can be represented as a Bernoulli mixture model (see Frey and McNeil 2002). Here, \({\varvec{\Psi }} = \left( \varPsi _1,\ldots ,\varPsi _D\right) '\) consists of independent Gamma random variables with shape parameters \(\alpha _d=\sigma _d^{-2}\) and scale parameters \(\beta _d = \sigma _d^2\) for \(d=1,\ldots ,D\).

Furthermore, the conditional default probabilities for a given \({\varvec{\Psi }}\) vector are

$$\begin{aligned} p_j\left( {\varvec{\Psi }}\right) =1-\exp \left( -a_{j0}-a_{j1}\varPsi _{1}-\ldots -a_{jD}\varPsi _{D}\right) , \quad j=1,\ldots ,J, \end{aligned}$$
(2)

where \(a_{j0},\ldots ,a_{jD}\) are positive coefficients.
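
As an illustration, the following minimal Python sketch draws the independent Gamma factors and evaluates (2); the array names and the function interface are our own assumptions, not part of the model specification.

```python
import numpy as np

def creditriskplus_cond_probs(a0, A, sigma, rng):
    """Sketch: one draw of the factors and the conditional default
    probabilities (2) of the CreditRisk+ model.
    a0: (J,) intercepts a_{j0}; A: (J, D) coefficients a_{jd};
    sigma: (D,) volatilities defining the Gamma parameters."""
    alpha = sigma ** -2.0                      # shape parameters alpha_d
    beta = sigma ** 2.0                        # scale parameters beta_d
    psi = rng.gamma(shape=alpha, scale=beta)   # independent Gamma factors
    p_cond = 1.0 - np.exp(-(a0 + A @ psi))     # equation (2)
    return p_cond, psi
```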

The second example is the Gaussian copula model, which corresponds to a multi-dimensional probit-normal mixing distribution (see McNeil et al. 2005, p. 354). The Gaussian copula model introduces a multivariate normal vector \(\mathbf {Z}=\left( Z_{1},\ldots ,Z_{J}\right) '\) of latent variables to obtain dependence across obligors. The relationship between the default indicators and the latent variables is described by \(Y_{j}=\mathbbm {1}_{\left\{ Z_{j}>z_{j}\right\} }\), \(j=1,\ldots ,J,\) where \(Z_{j}\) follows a standard normal distribution, \(z_{j}=\varPhi ^{-1}\left( 1-p_{j}\right) \), and \(\varPhi ^{-1}\) denotes the inverse of the cumulative distribution function (CDF) of the standard normal distribution. Obviously, the choice of the threshold value \(z_{j}\) implies that \(P\left( Y_{j}=1\right) =p_{j}\).

The correlations among the \(Z_{j}\) values are modeled by defining

$$\begin{aligned} Z_{j}=b_{j}\epsilon _{j}+a_{j1}\varPsi _{1}+\ldots +a_{jD}\varPsi _{D}, \quad j=1,\ldots ,J, \end{aligned}$$
(3)

where \(\epsilon _{j}\) and \(\varPsi _{1},\ldots ,\varPsi _{D}\) are independent standard normal random variables with \(b_{j}^{2}+a_{j1}^{2}+\ldots +a_{jD}^{2}=1.\) While \(\varPsi _{1},\ldots ,\varPsi _{D}\) are systematic risk factors affecting all obligors, \(\epsilon _{j}\) is the idiosyncratic risk factor affecting only obligor j. Furthermore, \(a_{j1},\ldots ,a_{jD}\) are constant and non-negative factor loadings, assumed to be known. Thus, given the vector \({\varvec{\Psi }}=\left( \varPsi _{1},\ldots ,\varPsi _{D}\right) '\), we have the conditional default probabilities

$$\begin{aligned} p_{j}\left( {\varvec{\Psi }}\right) = P\left( Y_{j} = 1|{\varvec{\Psi }}\right) = \varPhi \left( \left( {\mathbf {a}_{j}{\varvec{\Psi }}+\varPhi ^{-1}(p_{j})}\right) {b_{j}^{-1}}\right) , \quad j=1,\ldots ,J, \end{aligned}$$
(4)

where \(\mathbf {a}_{j} = \left( a_{j1},\ldots ,a_{jD}\right) \).
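
A corresponding sketch for the Gaussian copula model evaluates (4), recovering \(b_j\) from the normalization in (3); again, the array names are our own.

```python
import numpy as np
from scipy.stats import norm

def gaussian_copula_cond_probs(p, A, rng):
    """Sketch: conditional default probabilities (4) of the Gaussian copula model.
    p: (J,) marginal default probabilities p_j; A: (J, D) factor loadings a_{jd}."""
    b = np.sqrt(1.0 - np.sum(A ** 2, axis=1))        # idiosyncratic loadings b_j from (3)
    psi = rng.standard_normal(A.shape[1])            # systematic factors Psi_1,...,Psi_D
    p_cond = norm.cdf((A @ psi + norm.ppf(p)) / b)   # equation (4)
    return p_cond, psi
```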

The last example is the t-copula model, in which the latent variables follow a multivariate t-distribution instead of a multivariate normal distribution. The model that has been widely used (see, e.g., Bassamboo et al. 2008; Kang and Shahabuddin 2005) is

$$\begin{aligned} {T_j} = \left( {{b_j}{\epsilon _j} + \sum \nolimits _{d = 1}^{D - 1} {{a_{jd}}{\varPsi _d}} } \right) {\left( {{{{\varPsi _D}} \big / \eta }} \right) ^{ - {1/2}}},\quad j=1,\ldots ,J, \end{aligned}$$

where the definitions of \(\varPsi _{1},\ldots ,\varPsi _{D-1}\), \(\epsilon _{j}\), \(a_{j1},\ldots ,a_{j(D-1)}\) and \(b_{j}\) are the same as in (3), and \(\varPsi _{D}\) denotes a chi-square random variable with \(\eta \) degrees of freedom that is independent of \(\varPsi _{1},\ldots ,\varPsi _{D-1}\) and \(\epsilon _{j}\). The relationship between the default indicators and the latent variables is described by \(Y_{j}=\mathbbm {1}_{\{T_{j}>t_{j}\}}\). Since \(T_{j}\) is a t-distributed random variable, to preserve the marginal default probabilities, we select \(t_{j}=F_{\eta }^{-1}(1-p_j)\) where \(F_{\eta }\) denotes the CDF of the t-distribution with \(\eta \) degrees of freedom. Finally, given the vector \({\varvec{\Psi }}=(\varPsi _{1},\ldots ,\varPsi _{D})'\), we have the conditional default probabilities

$$\begin{aligned} {p_j}\left( {\varvec{\Psi }}\right) = P\left( {{Y_j} = 1|{\varvec{\Psi }} } \right) =\varPhi \left( {\frac{{{{\tilde{\mathbf {a}}}_j}\tilde{{\varvec{\Psi }}} -\sqrt{{{{\varPsi _D}} \big /\eta }} F_\eta ^{ - 1}\left( {1 - {p_j}} \right) }}{{{b_j}}}} \right) ,\quad j = 1, \ldots ,J, \end{aligned}$$
(5)

where \(\tilde{\mathbf {a}}_j = (a_{j1},\ldots ,a_{j(D-1)})\) and \(\tilde{{\varvec{\Psi }}}=(\varPsi _{1},\ldots ,\varPsi _{D-1})'\).

3 Efficient simulation of tail loss probabilities

In order to simulate tail loss probabilities under the Bernoulli mixture model, in each replication of the simulation algorithm we need to generate the input vector \({\varvec{\Psi }}\) with density \(f({\varvec{\Psi }};\mathbf {u})\), where \(\mathbf {u}\) denotes the parameter vector of the specific model. Then, we calculate the conditional default probabilities \(p_j({\varvec{\Psi }})\), \(j=1,\ldots ,J\), again for the specific model. Once \(p_j({\varvec{\Psi }})\) is calculated, it can be used to generate the default indicator \(Y_{j}\) of the jth obligor. Finally, we compute the total loss L of the portfolio and the response function \(\mathbbm {1} _{\{L>\tau \}}\) as an estimator of the tail loss probability. Before giving details on the variance reduction methods, we describe the naive simulation (NV) method in Algorithm 1, as it is a simple guide on how Monte Carlo can be applied to quantify credit risk measures. Note that this algorithm and all other algorithms given in this paper are designed to estimate both the tail loss probability and the conditional excess.

[Algorithm 1: naive simulation (NV) for tail loss probability and conditional excess]
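
For concreteness, the following is a minimal Python sketch of the naive simulation described above. The helpers draw_psi and cond_default_prob stand for the model-specific steps (sampling \({\varvec{\Psi }}\) and evaluating \(p_j({\varvec{\Psi }})\)) and are assumptions of ours, not part of Algorithm 1 itself.

```python
import numpy as np

def naive_simulation(draw_psi, cond_default_prob, c, tau, N, rng):
    """Sketch of naive simulation: tail loss probability and conditional excess."""
    hits = np.zeros(N)      # indicators 1{L > tau}
    excess = np.zeros(N)    # L * 1{L > tau}
    for k in range(N):
        psi = draw_psi(rng)                    # outer simulation
        p = cond_default_prob(psi)             # conditional default probabilities
        Y = rng.random(len(c)) < p             # conditionally independent defaults
        L = np.dot(c, Y)                       # total portfolio loss
        hits[k] = L > tau
        excess[k] = L * hits[k]
    y_hat = hits.mean()                        # estimate of P(L > tau)
    r_hat = excess.sum() / hits.sum()          # estimate of E[L | L > tau] (needs hits > 0)
    return y_hat, r_hat
```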

The random variables \({\varvec{\Psi }}\) and \(Y_j\), \(j=1,\ldots ,J\) are the only sources of randomness of the tail loss probability estimates. We call the generation of \(Y_j\), \(j=1,\ldots ,J\) values for a given \({\varvec{\Psi }}\) “inner simulation” and the generation of the input vector \({\varvec{\Psi }}\) “outer simulation”. In order to increase the efficiency of the tail loss probability estimates, we need to focus on these two parts separately.

To reduce the variance coming from the inner simulation, one can use exponential twisting (see, e.g., Glasserman and Li 2005). However, in a recent paper, Sak and Hörmann (2012) propose a more efficient method (see Sect. 5 for the definition of the efficiency measure) called the geometric shortcut (GS) algorithm, which we also use in this paper to reduce the inner variance. Implementing GS alone is not sufficient to decrease the variance for highly dependent obligors. Therefore, we employ importance sampling based on cross-entropy (CE) (see Chapter 8 of Rubinstein and Kroese 2008) to decrease the variance coming from the outer simulation. To enhance the efficiency of CE, we combine it with stratification. In the following subsections, we explain the implementation details of these methods under general Bernoulli mixture models.

3.1 Inner replications using the geometric shortcut

The geometric shortcut idea was introduced in Sect. 3 of Sak and Hörmann (2012) to simulate the tail loss probability \(P(L>\tau )\) and the conditional excess \(E[L|L>\tau ]\) for independent obligors. The idea is simply to generate, instead of many Bernoulli random variables, a geometric random variate that is used as the index of the next default in the Gaussian copula framework. To make this idea pay off, the number of repetitions of the default simulation is increased from 1 to \(N_\mathrm{in}>1\) (\(N_\mathrm{in}\) denotes the number of inner replications); thus, the generated \({\varvec{\Psi }}\) vector is used not once but \(N_\mathrm{in}\) times, with the generation of Bernoulli random variables replaced by geometric random variables. They argue that the optimal value of \(N_\mathrm{in}\) depends on the contributions of the inner and the outer repetitions to the variance. Although they give approximate analytical results on \(N_\mathrm{in}\), their final suggestion is to use \(N_\mathrm{in}=\min ( \lfloor 1/\bar{p}({\varvec{\Psi }}) \rfloor ,J)\), where \(\bar{p}({\varvec{\Psi }})\) denotes the average of the default probabilities \(p_j({\varvec{\Psi }})\), \(j=1,\ldots ,J\), for the current \({\varvec{\Psi }}\) vector. Figure 1 visualizes the geometric shortcut method within the flowchart of the full algorithm in Sect. 3.3. For the details of the GS methodology, we refer to Sak and Hörmann (2012). The same idea was applied successfully to the t-copula model in Sak (2010).
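
To illustrate the shortcut, here is a minimal sketch for the simplified case in which all conditional default probabilities equal a common value p; then the gap between consecutive defaulting obligors is geometric with parameter p, so one geometric variate per default replaces many Bernoulli draws. This homogeneous case is our simplification; the actual GS algorithm of Sak and Hörmann (2012) also handles heterogeneous \(p_j({\varvec{\Psi }})\).

```python
import numpy as np

def defaults_geometric_shortcut(J, p, rng):
    """Sketch: indices of defaulting obligors among J independent Bernoulli(p)
    indicators, generated by jumping from one default to the next."""
    defaults = []
    j = rng.geometric(p) - 1        # 0-based index of the first default
    while j < J:
        defaults.append(j)
        j += rng.geometric(p)       # geometric gap to the next default
    return defaults

# e.g. loss of one inner replication: L = sum(c[j] for j in defaults)
```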

3.2 Importance sampling based on cross-entropy method

We consider changing one of the parameters of \(\varPsi _d\), \(d=1,\ldots ,D\), to increase the \(p_j({\varvec{\Psi }})\) values for importance sampling (IS). This increases the observed frequency of defaults in the simulation. Most papers in the quantitative risk management literature consider changing the scale parameter (such as the scale parameter of the gamma distribution) and/or the shift parameter (the mean of the normal distribution) of the distributions at hand (see, e.g., Glasserman and Li 2005; Sak et al. 2010). It is our numerical experience that this is a sensible approach, as changing parameters other than these in importance sampling increases the variance of the likelihood ratios in higher dimensions. For a more theoretical approach to choosing the IS density, we refer the reader to Geweke (1989).

Suppose that \(f({\varvec{\Psi }};\mathbf {u})\) is the density of the original vector \({\varvec{\Psi }}\). Here, \(\mathbf {u} \in \mathbb {R}^D\) denotes the vector of original parameters which are subject to change in IS. Let \(\mathbf {v} \in \mathbb {R}^D\) be the vector of new parameters for the IS distribution. The CE method aims to find the optimal IS parameter, \(\mathbf {v}^*\), by solving the maximization problem (for details see Rubinstein and Kroese 2008, pp. 136–141):

$$\begin{aligned} \mathop {\max }\limits _{\mathbf {v}} \int \mathbbm {1}_{\{L({\varvec{\Psi }})>\tau \}} \log (f({\varvec{\Psi }};\mathbf {v})) f({\varvec{\Psi }};\mathbf {u}) d{\varvec{\Psi }}. \end{aligned}$$
(6)

The problem in (6) is a stochastic optimization problem as the indicator function is uncertain for a given \({\varvec{\Psi }}\), i.e. it depends also on the Bernoulli random variables \(Y_1,\ldots ,Y_J\). Therefore, we replace \(\mathbbm {1}_{\{L({\varvec{\Psi }})>\tau \}}\) with \(E\left[ {{\mathbbm {1}_{\left\{ {L\left( {\varvec{\Psi }} \right)> \tau } \right\} }}|{\varvec{\Psi }} } \right] = P\left( {L\left( {\varvec{\Psi }} \right) > \tau |{\varvec{\Psi }} } \right) \).

The solution of the problem in (6) can be estimated by using Monte-Carlo simulation:

$$\begin{aligned} \max _{\mathbf {v}} \sum \nolimits _{k=1}^M P( {L( {\varvec{\Psi }}^{(k)} ) > \tau } | {\varvec{\Psi }}^{(k)}) \log (f({\varvec{\Psi }}^{(k)};\mathbf {v})), \end{aligned}$$
(7)

where M is the number of replications in the CE method and \({\varvec{\Psi }}^{(1)},\ldots ,{\varvec{\Psi }}^{(M)}\) are generated from \(f(.;\mathbf {u})\), independently.

If the distribution of \({\varvec{\Psi }}\) belongs to an exponential family, then a closed-form solution of (7) is possible (see Appendix A.3 of Rubinstein and Kroese 2008). In order not to lose the generality of the method, we do not assume any specific distribution for \({\varvec{\Psi }}\) here.

Finding the exact value of \(P( {L( {\varvec{\Psi }}^{(k)} ) > \tau } | {\varvec{\Psi }}^{(k)})\) is possible but generally difficult, since the number of obligors in the portfolio can be large. However, we can use one of the approximations listed in Glasserman and Li (2005). Here, we use the simplest approach, based on the normal approximation. Since

$$\begin{aligned} E[L|{\varvec{\Psi }}] = \sum \nolimits _{j=1}^{J}c_{j}p_{j}({\varvec{\Psi }}) \end{aligned}$$

and

$$\begin{aligned} Var[L|{\varvec{\Psi }}] = \sum \nolimits _{j=1}^{J}c_{j}^2\left[ p_{j}({\varvec{\Psi }})-p_{j}({\varvec{\Psi }})^2\right] , \end{aligned}$$

we obtain the approximation:

$$\begin{aligned} {P( {L( {\varvec{\Psi }}^{(k)} ) > \tau } | {\varvec{\Psi }}^{(k)})} \approx 1 - \varPhi \left( {\frac{{\tau - \sum \nolimits _{j = 1}^J {{c_j}{p_j}\left( {{{\varvec{{\varPsi }}}^{\left( k \right) }}} \right) } }}{{\sqrt{\sum \nolimits _{j = 1}^J {c_j^2\left[ {{p_j}\left( {{{\varvec{\Psi }} ^{\left( k \right) }}} \right) - {p_j}{{\left( {{{\varvec{{\varPsi }}}^{\left( k \right) }}} \right) }^2}} \right] } } }}} \right) . \end{aligned}$$
(8)
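
A small sketch of the approximation (8), assuming the conditional default probabilities \(p_j({\varvec{\Psi }}^{(k)})\) have already been computed and are passed as a vector:

```python
import numpy as np
from scipy.stats import norm

def tail_prob_normal_approx(p_cond, c, tau):
    """Sketch of (8): normal approximation of P(L > tau | Psi)."""
    mean = np.dot(c, p_cond)                        # E[L | Psi]
    var = np.dot(c ** 2, p_cond * (1.0 - p_cond))   # Var[L | Psi]
    return 1.0 - norm.cdf((tau - mean) / np.sqrt(var))
```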

Instead of searching for the optimal IS parameters \(\mathbf {v}=\left( v_1,\ldots ,v_D\right) '\) independently, we combine identical elements of \({\varvec{\Psi }}\), say \(\left( \varPsi _1,\ldots ,\varPsi _{D'}\right) '\), in a group such that

$$\begin{aligned} v_d = \sum \nolimits _{j=1}^J a_{jd}c_j\theta , \quad d=1,\ldots ,D', \end{aligned}$$
(9)

and optimize over \(\theta \), given that we know \(v_{D'+1},\ldots , v_D\). This is simply related to finding the direction of the loss function in \(\varPsi _d\), \(d=1,\ldots ,D'\), which is:

$$\begin{aligned}&{{{{\left( {\sum \nolimits _{j = 1}^J {{a_{j1}}{c_j}} , \ldots ,\sum \nolimits _{j = 1}^J {{a_{jD'}}{c_j}} } \right) }^\prime }}\bigg / {\left\| {{{\left( {\sum \nolimits _{j = 1}^J {{a_{j1}}{c_j}} , \ldots ,\sum \nolimits _{j = 1}^J {{a_{jD'}}{c_j}} } \right) }^\prime }} \right\| }}. \end{aligned}$$

The optimal value of \(\theta \) determines the IS parameters for \(\left( \varPsi _1,\ldots ,\varPsi _{D'}\right) '\) along this direction. This reflects the importance of \({\varvec{\Psi }}\) for the final probabilities \(p_j\left( {\varvec{\Psi }}\right) \), \(j=1,\ldots ,J\), and decreases the dimension of the optimization problem in (7) to \(D-D'+1\). We can easily decide how to construct the group by looking at the conditionally independent default probability formula for the model at hand. For example, for the CreditRisk\(^{+}\) and Gaussian copula models, the structure of (2) and (4) suggests that we can combine all elements of \({\varvec{\Psi }}\) and thus choose \(D'=D\) (the dimension of the problem in (7) is one). However, for (5) it is appropriate to combine \(\varPsi _{1},\ldots ,\varPsi _{D-1}\), thus \(D'=D-1\) (the dimension of the problem in (7) is two). Note that \(\varPsi _D\), a chi-squared random variable with \(\eta \) degrees of freedom, is a Gamma random variable with shape and scale parameters equal to \({\eta \big /2}\) and 2, respectively.

Using the information that \(\varPsi _d\), \(d=1,\ldots ,D\) are independent in our models, (7) can be written as

$$\begin{aligned} \begin{array}{*{20}{c}} {\mathop {\max }\limits _{\theta ,{v_{D' + 1}}, \ldots ,{v_D}} } &{} {\sum \nolimits _{k = 1}^M {{P( {L( {\varvec{\Psi }}^{(k)} ) > \tau } | {\varvec{\Psi }}^{(k)})}\left[ {\sum \nolimits _{d = 1}^{D'} {\log \left( {f\left( {\varPsi _d^{\left( k \right) };\sum \nolimits _{j = 1}^J {{a_{jd}}{c_j}\theta } } \right) } \right) } } \right. } } \\ {} &{} {\left. {\qquad + \sum \nolimits _{d = D' + 1}^D {\log \left( {f\left( {\varPsi _d^{\left( k \right) };{v_d}} \right) } \right) } } \right] }, \\ \end{array} \end{aligned}$$
(10)

where \({P( {L( {\varvec{\Psi }}^{(k)} ) > \tau } | {\varvec{\Psi }}^{(k)})}\) is given in (8). The full algorithm to compute \(\mathbf {v}\) is given in Algorithm 2.

[Algorithm 2: cross-entropy computation of the IS parameter vector \(\mathbf {v}\)]
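
The sketch below illustrates the resulting one-dimensional search in (10) for the case \(D'=D\), e.g., the Gaussian copula model. The function names, the bounded search interval, and the generic log-density interface are our own assumptions; Algorithm 2 additionally handles the remaining parameters \(v_{D'+1},\ldots ,v_D\).

```python
import numpy as np
from scipy.optimize import minimize_scalar

def ce_is_parameters(psi_pilot, weights, direction, log_density):
    """Sketch of the CE search (10) with D' = D.
    psi_pilot: (M, D) pilot draws from f(.;u); weights: (M,) values of (8);
    direction[d] = sum_j a_{jd} c_j; log_density(x, v) = log f(x; v) elementwise."""
    def neg_objective(theta):
        v = direction * theta          # v_d = (sum_j a_{jd} c_j) * theta, as in (9)
        return -np.sum(weights[:, None] * log_density(psi_pilot, v[None, :]))
    # the search interval (0, 10) is an arbitrary assumption of this sketch
    res = minimize_scalar(neg_objective, bounds=(0.0, 10.0), method="bounded")
    return direction * res.x

# For unit-variance normal factors (Gaussian copula case) one could use
# log_density = lambda x, v: -0.5 * (x - v) ** 2
```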

The likelihood ratio under the computed IS parameter \(\mathbf {v}\) is calculated using

$$\begin{aligned} \rho ({\varvec{\Psi }}^{(k)};\mathbf {u},\mathbf {v}) = \prod \nolimits _{d=1}^D {f( \varPsi ^{(k)}_d; u_d)}{\left( f( \varPsi ^{(k)}_d; v_d)\right) ^{-1}}. \end{aligned}$$
(11)

The combination of IS based on cross-entropy and the inner replications using the geometric shortcut (CEGS) is presented as Algorithm 3.

[Algorithm 3: the CEGS method]

3.3 Stratifying the random input

Stratified sampling computes the Monte Carlo estimate by conditioning on disjoint subsets (or strata) that cover the random input domain. Suppose we intend to estimate \(E[q({\varvec{\Psi }})]\) with the property that \({\varvec{\Psi }} \in \mathbb {R}^D\), \(q:\mathbb {R}^D \rightarrow \mathbb {R}\), and \(E\left[ q\left( {\varvec{\Psi }}\right) ^2\right] < \infty \). We define the random vector \(\mathbf {S} \left( {\varvec{\Psi }}\right) \) as a surjective mapping from \(\mathbb {R}^D\) onto \(\mathbb {R}^\delta \), where \(\delta \le D\). Let \(\xi _1,\ldots ,\xi _I\) denote equiprobable strata of \(\mathbb {R}^\delta \). Then, the expectation can be calculated using conditional expectations,

$$\begin{aligned} E\left[ {q\left( {\varvec{\Psi }} \right) } \right] = {I^{ - 1}}\sum \nolimits _{i = 1}^I {E\left[ {q\left( {\varvec{{\varPsi }}} \right) |\mathbf {S}\left( {\varvec{\Psi }}\right) \in {\xi _i}} \right] } = {I^{ - 1}}\sum \nolimits _{i = 1}^I {E\left[ {q\left( {{{\varvec{\Psi }} _i}} \right) } \right] }, \end{aligned}$$

where \(I^{-1}\) is the probability of each equiprobable stratum and \({\varvec{\Psi }}_i\) follows the distribution of \({\varvec{\Psi }}\) conditional on \(\mathbf {S}\left( {\varvec{\Psi }}\right) \in \xi _i\). For \(E\left[ q\left( {\varvec{\Psi }}\right) \right] \),

$$\begin{aligned} \hat{y} = I^{-1}\sum \nolimits _{i = 1}^I {{{\hat{y} }_i}} = I^{-1}\sum \nolimits _{i = 1}^I {{N_i^{-1}}\sum \nolimits _{k = 1}^{{N_i}} {q\left( {{{\varvec{\Psi }} _i^{\left( k \right) }}} \right) } } \end{aligned}$$

is the stratified estimator, where \(N_i\) is the size of the sample drawn from stratum i, \({{{\varvec{\Psi }} _i^{\left( k \right) }}}\) is the kth drawing of the random input \({\varvec{\Psi }}\) conditional on \(\mathbf {S}\left( {\varvec{\Psi }}\right) \in \xi _i\), and \(\hat{y}_i\) is the sample mean in stratum i. The variance of the stratified estimator is:

$$\begin{aligned} Var\left[ {\hat{y} } \right] = I ^{-2}\sum \nolimits _{i = 1}^I {{{{N_i^{-1} }}}\left( s_i ^y\right) ^2}\, , \end{aligned}$$
(12)

where \(s_i ^y\) is the sample standard deviation of the responses \({q\left( {{{\varvec{\Psi }} _i^{\left( k \right) }}} \right) }\), \(k=1,\ldots ,N_i\).

The variance given in (12) can be minimized by allocating the sample sizes \(N_i\) proportional to the conditional standard deviations \(s_i^y\) (see, e.g., Glasserman 2004). As the \(N_i\) values must be integers, we simply use the formula:

$$\begin{aligned} {N_i} = \left\lceil {{{s_i^yN} \bigg / {\sum \nolimits _{l = 1}^I {s_l^y} }}} \right\rceil ,\quad i = 1, \ldots ,I. \end{aligned}$$
(13)

Unfortunately, we have no prior information on \(s_i^y\), \(i=1,\ldots ,I\). A practical solution is to use estimates of the conditional standard deviations obtained through a pilot study with \(N_p\) replications. We suggest selecting \(N_p\) large enough to ensure the approximate normality of the conditional estimates \(\hat{y}_i\) and to obtain accurate estimates of the optimal allocation sizes. The remaining \(N-N_p\) replications can then be used in the main run according to the allocation rule in (13). At the end of the simulation, the sample generated in the pilot study is combined with the sample generated in the main run. By this approach, no drawings are wasted.
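
A minimal sketch of the allocation rule (13) applied to the pilot estimates (the variable names are ours):

```python
import numpy as np

def allocate_by_std(s_y, N):
    """Sketch of (13): main-run sample sizes proportional to the
    conditional standard deviations estimated in the pilot study."""
    s = np.asarray(s_y, dtype=float)
    return np.ceil(s * N / s.sum()).astype(int)

# e.g. N_i = allocate_by_std(pilot_std_per_stratum, N)
```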

The variance of the stratified estimator also depends on the random vector \(\mathbf {S} ({\varvec{\Psi }})\). By using the conditional variance formula, the total variance of \(q({\varvec{\Psi }})\) can be decomposed into two parts:

$$\begin{aligned} Var\left[ {q\left( {\varvec{\Psi }} \right) } \right] = Var\left[ {E\left[ {q\left( {\varvec{\Psi }} \right) |\mathbf {S}\left( {\varvec{\Psi }} \right) } \right] } \right] + E\left[ {Var\left[ {q\left( {\varvec{\Psi }} \right) |\mathbf {S}\left( {\varvec{\Psi }} \right) } \right] } \right] . \end{aligned}$$
(14)

The variance of the stratified estimator in (12) depends only on the conditional variances. Thus, it is influenced only by the second component of (14). Therefore, we try to choose \(\mathbf {S}({\varvec{\Psi }})\) such that it maximizes, as far as possible, the first component of (14). In other words, we should choose \(\mathbf {S}({\varvec{\Psi }})\) such that the variance between the conditional estimates of the strata is large. On the other hand, \(\mathbf {S}({\varvec{\Psi }})\) should be computationally tractable, so that we can generate the conditional vector \({\varvec{\Psi }}_i\) easily.

The general idea is to stratify the normally distributed elements of \({\varvec{\Psi }}\) along the IS shift and to stratify the maximum of the remaining elements. For example, for the CreditRisk\(^+\) model, we choose \(\mathbf {S}\left( {\varvec{\Psi }} \right) = \max \left\{ {{\varPsi _1}, \ldots ,{\varPsi _D}} \right\} \). For the Gaussian copula model, we stratify the \({\varvec{\Psi }}\) vector along the direction of the IS parameters \(\mathbf {v}\), so \(\mathbf {S}\left( {\varvec{\Psi }} \right) = \mathbf {v} '{\varvec{\Psi }}\). Finally, for the t-copula model, we stratify the \(\tilde{{\varvec{\Psi }}}\) vector along the direction of the IS parameters \(\tilde{\mathbf {v}}=\left( v_1,\ldots ,v_{D-1}\right) '\), and we also stratify \(\varPsi _D\). So, we choose \(\mathbf {S}\left( {\varvec{\Psi }} \right) = {\left( \tilde{\mathbf {v}} '\tilde{{\varvec{\Psi }}},\varPsi _D\right) ^\prime }\).
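
As an illustration for the Gaussian copula case, the sketch below draws a standard normal \({\varvec{\Psi }}\) conditional on \(\mathbf {v}'{\varvec{\Psi }}\) falling in the ith of I equiprobable strata: the standardized projection along \(\mathbf {v}\) is stratified while the orthogonal component is left unconstrained. It ignores the IS mean shift and is a simplification of ours, not the paper's exact procedure.

```python
import numpy as np
from scipy.stats import norm

def sample_psi_in_stratum(v, i, I, rng):
    """Sketch: draw Psi ~ N(0, Id) conditional on v'Psi lying in stratum i of I."""
    e = v / np.linalg.norm(v)                 # unit vector along v
    u = (i - 1 + rng.random()) / I            # uniform on the i-th probability slice
    s = norm.ppf(u)                           # stratified value of e'Psi
    z = rng.standard_normal(len(v))           # unconstrained normal vector
    return s * e + (z - np.dot(e, z) * e)     # replace the component along e by s
```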

The stratified version of the CEGS method (called STCEGS) is presented as Algorithm 4. We also provide a flowchart diagram to illustrate the steps of the STCEGS method in Fig. 1.

[Algorithm 4: the STCEGS method]

Fig. 1 The flowchart of the STCEGS method provided in Algorithm 4

4 Conditional excess simulation

We described our new algorithm for tail loss probability computation in Sect. 3. This section explains how a similar methodology can be used for the simulation of the conditional excess.

If we assume that \(P(L> \tau ) > 0\), the conditional excess \(r = E[L|L> \tau ]\) can be written as \(r = {x \big /y}\) where \(x=E\left[ L \, \mathbbm {1}_{\{L> \tau \}}\right] \) and \(y=P(L> \tau )\). This ratio can be estimated as \(\hat{r}={\hat{x}}/\hat{y}\) with the approximate variance:

$$\begin{aligned} Var\left[ {\hat{r}} \right] \approx {x^2}{y^{ - 4}}Var\left[ {\hat{y}} \right] - 2x{y^{ - 3}}Cov\left[ {\hat{x},\hat{y}} \right] + {y^{ - 2}}Var\left[ {\hat{x}} \right] , \end{aligned}$$
(15)

which is found by using multivariate Taylor series expansion of the variance of the ratio (see, e.g., Glasserman 2004).

The ratio estimator \(\hat{r}\) is biased and the bias has the form

$$\begin{aligned} Bias\left[ {\hat{r}} \right] = x{y^{ - 3}}Var\left[ {\hat{y}} \right] - {y^{ - 2}}Cov\left[ {\hat{x},\hat{y}} \right] + O\left( {{N^{ - 2}}} \right) , \end{aligned}$$

see, e.g., Fishman (1996, p. 109). Here, the leading term is of order \(O(N^{-1})\). It is possible to reduce the bias by subtracting an estimate of the leading term from the ratio estimate. However, the squared bias is of order \(O(N^{-2})\) and thus small compared to the variance. Therefore, it is sufficient to use the simple ratio estimate without bias correction.

Let \(\hat{x}^{NV}\) and \(\hat{y}^{NV}\) denote the naive estimators for x and y. The naive estimator for r is:

$$\begin{aligned} {{\hat{r}}^{NV}} = {{{{\hat{x}}^{NV}}} \Big /{{{\hat{y}}^{NV}}}} = {{\left( {\sum \nolimits _{k = 1}^N {{L^{\left( k \right) }}{\mathbbm {1}_{\left\{ {{L^{\left( k \right) }}> \tau } \right\} }}} } \right) } \bigg /{\left( {\sum \nolimits _{k = 1}^N {{\mathbbm {1}_{\left\{ {{L^{\left( k \right) }} > \tau } \right\} }}} } \right) }}, \end{aligned}$$
(16)

and its variance can be estimated using (15). Algorithm 1 gives all the details of how to use this estimate.

Following Glasserman (2005) and Sak and Hörmann (2012), for the simulation of \(E[L|L>\tau ]\), we use the IS distribution computed for the simulation of \(P(L>\tau )\). If we use CEGS, our new estimate of conditional excess is

$$\begin{aligned} {{\hat{r}}^{CEGS}} = {{{{\hat{x}}^{CEGS}}} \Big /{{{\hat{y}}^{CEGS}}}} = {{\left( {\sum \nolimits _{k = 1}^N {{\rho ^{\left( k \right) }}} \bar{L}_\mathrm{in}^{\left( k \right) }} \right) } \bigg /{\left( {\sum \nolimits _{k = 1}^N {{\rho ^{\left( k \right) }}} \bar{p}_\mathrm{in}^{\left( k \right) }} \right) }}, \end{aligned}$$
(17)

where

$$\begin{aligned} \bar{L}_\mathrm{in}^{\left( k \right) } = {\left( {N_\mathrm{in}^{\left( k \right) }} \right) ^{ - 1}}\sum \nolimits _{l = 1}^{N_\mathrm{in}^{\left( k \right) }} {{L^{\left( {k,l} \right) }}{\mathbbm {1}_{\{ {L^{\left( {k,l} \right) }} > \tau \} }}} \end{aligned}$$

denotes the average of the inner replications \(L^{(k,l)} \mathbbm {1}_{\{L^{(k,l)}> \tau \}}\), \(l=1,\ldots ,N_\mathrm{in}^{(k)}\),

$$\begin{aligned} \bar{p}_\mathrm{in}^{\left( k \right) } = {\left( {N_\mathrm{in}^{\left( k \right) }} \right) ^{ - 1}}\sum \nolimits _{l = 1}^{N_\mathrm{in}^{\left( k \right) }} {{\mathbbm {1}_{\{ {L^{\left( {k,l} \right) }} > \tau \} }}} \end{aligned}$$

denotes the average loss probability, and \(N_\mathrm{in}^{(k)}\) is the number of inner repetitions for the kth outer replication.

To estimate the accuracy of (17), we use a general result for ratio estimators given on p. 234 of Glasserman (2004). Since the values (\(\bar{L}_\mathrm{in}^{(k)},\bar{p}_\mathrm{in}^{(k)})\) for \(k=1,\ldots ,N\) are independent and identically distributed, the variance of the ratio estimator under the CEGS method is calculated as:

$$\begin{aligned} Var[\hat{r}^{CEGS} ] \approx \left( \sum \nolimits _{k=1}^{N}\rho ^{(k)} \bar{p}_\mathrm{in}^{(k)}\right) ^{-2}{\sum \nolimits _{k=1}^{N}\left( \rho ^{(k)} \bar{L}_\mathrm{in}^{(k)}-\hat{r}^{CEGS}\rho ^{(k)}\bar{p}_\mathrm{in}^{(k)}\right) ^2}. \end{aligned}$$
(18)

Equations (17) and (18) can also be found in Sak and Hörmann (2012) and Sak (2010). The full algorithm of simulating tail loss probability and conditional excess using CEGS method is given in Algorithm 3.
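
Given the per-replication outputs of CEGS, the ratio estimate (17) and its variance estimate (18) can be computed as in the following sketch (the array names are ours):

```python
import numpy as np

def cegs_ratio_and_variance(rho, L_bar, p_bar):
    """Sketch of (17) and (18): conditional excess estimate and its variance.
    rho: likelihood ratios; L_bar, p_bar: inner-replication averages per outer draw."""
    x_terms = rho * L_bar
    y_terms = rho * p_bar
    r_hat = x_terms.sum() / y_terms.sum()                                   # (17)
    var_hat = np.sum((x_terms - r_hat * y_terms) ** 2) / y_terms.sum() ** 2 # (18)
    return r_hat, var_hat
```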

When we add stratification to CEGS, the ratio estimate can be calculated as

$$\begin{aligned} {{\hat{r}}^*} = {{{{\hat{x}}^*}} \big /{{{\hat{y}}^*}}} = {{\sum \nolimits _{i = 1}^I {{{\hat{x}}_i}} } \bigg /{\sum \nolimits _{i = 1}^I {{{\hat{y}}_i}} }}. \end{aligned}$$

Here, \({{\hat{x}}_i} = N_i^{ - 1}\sum \nolimits _{k = 1}^{{N_i}} {{\rho ^{\left( {i,k} \right) }}\bar{L}_\mathrm{in}^{\left( {i,k} \right) }}\) and \({{\hat{y}}_i} = N_i^{ - 1}\sum \nolimits _{k = 1}^{{N_i}} {{\rho ^{\left( {i,k} \right) }}\bar{p}_\mathrm{in}^{\left( {i,k} \right) }}\) are the estimators conditional on the ith stratum, and \(\bar{L}_\mathrm{in}^{(i,k)}\), \(\bar{p}_\mathrm{in}^{(i,k)}\) are the same as in (17) but computed conditional on the ith stratum.

The variance of the stratified ratio estimator can be estimated again using (15). However, the variances and the covariance of the estimators must be replaced by

$$\begin{aligned} Var\left[ {\hat{x}^*} \right] = {I^{ - 2}}\sum \nolimits _{i = 1}^I {N_i^{ - 1}{{\left( {s_i^x} \right) }^2}},\\ Var\left[ {\hat{y}^*} \right] = {I^{ - 2}}\sum \nolimits _{i = 1}^I {N_i^{ - 1}{{\left( {s_i^y} \right) }^2}}, \end{aligned}$$

and

$$\begin{aligned} Cov\left[ {\hat{x}^*,\hat{y}^*} \right] = {I^{ - 2}}\sum \nolimits _{i = 1}^I {N_i^{ - 1}s_i^{xy}}. \end{aligned}$$

Here, \(s_i^x\) denotes the sample standard deviation of \({\rho ^{\left( {i,k} \right) }}\bar{L}_\mathrm{in}^{\left( {i,k} \right) }\), \(k=1,\ldots ,N_i\), \(s_i^y\) denotes the sample standard deviation of \({{\rho ^{\left( {i,k} \right) }}\bar{p}_\mathrm{in}^{\left( {i,k} \right) }}\), \(k=1,\ldots ,N_i\), and \(s_i^{xy}\) denotes the covariance between these two samples. We plug these formulas into (15) and get:

$$\begin{aligned} Var\left[ {{{\hat{r}}^*}} \right] = {I^{ - 2}}\sum \nolimits _{i = 1}^I {N_i^{ - 1}{{\left( {s_i^r} \right) }^2}}, \end{aligned}$$
(19)

where

$$\begin{aligned} s_i^r = {\left( {{{y^{-4}}}{{{x^2}}}{{\left( {s_i^y} \right) }^2} - {{y^{-3}}}{{2x}}s_i^{xy} + {{y^{-2}}}{{\left( {s_i^x} \right) }^2}} \right) ^{{1/2}}},\quad i = 1, \ldots ,I, \end{aligned}$$
(20)

is the conditional standard deviation of the ratio estimator corresponding to the ith stratum.

Similar to what we describe in Sect. 3.3, the variance given in (19) can be minimized by allocating the sample sizes \(N_i\) proportional to conditional standard deviations \(s_i^r\) (see Başoğlu and Hörmann 2014):

$$\begin{aligned} {N_i} = \left\lceil {{{s_i^r N} \bigg /{\sum \nolimits _{l = 1}^I {s_l^r} }}} \right\rceil ,\quad i = 1, \ldots ,I. \end{aligned}$$
(21)

We again use a pilot study to estimate \(s_i^r\), \(i=1,\ldots ,I\), and to determine the allocation sizes for the main run. The final estimate is calculated by combining the samples generated during the pilot study and the main run. The full algorithm for simulating tail loss probability and conditional excess using the STCEGS method is given as Algorithm 4. As stated earlier, we use the same IS strategy to estimate the tail loss probability and the conditional excess. However, in stratification, the optimal sample allocations for these two estimates are found with different formulas.
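
The following sketch shows how (20) and (21) translate the pilot estimates into main-run allocation sizes; the variable names are ours, and \(x\) and \(y\) are replaced by their pilot estimates:

```python
import numpy as np

def allocate_for_ratio(s_x, s_y, s_xy, x_hat, y_hat, N):
    """Sketch of (20)-(21): per-stratum standard deviations of the ratio
    estimator and the corresponding allocation of N main-run drawings."""
    s_r = np.sqrt(x_hat ** 2 / y_hat ** 4 * s_y ** 2        # (20)
                  - 2.0 * x_hat / y_hat ** 3 * s_xy
                  + s_x ** 2 / y_hat ** 2)
    N_i = np.ceil(s_r * N / s_r.sum()).astype(int)          # (21)
    return N_i, s_r
```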

5 Numerical results

In this section, we compare the efficiency of the new methods CEGS and STCEGS with other algorithms available in the literature. The efficiency of a simulation method is inversely proportional to the product of the variance of its estimator and the execution time (TM) of the simulation. We therefore report, as the main result of our comparison, the efficiency ratio of the STCEGS estimator

$$\begin{aligned} ER\left( {\hat{y}^*} \right) = {Var\left[ {{{\hat{y}}^{BM}}} \right] TM\left( {{{\hat{y}}^{BM}}} \right) }\left( {{Var\left[ {\hat{y}^*} \right] TM\left( {\hat{y}^*} \right) }}\right) ^{-1}, \end{aligned}$$

where \(\hat{y}^{BM}\) is the benchmark estimator, corresponding to the best existing method in the literature.

For the CreditRisk\(^+\) model, we use the numerical example given in Glasserman and Li (2003). It is a portfolio with \(J=1000\) obligors and exposure levels \(c_j= 0.04 + 0.00196j\) for \(j=1,\ldots ,J\). Furthermore, \(a_{j0}=0.002\) and \(a_{jd} =0.0002\) for \(d=1,\ldots ,D=10\). We assume \(\sigma _d=9\) for \(d=1,\ldots ,D\).

For the Gaussian copula model, we use the first numerical example given in Glasserman and Li (2005). It is a portfolio of \(J=1000\) obligors in a 10-factor model. The marginal default probabilities, \(p_{j} = 0.01(1+\sin (16\pi j/J))\) for \(j=1,\ldots ,J\), vary between 0 and 2 %; the exposures, \(c_{j} = (\lceil {5j/J}\rceil )^2\) for \(j=1,\ldots ,J\), take the values 1, 4, 9, 16, and 25, with 200 obligors at each level. These parameters represent a significant departure from a homogeneous model. The factor loadings \(a_{jd}\) are generated independently and uniformly from the interval \((0,1/\sqrt{10})\); the upper limit of this interval ensures that the sum of squared loadings for each obligor does not exceed 1. Note that this upper limit also implies that, for some obligors, the sum of the squared elements of \(\mathbf {a}_{j}\) is close to 1, indicating that this credit portfolio contains strongly correlated obligors.

For the t-copula model, we use the third numerical example of Sak and Hörmann (2012) and Sak (2010). It is a 5-factor model with 1200 obligors. The default probabilities \(p_j\) are generated independently and uniformly from the interval [0, 0.02], and the exposure levels are defined by \(c_{j} = (\lceil {20j/J}\rceil )^2\) for \(j=1,\ldots ,J\). To define the factor loadings, the obligors are separated into six segments of size 200. For each segment, the factor loadings are generated uniformly from the interval \((0,\max )\). For the structure of the loading matrix and the \(\max \) values, we refer the reader to Sak and Hörmann (2012) (Table 3, p. 1566).

For each specific model, we tabulate the half lengths of the 95 % confidence intervals as a percentage of the tail loss probability and conditional excess estimates for the methods available in the literature (see Table 1). These methods are naive simulation (NV), the two-step IS of Glasserman and Li (2003) (TS-IS), the geometric shortcut (GS), the combination of IS and GS given in Sak and Hörmann (2012) (ISGS), and, finally, the new methods CEGS and STCEGS introduced in this paper. We indicate the benchmark method for each specific model and give the efficiency ratio of the STCEGS method in the last column of Table 1.

The number of equiprobable strata is \(I=I_1=50\) for CreditRisk\(^+\) and the Gaussian copula model, and \(I=240\) (\(I_1=30\), \(I_2=8\)) for the t-copula model. The sample size that we use for the pilot study in STCEGS is \(N_p=30,000\). This leaves approximately 70,000 drawings (N in (13) and (21)) to be allocated optimally in the main run. For CEGS and STCEGS, the number of replications used for determining IS parameters is \(M=10,000\).

Table 1 Performance comparisons of the methods under the three models

Summarizing the numerical results given in Table 1, the efficiency of STCEGS is higher than that of the methods proposed for those specific credit risk models. The main source of this efficiency improvement is the stratification, which allows us to allocate more replications to regions with high variance. This aspect of stratification complements the objective of the IS method, which is to move the sampling process to the important regions. Moreover, our experiments showed that stratification yields better results when it is combined with IS.

We report the results of two different IS methods, ISGS and CEGS. The difference between ISGS (Sak and Hörmann 2012; Sak 2010) and CEGS is the set-up that selects the IS parameters: while ISGS uses the mode of the zero-variance IS function, this paper uses the cross-entropy method with a dimension reduction to find the IS parameters. Both methods share the geometric shortcut. Although the half lengths produced by CEGS are no better than those produced by ISGS, CEGS has the advantage of being more general, as it can be used for all three specific models. This is the main reason why we combine stratification with CEGS in the STCEGS algorithm.

Throughout the paper, we give algorithms to compute tail loss probabilities. For practical purposes, we may need to calculate risk measures like Value-at-Risk (VaR), which is simply the quantile of the loss distribution. Computing VaR requires simulating tail loss probabilities for a series of threshold values. An efficient algorithm for the estimation of multiple tail loss probabilities in a single stratified simulation is given in Başoğlu et al. (2013).

6 Conclusion

In this paper, we considered the problem of efficiently estimating the tail loss probability and the conditional excess for the Bernoulli mixture model of credit risk. We presented a new efficient simulation method that combines stratification, importance sampling based on cross-entropy, and inner replications using the geometric shortcut method. This is an important contribution, since all credit risk models proposed in the literature can be represented as Bernoulli mixture models, and this representation is more convenient for statistical fitting purposes than threshold models. We also formulated the optimal sample allocation for the stratification of ratio estimators and thus obtain further variance reduction for conditional excess estimators. We evaluated the efficiency of our method on three different credit risk models: CreditRisk\(^+\), the Gaussian copula, and the t-copula model. Numerical results suggest that the proposed general algorithm is more efficient than the benchmark methods for these specific models.