
4.1 PD Corporate SME Model Development

This section describes the main activities underlying the developmental steps of a model for the estimation of the PD (see Fig. 4.1). Our focus is mainly on the customer segment of corporate small and medium-sized enterprises (corporate SMEs). We refer the reader to Sect. 6.4 for a description of the main validation tests; these should be performed after the model estimation and before its final functional specification and passage to the production phase.

Fig. 4.1 Main steps in developing a rating model

4.1.1 Step 1: Perimeter of Applicability and Definitions

Whatever the future application of the model to be developed, to establish a firm foundation for the entire process, it is important to pay great attention in the initial phase (Step 1) to the regulatory and operative reference framework, and to the definition of the event to be forecast: the default probability (see Table 4.1).

Table 4.1 Main steps in developing a rating model

The main objective of the model is the estimation of the probability of default within a determined temporal horizon (typically, one year) to classify customers in a portfolio according to their degree of risk.

Central to the design of a rating model is the definition of default, which allows (future) insolvent customers (defined as the “bads” within the estimation samples) to be distinguished from solvent customers (the “goods”). The definition of default has to be set sufficiently far in advance (far enough from the onset of a problematic situation) to permit the identification of a default before it is too late to take corrective action and, at the same time, sufficiently close to the moment of default to make an effective distinction between bads and goods.

The default definition used in model development should also be consistent with that used elsewhere in the bank and in line with the default definition required by the regulator. The default definition provided by the New Capital Accord includes bad debts, sub-standard loans, restructured exposures, and past due and overdrawn positions (see Basel Committee 2006).

To develop an effective rating tool, it is essential to establish a heterogeneous working group, characterized by a range of quantitative technical skills (mathematical, statistical and computer science) for:

  • descriptive and inferential analysis;

  • model design, the architecture of the rating system, the analysis of the origin of existing credit, and monitoring processes;

  • the management of databases and implementation of the IT environment for the estimation and validation processes;

  • and qualitative skills (economic and legal) for:

    • the analysis of the enterprises’ financial situation and balance sheet data;

    • the assessment of scenario and sector components; and

    • an in-depth knowledge of the bank’s internal norms, and national and international rules.

Further requirements are solid experience in the field of the estimation and validation of rating systems, sufficient seniority and knowledge of the main internal processes of a banking group.

The working group should first analyze:

  • the internal regulatory framework (of the bank or the banking group) and the external regulatory framework (supervisory regulations, and domestic and international guidelines);

  • the credit process underlying the origination of the credit and monitoring of the corporate SME counterparts; and

  • the IT procedures that support this process.

The working group should then analyze the corporate SME segment using the most recent data available (for example, up to December 31 of the previous year) with respect to the main classification variables (industry sector, geographic area, company size, and so on), both in terms of number of positions and volumes (that is, credit limits and outstanding debts).

The portfolio analysis represents a central activity within the estimation process: the segment data analyzed in the recent portfolio should be the main reference for the working group in relation to:

  • the drafting of the data request aimed at the construction of the estimation and model validation samples;

  • the definition of the fields of existence (admissible value ranges) for the indicators; and

  • the management of outliers, exceptions and preliminary factor transformations and normalizations, in order to reduce the impact of outliers and to make the multi-factor regression analysis more efficient and factor weights easier to interpret.

4.1.2 Step 2a: Data Collection and Sampling

After analyzing the availability, length of historical series and quality of the databases underlying the credit processes, the next step is to draw up the designated “long list” of potential predictors of default. This list is based on the academic literature, as well as on input from the experiences of relationship managers and personnel from the credit department of the bank: the so-called “experts” of the working group (see the first activity of Step 2 in Table 4.2). In order to carry out a proper statistical-economic analysis, the indicators included in the initial long list should be grouped into areas and informative categories, thus defining as many long lists as there are areas of information considered. Typical information areas to be analyzed in the development of an estimation model for the probability of default for the corporate SME segment are financial, internal behavioral, external behavioral and qualitative.

Table 4.2 Developing a rating model: main activities of Step 2

The risk indicators belonging to each of the four inquiry areas will be grouped successively into categories for analysis; this is to facilitate the economic interpretation of the subsequent statistical evidence and to verify that, during the reduction that the area’s initial long lists will undergo, all the informative categories will be adequately represented.

Table 4.3 presents examples of indicators belonging to the financial area, grouped into information categories.

Table 4.3 Financial indicators grouped by categories: an illustrative example

After finalizing the indicators’ long lists and extracting all necessary data, a thorough analysis of the databases must be performed, paying particular attention to:

  • the possible presence of duplicated positions for the same analysis key;

  • the consistency of elementary variables;

  • their economic coherence, both in terms of content and number of expected observations per period (month);

  • the variation of indicator values; and

  • their stability over time, also with respect to their relative risk by sub-segments of analysis (industry sector, geographic area, company size, and so on).

After carefully carrying out the data cleaning, the next step is the extraction of the estimation and model validation samples, ensuring:

  • sufficient cardinality and sample depth;

  • the correct identification of goods and bads, both in the development and in the model validation samples;

  • an adequate proportion of bads and goods, which permits an adequate representation of the event to be forecast within the estimation samples; and

  • the stability/representativeness of the samples with respect to the reference portfolio.

Generally, for the construction of the estimation samples of a rating model, all the positions that went into default in the observation horizon (bad customers) and a sub-set of the positions that never went into default in the observation horizon (good customers) are adopted. In certain cases, the samples could be balanced – that is, the same number of bads and goods.

One possible sampling methodology is the random extraction of positions, without repetition, stratified with respect to the representative variables and to the year of default, with constant sampling probability within strata (simple sampling). For the extracted samples, the completeness of the information and the fields of existence (ranges) observed in the recent portfolio must be verified carefully. If one of the above conditions cannot be met, the sample must be re-extracted.
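As an informal illustration, the following sketch implements this kind of stratified extraction in Python, assuming a pandas DataFrame with hypothetical columns (`is_bad`, `segment`, `year`) and an arbitrary sampling fraction for the goods; it is only one possible implementation, not the procedure prescribed here, and the completeness/range checks described above would follow on the returned sample.

```python
import pandas as pd

def build_estimation_sample(portfolio: pd.DataFrame,
                            goods_fraction: float = 0.2,
                            seed: int = 42) -> pd.DataFrame:
    """Keep every bad position; extract the goods without repetition,
    stratified by the representative variables and the reference year,
    with constant sampling probability within each stratum."""
    bads = portfolio[portfolio["is_bad"] == 1]
    goods = (portfolio[portfolio["is_bad"] == 0]
             .groupby(["segment", "year"])
             .sample(frac=goods_fraction, random_state=seed))
    return pd.concat([bads, goods]).reset_index(drop=True)
```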

The linking of information (financial, behavioral and qualitative) to the sample positions must be performed in a manner coherent with the effective availability of the information (updating time, source, and so on). This allows the construction of the indicators defined in the long lists to be carried out sufficiently in advance of the time of default, both for the single bad position and for the corresponding (twin) good positions in the sample.

A possible information-linking rule is depicted in Fig. 4.2.

Fig. 4.2 Information-gathering rules: an illustrative example

If “d” denotes the instant (month) of entrance into default of a generic bad position, the period of data observation of the bad position and of the corresponding good one varies between:

  • “d-12” and “d-24” for the information of a qualitative nature – to evaluate the possible variation of this kind of information across the interval of 12 months;

  • “d-12” and “d-24” for the behavioral information – to build relevant derived indicators such as quarterly, semi-annual and annual averages/variations;

  • “d-19” and “d-43” for the financial variables – to simulate the effective availability of at least two balance sheets in the production phase.

Once a preliminary sample analysis has been performed (quality, numerosity and observation depth), it is possible to design the model structure and define the best methodological approach to be followed during the model development.

4.1.3 Step 2b: Model Structure

The most widespread rating model structure is modular, with the number of modules equal to the number of information areas that feed the model – in this case, four: one financial module, two behavioral modules and a qualitative module. Each module, according to the chosen methodology, produces as output a score that expresses, in numerical terms, the credit merit of the counterpart, depending on the type of information computed: the accounting data (financial module); the borrower behavior with the bank (internal behavioral module), or with the banking system (external behavioral module); and the qualitative judgment expressed by the relationship manager (qualitative module).

Depending on the practical availability of data (financial, behavioral and qualitative), it is possible to develop models on a statistical basis (in the presence of sufficient robust data) or an expert basis (judgmental).

As shown in Fig. 4.3, the score produced by a module developed on a statistical basis is subsequently transformed into a default probability, expressed on a scale from 0 (minimal risk) to 1 (maximum risk), corresponding to the likelihood that, during a period of 12 months, the borrower will become insolvent according to the default definition adopted. The (modular) PDs obtained separately are then integrated, according to an algebraic formula, into a unique default probability, which is subsequently associated with a rating class of the bank’s master scale.

Fig. 4.3 Main steps in the development of statistical models

The score produced by the modules developed on a judgmental basis (inside the upper dotted line in Fig. 4.4) is generally not transformed into a default probability but, rather, is used to correct – upward (upgrading) or downward (downgrading) – the rating class assigned by the statistical component of the model (inside the lower dotted line shown in Fig. 4.4).

Fig. 4.4 Main steps in the development of statistical/expert-based models

Finally, in the presence of modules and components developed only on an expert basis, the judgmental score can be employed to correct (upward or downward) the rating class corresponding to the default probability assigned (ex ante) to the portfolio segment, following the analysis of its current and historical default rates in the medium to longer term (see Fig. 4.5).

Fig. 4.5 Main steps in the development of purely expert-based models

4.1.4 Step 2c: Methodological Approach

As far as the methodological approach is concerned, for segments characterized by databases that are sufficiently broad and stable and that have an adequate number of defaults (a so-called “high default portfolio”), it is possible to adopt a statistical approach, supported where necessary by judgmental techniques for the assessment of qualitative information.

The most frequently adopted statistical technique for the corporate SME segment is logistic regression: alternative techniques are discriminant analysis; probit models; and the more recent inductive models of a heuristic nature, such as genetic algorithms and neural networks.

For insights regarding the listed approaches, see Resti and Sironi (2007). Next, we describe the development of a default probability estimation model based on the logit method.

4.1.5 Statistical Methodology

In the literature, logistic regression is recognized as one of the best methodologies for estimating a function capable of linking the probability of possessing a dichotomous attribute (in this case, bad = 1; good = 0) to a set of explanatory variables (financial, behavioral or qualitative).

Logistic regression represents a specific case of regression analysis: the dependent variable, Y, is dichotomous, its distribution is binomial and the estimate of Y, varying from 0 to 1, assumes the meaning of a probability, P{Y = 1 | x} = π(x); that is:

$$ Y=\begin{cases}1, & \text{with probability } \pi(x)\\ 0, & \text{with probability } 1-\pi(x)\end{cases} $$

The logistic regression function has the form:

$$ \mathrm{logit}\left(\pi(x)\right)={\beta}_0+\sum_{i=1}^{n}{\beta}_i\cdot {x}_i=x\cdot \beta $$

where logit (π(x)) denotes the natural logarithm of the ratio of the probability of “success” (that is, the probability that the analyzed position defaults in the 12 months successive to the evaluation) and the probability of “no success” (solvent) given the vector x of n predictive variables (for example, the vector x could contain behavioral variables of the customer):

$$ \mathrm{logit}\left(\pi (x)\right)= \ln \left[\frac{\pi (x)}{1-\pi (x)}\right] $$

As π(x) denotes the probability that Y is 1, conditional on the explanatory variables x, the probability of Y can be expressed as a logistic function:

$$ \pi (x)=\frac{{\mathrm{e}}^{x\cdot \beta }}{1+{\mathrm{e}}^{x\cdot \beta }} $$

The choice of the logit to describe the function that links the probability of Y to the combination of predictive variables is determined by the observation that the probability gets gradually close to the limits “0” and “1”, describing an “S” shape (called a “sigmoid”).

While the logit is not the only function that permits the modeling of the probability of a phenomenon, it is preferred to the others because it represents a transformation of the ratio of two complementary probabilities (a quantity known as the “odds”); that is, the ratio of the number of successes to each failure of the examined phenomenon.
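To make the logit approach concrete, the following minimal sketch fits a logistic regression on a synthetic good/bad sample with statsmodels; the simulated data, coefficient values and variable names are purely illustrative and not taken from the book.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5_000

# Two illustrative (already transformed and normalized) risk indicators.
x = rng.normal(size=(n, 2))
true_beta = np.array([-3.0, 1.2, -0.8])        # intercept, beta_1, beta_2 (assumed)
pi = 1.0 / (1.0 + np.exp(-(true_beta[0] + x @ true_beta[1:])))   # logistic link
y = rng.binomial(1, pi)                        # 1 = bad, 0 = good

X = sm.add_constant(x)                         # adds the intercept column
model = sm.Logit(y, X).fit(disp=False)         # maximum-likelihood estimation
print(model.params)                            # estimated beta_0, beta_1, beta_2
print(model.predict(X)[:5])                    # estimated pi(x), i.e. the PD, for the first rows
```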

4.1.5.1 Expert-based Methodology

The modules developed according to an expert approach are generally inspired by a multi-attribute value theory such as the Analytic Hierarchy Process (AHP) proposed by Saaty at the end of the 1970s. The AHP method allows the modeling of a decision problem by means of a hierarchy of levels (see Fig. 4.6) and the conversion of qualitative and quantitative information in a uniform manner, by means of the concept of relative importance within a finite set of alternatives.

Fig. 4.6 Schematic view of the proposed hierarchy

The choice of a hierarchical approach for the definition of the expert-based components is often preferred to alternative techniques for reasons of conceptual and implementation simplicity, methodological transparency and the possibility of fine-tuning all parts of the structure, also independently of one another.

Following a top-down approach, the main objective of the analysis – that is, the determination of the size of the improvement/worsening of the counterparty risk estimated by the statistical component of the model – is decomposed into a hierarchy of sub-objectives at lower levels, specific to the segment to which the borrower belongs.

Such decomposition allows us to design a sort of “conceptual map” of the expert-based component and, at the same time, to formalize the basic hierarchical structure.

Following this method, it is possible to define the mathematical formalization of one or more (expert-based) modules of a rating model in parallel with the definition of the conceptual map(s), with these main objectives:

  • to establish the criteria to be used for dealing with differing information, according to its type (continuous or categorical) to ensure the correct transformation of indicators into model variables;

  • to assure the uniqueness of the variables’ value range;

  • to define the criteria for dealing with missing values;

  • to identify the model variables to which to assign a weight;

  • to establish the criteria for the computation of weights to manage possible diversity in the “discriminant capability” of some risk indicators.

At the highest level of the hierarchy, the total risk function is computed – the score (integrated if it results from more than one module) which determines the size of the correction of the statistical rating class – whose value depends on the nodes at the lower hierarchy level.

The hierarchy proposed consists of four levels.

  • “Level 0” (or the “starting level”) contains the main objective (or “goal”) of the evaluation: the expert-based risk score to be assigned to the examined positions.

  • “Level 1” contains the evaluation criteria (financial and/or qualitative) that specify the content and meaning of the goal; the Level 1 criteria are divided into more specific objectives.

  • “Level 2” contains the categories of information to be analyzed (which, in the case of a qualitative module, can be: demand/supply in the reference market; competitive position of the company; ownership structure/quality of the accounts; and so on); these are themselves sub-divided at Level 3.

  • “Level 3” contains the single terminal objectives of the hierarchy, which originate from the single module variables.

A value is assigned to each modality of the variables that feed the expert-based component – continuous for continuous variables and discrete for categorical variables – in an interval ranging, for example, from 0 (maximum risk) to 10 (minimum risk).

To each objective of the structure, a “local weight” is assigned ranging from 0 to 1, which determines the relative importance with reference to the objective of the higher level.

The importance of each terminal objective in relation to the goal is determined by the “hierarchy composition rule”:

  • the local weights assigned to the different terminal objectives are multiplied by the values of the corresponding variables;

  • the values so computed are summed to obtain the values of the objectives of the higher level; and

  • moving from the bottom to the top, the weighted sums of the variables first, and then of the categories/types of information, lead to the determination of the score (integrated, where more than one module is present) of the expert-based model component (a minimal sketch of this bottom-up composition follows the list).
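A minimal sketch of the hierarchy composition rule, using a toy two-category hierarchy with hypothetical category names, weights and answers (none of them taken from the book); the logic is simply the bottom-up weighted sum described above.

```python
# Toy hierarchy: goal -> categories -> terminal variables.
# Local weights at each level sum to 1; variable values lie in [0, 10].
hierarchy = {
    "market_position": {"weight": 0.4,
                        "variables": {"demand_outlook": 0.6, "competition": 0.4}},
    "ownership_quality": {"weight": 0.6,
                          "variables": {"owner_track_record": 0.7, "succession_plan": 0.3}},
}

answers = {  # values assigned to the terminal variables (0 = max risk, 10 = min risk)
    "demand_outlook": 7.0, "competition": 5.0,
    "owner_track_record": 8.0, "succession_plan": 4.0,
}

def expert_score(hierarchy: dict, answers: dict) -> float:
    """Bottom-up weighted sums: variables -> category scores -> goal score."""
    total = 0.0
    for category in hierarchy.values():
        cat_score = sum(w * answers[var] for var, w in category["variables"].items())
        total += category["weight"] * cat_score
    return total

print(expert_score(hierarchy, answers))   # final expert-based score, here 6.56
```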

4.1.6 Step 3: Univariate Analysis

The aim of the univariate analysis is to investigate the link between the single variable (financial, behavioral, qualitative) and the default, and the consequent reduction of the factors’ long lists to medium lists that are logically and methodologically sound, removing factors that do not perform well or that show a high percentage of missing values (see Table 4.4).

Table 4.4 Developing a rating model: main activities of Step 3

The univariate analysis follows the preliminary exploratory sample analysis (data quality and representativeness) and the rebuilding of the factor algebra (by associating with all the sample observations the indicators defined in the long lists).

The aims of the univariate analysis – performed separately for each informative category of the single areas of enquiry – are:

  • to analyze the distribution (in classes or quantiles according to the type) of all the variables in their fields of existence;

  • to verify the economic soundness of the factors and their proper relationship with the default.

As an example, in Figs. 4.7, 4.8 and 4.9 three variables are characterized by identical distributions for a range of values (shaded bars), but by three different relations with the risk (default rate of the population in the eight ranges, shown by the curve on the graph). Figure 4.7 shows a trend growing with the risk, Fig. 4.8 shows a decreasing trend and Fig. 4.9 illustrates uncertainty.

Fig. 4.7 Example of a variable growing monotonically with the risk

Fig. 4.8 Example of a variable decreasing monotonically with the risk

Fig. 4.9 Example of an uncertain relation with the risk

In the first two cases, if the trend with respect to the risk is confirmed by the economic interpretation of the indicators under consideration, the two variables will be included in the factors’ medium list(s) to be analyzed, at multivariate level, in Step 4.

The variable represented in Fig. 4.9 will be excluded from the successive analysis process because of its undetermined relation with respect to the event to be forecast – the default.

The analysis of the distribution and of its relation with the default should be carried out both before and after the preprocessing of the data, which is intended to eliminate problems such as missing data, outliers and exceptions (for example, “0/0”, “missing/0” and so on).

There are a number of ways to manage missing data: elimination of the indicators not available for a significant percentage of observations (vertical missing data), substitution of the missing data with predefined values, or the elimination of observations for which a significant number of indicators from the long lists are not available (horizontal missing data).
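The following sketch shows one way such a screening of vertical and horizontal missing data could be coded; the 30 % and 50 % thresholds are arbitrary illustrative choices, not values suggested by the text.

```python
import pandas as pd

def screen_missing(indicators: pd.DataFrame,
                   max_col_missing: float = 0.30,
                   max_row_missing: float = 0.50) -> pd.DataFrame:
    """Drop indicators (columns) with too many missing values ("vertical"
    missing data), then observations (rows) missing too many of the surviving
    indicators ("horizontal" missing data)."""
    col_share = indicators.isna().mean()
    kept = indicators.loc[:, col_share <= max_col_missing]
    row_share = kept.isna().mean(axis=1)
    return kept.loc[row_share <= max_row_missing]
```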

A common approach to the management of outliers is to analyze the data variability in order to assess economically and statistically feasible ranges, and consequently to substitute values falling outside pre-fixed thresholds. The definition of these feasibility ranges requires special attention; if the ranges are too narrow, this could lead to models whose fit is biased by an arbitrary reduction of the variance of the input data.

As with the missing data and the outliers, the exceptions also require specific treatment.

In the construction of variables derived across time horizons of three, six, twelve months and so on – as minimum, maximum, correlation, coefficient of variation and so forth – it is necessary to define the minimum thresholds for the presence of information; below such thresholds, the value obtained for the indicator should be considered to be missing.

Generally, for indicators built on a number n of months, it may be necessary to have at least n + 1 items of information if n is odd, or n if n is even.

There are two other important activities related to univariate analysis: the management of the “U-shaped” factors; and their transformation, inside the feasibility interval, to emphasize their relation with the default.

The first of these two analyses, performed separately on each factor of the long lists, is devoted to identifying the possible “U” relation – which must also be confirmed by the economic analysis – between the range of values assumed by the indicator and the default rate (see Fig. 4.10, upper chart).

Fig. 4.10 Example of a “U-shaped” factor

The analysis is carried out by dividing the interval of assumed values into quantiles, from which the default rate is computed.

The median value of each quantile and the corresponding default rate are identified, respectively, on the x and y axes of the Cartesian plane, allowing the graphical representation of the relation of each indicator with the default (see Fig. 4.10, lower chart).

In the event of a “U-shaped” pattern, once the point (x_0; y_0) at which the derivative changes sign has been identified – that is, the minimum of the function, ideally a parabola with both branches pointing upward – it is possible to identify the best preliminary transformation that ensures a crossing near the point (x_0; y_0) and, simultaneously, minimizes the deviation between the interpolating curve and the observed values.

At the end of such transformations, the most significant factors of the long lists will show a monotonic trend (increasing or decreasing, according to their economic meaning) with respect to the default. They may also be subjected to a final phase of (deterministic) transformation and normalization to reduce the impact of outliers, and to make the multifactor regression analysis more efficient and the factor weights easier to interpret.

As an example, for continuous variables one can identify, for each indicator, the value interval [x_l; x_u] in which a significant portion of the observations falls (equal, e.g., to 75–80 %) and, at the same time, the monotonic relation with the default event appears with specific evidence.

Denoting the lower and upper bounds as x_l and x_u, respectively, it is possible, by means of a deterministic transformation (e.g. a logit), to enhance the discriminatory capability of the single factor in the interval [x_l; x_u] and to flatten it outside the interval, where the relation with the default is less important. Following this transformation, the analysis of the ordering capability of individual indicators at univariate level is carried out using a discriminatory power test on both the development sample and the validation sample.
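A possible (purely illustrative) deterministic logit transformation of this kind is sketched below: the curve is steep inside the interval [x_l; x_u] and nearly flat outside it; the choice of steepness is an assumption, not a prescription from the text.

```python
import numpy as np

def logit_transform(x: np.ndarray, x_l: float, x_u: float) -> np.ndarray:
    """Map an indicator onto (0, 1) with a logistic curve that discriminates
    inside [x_l, x_u] and flattens outside it."""
    mid = 0.5 * (x_l + x_u)
    scale = (x_u - x_l) / 8.0      # most of the S-shape falls inside the interval (assumed)
    return 1.0 / (1.0 + np.exp(-(x - mid) / scale))

ratio = np.array([-5.0, 0.1, 0.4, 0.7, 5.0])      # an illustrative financial ratio
print(logit_transform(ratio, x_l=0.0, x_u=1.0))   # ~[0.00, 0.04, 0.31, 0.83, 1.00]
```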

By setting the minimum level of acceptability for the discriminatory power tests required for the variables belonging to the same types of information (financial, behavioral or qualitative) and by assessing the coherence of the indicators’ behavior (values and relation to the default) with respect to their economic significance, it becomes possible to select from the corresponding long list the three sub-sets of factors (financial, behavioral and qualitative) that are:

  • most predictive of the default event;

  • intuitive from the economic point of view; and

  • capable of ensuring coverage of the main risk categories, which the panel of experts considers to be the determinants in the evaluation of creditworthiness.

Such sub-sets of indicators are usually referred to as the “medium” list. It is very important to eliminate factors with low predictive power before initiating the multifactor analyses: including a factor with no ability to differentiate between bad and good clients creates unwanted noise and increases the risk of over-fitting the model to the sample data.

4.1.7 Step 4: Multivariate Analysis

The aim of the multivariate analysis is to determine the optimal variable selection and weight of each indicator (see the main activities in Table 4.5). First, a further reduction of indicators is carried out, to eliminate from the medium lists those that are highly correlated with other, more predictive indicators.

Table 4.5 Developing a rating model: main activities of Step 4

In this phase of the analysis, the indicators are compared at multivariate level inside the informative categories to which they belong, applying techniques such as cluster analysis and logistic regression inside the identified clusters.

In this way, the single short lists of indicators can be defined, one for each information category analyzed (see Table 4.6).

Table 4.6 From the long list to the final model indicators

Subsequently, the short lists of the same enquiry area are merged, obtaining, in this case, four lists of variables to be tested jointly through logistic regression analysis, performed by:

  • applying the stepwise selection technique – without setting a maximum number of predictors;

  • according to the cluster analysis identified in the hierarchical manner – where each class (cluster) of variables belongs to a larger cluster, which is again contained in a larger one and so on until the cluster that contains the whole set of analyzed factors is reached; and

  • identifying, through logical-economic considerations and starting from the short list, the sub-set of “best” variables – in relation to their economic interpretation, their capability of covering the main risk categories, their forecasting power and the correlation matrix – to be provided as input to the regression analysis for the enquiry area.

The final list of factors of each module is chosen from among the optimal candidates and constructed using both statistical and experience-based criteria. The factor weights of the single module and the significance level of each factor are then calculated through a statistical regression (typically, a logistic regression). In general, for each area of analysis, there are several candidate modules that are near-optimal and present only minor differences in terms of performance: to select a final model, it is necessary to consult the bank’s experts to make sure that all the above-mentioned criteria have been satisfied.

Tables 4.7, 4.8, 4.9 and 4.10 present four illustrative modules (financial, external behavioral, internal behavioral and qualitative) that could potentially be employed in the evaluation of the creditworthiness of corporate SME counterparties (Table 4.11).

Table 4.7 Financial module: an illustrative example
Table 4.8 External behavioral module: an illustrative example
Table 4.9 Internal behavioral module: an illustrative example
Table 4.10 Qualitative module: an illustrative example
Table 4.11 Developing a rating model: main activities of Step 5

The coefficients of the first three modules, estimated by means of logistic regression, are expressed as percentages.

Indeed, given the monotonic relation between the logistic function:

$$ \pi (x)=\frac{{\mathrm{e}}^{x\cdot \beta }}{1+{\mathrm{e}}^{x\cdot \beta }} $$

and the exponential function argument:

$$ x\cdot \beta ={\beta}_0+{\displaystyle \sum_{i=1}^n{\beta}_i}\cdot {x}_i $$

it is possible to compute the weights p_1, p_2, …, p_n of the n variables of each module as:

$$ {p}_i=\frac{\beta_i}{\sum_{j=1}^{n}{\beta}_j} $$

with

$$ \sum_{i=1}^{n}{p}_i=1 $$

and

$$ 0\le {p}_i\le 1\quad \text{for any } i=1,\dots, n $$

and postpone, to the following phase of calibration, the transformation of the risk score into a default probability.
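A one-line illustration of this normalization of the estimated coefficients into percentage weights; it presumes, as is usually enforced by the preliminary factor transformations, that all the (non-intercept) coefficients carry the same sign, and the numerical values are invented.

```python
import numpy as np

def factor_weights(betas: np.ndarray) -> np.ndarray:
    """Normalize the (non-intercept) logistic coefficients so that they sum to 1."""
    return betas / betas.sum()

betas = np.array([1.8, 0.9, 0.6, 0.3])       # illustrative beta_1 ... beta_n
print(np.round(factor_weights(betas), 3))    # [0.5   0.25  0.167 0.083]
```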

In contrast, the weights assigned to the variables (questions) of the qualitative module are assigned in a directly judgmental way, as an alternative to the multi-attribute value theory method proposed earlier.

4.1.8 Step 5: Calibration, Integration and Mapping to the Master Scale

The output of the logistic regressions assumes values in the interval [0; 1] and could be interpreted as a default probability. However, the regression output is correctly “calibrated” only when the average probability over the perimeter under consideration is close to the one-year forecast default rate estimated by the bank’s risk manager (the so-called “calibration point”), and not to the average default frequency of the sample.

The calibration process, which allows the transformation of the logistic regression output into a 12-month default probability, can be represented by the steps shown in Table 4.11:

  • estimation of the calibration point (CP), which represents the level of average PD considered coherent with the portfolio under examination;

  • computation of the default rate of the sample used for the calibration DRsample;

  • sub-division of the sample in n quantiles, ordered with respect to the regression output (the score);

  • computation of the median score s_i associated with each quantile (i = 1, … , n);

  • computation of the default rate relative to each quantile, DR i (i = 1, … , n);

  • re-apportionment of the default rate of each quantile with respect to the CP, by applying Bayes’ theorem:

$$ D{R}_i^{\mathrm{calibrated}}=\frac{D{R}_i\cdot \frac{CP}{D{R}^{\mathrm{sample}}}}{D{R}_i\cdot \frac{CP}{D{R}^{\mathrm{sample}}}+\left(1-D{R}_i\right)\cdot \frac{1-CP}{1-D{R}^{\mathrm{sample}}}} $$

where DR_i^calibrated denotes the re-apportioned default rate of the i-th quantile, constrained to the interval [0; 1]; and

  • the estimation of the a and b parameters that specify the exponential curve equation relating the score to the (re-apportioned) default rate observed in the quantiles:

$$ \ln \left(D{R}_i^{\mathrm{calibrated}}\right)=a\cdot {s}_i+b $$

thus obtaining the punctual (granular) values of default probability for each sample position, contained in the interval [0; 1] and such that the average PD estimated on the whole sample equals the calibration point (a sketch of these calibration steps follows).
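The sketch below follows the calibration steps listed above on synthetic data: quantile default rates are re-apportioned toward an assumed calibration point with Bayes' theorem, and a log-linear (exponential) curve ln(DR) = a·s + b is then fitted to the median scores; the data, the quantile count and the CP value are all illustrative.

```python
import numpy as np

def calibrate(scores: np.ndarray, defaults: np.ndarray,
              cp: float, n_quantiles: int = 10):
    """Return the (a, b) parameters of ln(DR_calibrated) = a * score + b,
    obtained from quantile default rates re-apportioned toward the
    calibration point cp with Bayes' theorem."""
    dr_sample = defaults.mean()
    edges = np.quantile(scores, np.linspace(0, 1, n_quantiles + 1))
    buckets = np.clip(np.searchsorted(edges, scores, side="right") - 1,
                      0, n_quantiles - 1)

    med_score, dr_cal = [], []
    for q in range(n_quantiles):
        mask = buckets == q
        dr_q = defaults[mask].mean()
        num = dr_q * cp / dr_sample
        den = num + (1 - dr_q) * (1 - cp) / (1 - dr_sample)
        med_score.append(np.median(scores[mask]))
        dr_cal.append(num / den)

    a, b = np.polyfit(med_score, np.log(np.clip(dr_cal, 1e-6, None)), deg=1)
    return a, b

# Illustrative use on simulated scores and defaults, with an assumed CP of 2.5 %.
rng = np.random.default_rng(1)
s = rng.normal(size=20_000)
y = rng.binomial(1, 1 / (1 + np.exp(-(-3 + 1.5 * s))))
print(calibrate(s, y, cp=0.025))
```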

The re-calibrated (and standardized) output of every module can eventually be integrated using either statistical methodologies (if a sufficiently large sample is available on which all the model indicators can be computed; see Table 4.6) or internal bank experience alone. Table 4.12 presents examples of integration weights for the default probabilities estimated (and calibrated) separately for every module.

Table 4.12 Module integration weights

It is a reasonable suggestion initially to assign a limited weight to the qualitative module (in this case, 5 %) and to increase it progressively after comparing the judgments assigned by the relationship managers (by means of a questionnaire) with the quantitative model components (financial, external and internal behavioral) and testing their correctness.
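As a toy illustration of this integration step, the weighted combination below uses invented weights and module PDs (in the spirit of Table 4.12, but not the book's figures); a simple convex combination is only one of the possible integration formulas.

```python
# Illustrative integration weights and calibrated module PDs (invented values).
weights = {"financial": 0.35, "internal_behavioral": 0.40,
           "external_behavioral": 0.20, "qualitative": 0.05}
module_pd = {"financial": 0.021, "internal_behavioral": 0.034,
             "external_behavioral": 0.018, "qualitative": 0.040}

integrated_pd = sum(weights[m] * module_pd[m] for m in weights)
print(round(integrated_pd, 5))   # 0.02655
```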

The integrated default probability is then associated with a rating class; that is, to one (and only one) of the ordered and disjoint sets that determines the partition of the possible values that the probability can assume.

The table on the left-hand side of Fig. 4.11, representing the so-called “master scale” of a generic rating system, illustrates the method for associating a default probability with a corresponding rating class.

Fig. 4.11 An illustrative master scale

For the definition of the master scale, the number and width of the rating classes should be set so that the scale:

  • divides the portfolio customers into a sufficient number of risk classes;

  • avoids excessive concentrations (both in terms of the number of positions and outstanding debts) in single rating classes; and

  • allows a direct comparison with the final assessment (rating class) expressed, for the same counterparties, by the main external agencies and banking groups that adopt a comparable master scale, both in terms of average PDs and default definition.

Figure 4.12 shows, for the purposes of illustration, a possible portfolio distribution analyzed by rating class.

Fig. 4.12 Rating class distribution

The risk judgment expressed by the integrated model can be corrected (in general, worsening the outcome) in the presence of events/behavior that represent an imminent risk for the counterparty or its risk group. Corrections following policy rules or discriminatory events, even if they do not modify the default probability estimated by the algorithm, increase the attention level on the counterparty: in the origination phase, the credit evaluation may be assigned to a higher level of delegated powers and, in the monitoring phase, the counterparty may be moved to a dedicated management unit. Before releasing the model into production, it is necessary to submit it to a thorough validation, correcting/integrating it where necessary and documenting the whole estimation process to ensure that the results are replicable.

4.1.9 Step 6: Embedding the Model in the Banking Processes

The model release generally happens by means of a preliminary prototype, which allows the impact of the calibration on the bank’s credit and commercial policies to be tested (see Table 4.13).

Table 4.13 Developing a rating model: main activities of Step 6

As stated in Table 4.13, among the main uses of a rating model within the banking processes are:

  • the definition of delegation powers in relation to the expected loss associated with the single risk position;

  • the definition of the pricing for the required facility;

  • the cost of risk computation; and

  • the optimization of the risk/return profile of the bank.

Some of these will be detailed in later chapters of this book.

4.2 PD Corporate SME Sub-segment Models

In relation to the practical availability of data (financial, behavioral and qualitative), it is possible to estimate the different modules of a PD model either on a statistical basis (in the presence of sufficiently robust data) or on an expert basis. Moreover, even in the presence of good/bad samples of companies that are representative of the bank’s portfolio and statistically robust, expert evaluation always plays a part, both in the selection of the final financial and behavioral modules and in the development of the qualitative module (Tables 4.14, 4.15, 4.16).

Table 4.14 Start-up model: an illustrative financial module
Table 4.15 Consortia model: an illustrative financial module
Table 4.16 Financial company model: an illustrative financial module

In the absence of robust databases, the expert-based component simply assumes a more relevant role in the framework of the definition of the whole structure of the model.

In particular, models composed from expert-based modules refer to customer sub-segments characterized by portfolios that are:

  • rarefied in terms of counterparts (for example, insurance companies); or

  • constituted by a reduced number of defaults (non-profit organizations); or

  • lacking a sufficiently reliable historical database of clearly codified balance sheets (non-profit organizations).

The release of models with expert-based modules also aims to extend the rating discipline to portfolios/sub-segments that, in terms of number of positions/default rates, are less relevant than others.

This contributes to establishing a systematic data collection process for these bank portfolios.

As soon as a reliable database is available for these modules, it will be possible to start the “objectivization” phase of weights and variables following statistical techniques.

4.2.1 Statistical Expert-based Models

Possible models constituted both by statistical components and by expert-based modules are devoted to the evaluation of corporate SME counterparties belonging, for example, to the following segments: farmers, start-ups, consortia and financial companies.

In the case of farmers, the expert-based component could be represented by the qualitative module; in the remaining three models (devoted to start-ups, consortia and financial companies), one could assume that the expert-based score would be the result of the weighted average of the scores produced by the financial and qualitative modules.

The following two sub-sections present a brief description of the process of derivation of the financial and qualitative expert-based modules, as illustrated earlier in the chapter.

As explained earlier (see Fig. 4.4), such modules/components will be allowed to modify, in a limited manner (in terms of notches), the behavioral (or behavioral and financial) evaluation expressed by the model’s statistical component.

4.2.1.1 Qualitative Modules

In the definition of the qualitative modules of the models devoted to the evaluation of farmers, start-ups, consortia and financial companies, all the variables suggested by the experts are generally inserted into the final components, each with a weight varying from 0 to 1 according to its recognized importance for forecasting insolvency.

The weights indicated by the experts are differentiated according to their “vintage”, assuming that, for “new” customers, no answer could be found for certain questions (variables): in a first approximation, the relative weights could simply be redistributed proportionally over the remaining questions.

The score assigned to each indicator included in the interval [0; 1] must be obtained according to the examined variable type:

  • for indicators similar to continuous variables, a score can be assigned by means of linear regression, analogous to what was undertaken for the variables of a financial nature; or

  • for indicators of a categorical type, the expert team must identify the possible outcomes and set the relative risk score.

Tables 4.17, 4.18, 4.19 and 4.20 describe the structure of four possible qualitative modules for the evaluation of, respectively, farmers, start-ups, consortia and financial corporate SMEs.

Table 4.17 Farmers model: an illustrative qualitative module
Table 4.18 Start-up model: an illustrative qualitative module
Table 4.19 Consortium model: an illustrative qualitative module
Table 4.20 Financial company model: an illustrative qualitative module

4.2.1.2 Integration of the Statistical and Expert-based Components

As mentioned earlier in the chapter, the rating class of a counterparty in the sub-segments of farmers, start-ups, consortia and financial companies, estimated by means of the statistical component of the corresponding rating model, can be corrected upward or downward according to the score level assigned to the same counterparty by the expert-based component.

As every variable of the expert-based component, as well as every possible intermediate expert-based score produced according to the hierarchical structure, has a value between 0 and 1, the final score will also fall within the interval [0; 1].

Having sub-divided the score variation range into seven risk sub-intervals, the magnitude of the upward or downward correction of the statistically estimated rating class can be defined, in agreement with the expert team, as shown in Table 4.21, or be further differentiated in relation to the rating class estimated by means of the model’s statistical component.

Table 4.21 Expert-based correction entity

Following such a correction, it is possible to associate the counterparties belonging to particular corporate SME sub-segments, such as farmers, start- ups, consortia, financial companies, with a final rating class and a default probability to be employed for both regulatory and management purposes (delegation powers, remuneration and pricing).

4.2.2 Pure Expert-based Models

Pure expert-based models are, for example, those that can be developed for the corporate SME counterparties in the sub-segments of insurance companies, holding companies and non-profit organizations.

As illustrated in Fig. 4.5, the model structure is still modular: the financial module and the qualitative/behavioral module compute, separately, two scores that express in numerical terms the creditworthiness of the counterparty.

The scores generated by the two modules are combined, adopting a weighted average, in a final score variable between 0 (maximum risk) and 10 (minimum risk), expressing the size of upward correction (upgrading) or downward correction (downgrading) to be applied to the rating corresponding to the average risk of the segment under examination, possibly corrected in a through-the-cycle perspective (the calibration point).

For the correction, one can refer to a structure similar to that proposed in Table 4.21.
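A minimal sketch of such a notching rule, with seven hypothetical score bands and notch sizes (not those of Table 4.21) and the assumption that lower class numbers on the master scale denote better ratings.

```python
def notch_correction(expert_score: float) -> int:
    """Map an expert-based score in [0, 10] (0 = max risk, 10 = min risk)
    to an upgrade/downgrade expressed in notches over seven sub-intervals.
    Thresholds and notch sizes are hypothetical."""
    bands = [(1.5, -3), (3.0, -2), (4.5, -1), (5.5, 0),
             (7.0, +1), (8.5, +2), (10.0, +3)]
    for upper, notches in bands:
        if expert_score <= upper:
            return notches
    return bands[-1][1]

statistical_class = 12                 # class on the master scale (lower = better, assumed)
expert = 0.6 * 7.0 + 0.4 * 5.5         # weighted average of financial and qualitative scores
final_class = statistical_class - notch_correction(expert)
print(expert, final_class)             # 6.4 -> +1 notch -> class 11
```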

4.2.2.1 Financial Modules

Tables 4.22, 4.23 and 4.24 summarize the structure of three possible financial modules for the evaluation, respectively, of insurance companies, holding companies and non-profit organizations.

Table 4.22 Insurance companies model: an illustrative financial module
Table 4.23 Holding companies model: an illustrative financial module
Table 4.24 Organizations model: an illustrative financial module

4.2.2.2 Qualitative/Behavioral Modules

Tables 4.25, 4.26 and 4.27 describe the structures of three possible qualitative/behavioral models for the evaluation of insurance companies, holding companies and non-profit organizations, respectively.

Table 4.25 Insurance companies model: an illustrative qualitative/behavioral module
Table 4.26 Holding companies model: an illustrative qualitative/behavioral module
Table 4.27 Example of default data

4.2.2.3 Integration of Pure Expert-based Modules

As anticipated at the beginning of this section, the scores generated separately by the financial and qualitative/behavioral modules are integrated according to a weighted average (convex combination) in a final score variable, which is also in the interval [0; 10].

Table 4.28 proposes possible integration weights for the two modules, differentiated for types of counterpart (insurance companies, holding companies and non-profit organizations).

Table 4.28 Mapping of suggested master scale to S&P grades

The integrated score, when divided, for example, into the seven classes presented in Table 4.21, can be used to establish whether the risk of the single counterparty is greater or smaller than the average of a sub-segment, and to assign to it a specific default probability.

4.3 Term Structure of Probability of Default

The effects of grade migration over a period of time create a term structure of PDs. For example, an AAA-rated borrower cannot improve in rating over time and so, on average, is likely to deteriorate. However, a CCC-rated borrower, if it survives, can only improve.

4.3.1 Observed Term Structures

Figure 4.13 shows the term structure observed for Standard & Poor’s (S&P) rated companies. It can be seen from this figure that higher-quality credits tend to deteriorate over time and lower-quality credits improve.

Fig. 4.13 Observed term structure of S&P rated companies (based on one-year forward PD) (Source: Internal Rating Model Development Handbook – Capitalia Banking Group)

4.3.2 Marginal, Forward, and Cumulative Probability of Default

The PDs for each year shown in Fig. 4.13 are forward PDs; they are the PDs that would be expected in that year, expressed as a percentage of the companies that have survived. The number of companies that survive can be determined from the cumulative default rate. To illustrate these concepts, consider the simple example in Table 4.27.

Consider three different questions. What is the probability that:

  1. a company will default over a four-year period?

  2. a company in year four will default over the next year?

  3. a company will default in the fourth year of a facility?

The answers require different combinations of the numbers presented in Table 4.27:

  1. Of 100 companies, 10 default in the first four years: 10 %.

  2. The Cumulative Default Rate in year four is 10 %.

  3. Of the 94 companies that survived until year four, 4 will default in year four: 4.2 % is the Forward Default Rate in year four.

  4. Of the 100 companies, 4 that have been granted loans default in the fourth year of their life: 4.0 % is the Marginal Default Rate in year four.

The pricing model requires both the cumulative PD and the forward PD for the discounted cash flow calculation. The cumulative PD is required to determine the probability that revenues and costs are incurred in any given year (that is, to account for survivorship), and the forward PD is required to calculate expected loss and regulatory capital.

4.3.3 Mapping PD Ratings to Observed Term Structures

Once the marginal PDs have been calculated (Fig. 4.14), it is then possible to calculate the forward PDs using the following equation:

$$ P{D}_{\mathrm{forward},\ \mathrm{year}\ n}=\frac{P{D}_{\mathrm{marginal},\ \mathrm{year}\ n}}{1-{\displaystyle \sum_{\mathrm{year}=1}^{n-1}P{D}_{\mathrm{marginal},\ \mathrm{year}}}} $$
Fig. 4.14 Calculating marginal PD from the migration matrix
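A small numerical sketch of these relations, reproducing the illustrative figures quoted in the text (100 companies, 10 cumulative defaults by year four, of which 4 in year four); the equal split of the first three years' marginal rates is an assumption made only to complete the example.

```python
import numpy as np

# Marginal default rates as fractions of the original 100 companies; the
# first three years are an assumed split consistent with the text's example.
marginal = np.array([0.02, 0.02, 0.02, 0.04])

cumulative = np.cumsum(marginal)                           # 0.02, 0.04, 0.06, 0.10
survivors = 1 - np.concatenate(([0.0], cumulative[:-1]))   # share alive at the start of each year
forward = marginal / survivors                             # year four: 0.04 / 0.94

print(cumulative[-1])          # 0.10   -> cumulative PD over four years
print(round(forward[-1], 4))   # 0.0426 -> forward PD in year four (the 4.2 % in the text)
```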

As not all grades of the suggested 22-point grade system master scale can be mapped directly onto the S&P grade system (as some of them are intermediate grades), the simplified mapping shown in Table 4.28 can be used to determine the forward PDs. The result based on the suggested 22-point rating system master scale is shown in Table 4.29.

Table 4.29 Forward PD for suggested master scale with 22-point ratings (illustrative, (%))

4.4 State-Dependent Transition Matrices

In the previous sections, the analysis was indifferent to the phases of the economic cycle. This section addresses the production of European transition matrices based on the different phases of the cycle itself. The types of state-of-the-economy-dependent transition matrices for each business segment are summarized in Table 4.30. The average downgrading and upgrading probabilities, by state of the economy, across all the business segments are shown in Table 4.31.

Table 4.30 List of transition matrix states of the economy dependent on each business segment
Table 4.31 Transition probabilities in terms of stability, downgrading and upgrading (%)

Downgrading probabilities are, on average, increasing from recovery to hard landing.

Upgrading probabilities decrease from recovery (higher probabilities) to hard landing.

Tables 4.32, 4.33, 4.34 and 4.35 show state-dependent transition matrices for large corporates, corporates, SME corporates and SME retail.

Table 4.32 Large corporate transition matrices
Table 4.33 Corporate transition matrices
Table 4.34 SME corporate transition matrices
Table 4.35 SME retail transition matrices

4.5 Validation of Internal Credit Rating Models

A credit rating system undergoes a “validation process”. This consists of a formal set of activities, instruments and procedures aimed at ensuring that the design of a model is conceptually sound and that its implementation is accurate and consistent with the theory, and at assessing the accuracy of the estimates of all material risk components as well as the regular operation, predictive power and overall performance of the internal rating system.

A model validation process will be triggered whenever a new model is developed, or when any significant changes are made to one that has been previously approved. Models are also subject to periodic reviews, which aim to reassess the adequacy of their performance over time (e.g. the verification of the validity of their assumptions under different market conditions; investigation of mismatches between realized and model-predicted values; and comparisons with competitors’ best practice).

Hence, model validation must be seen as an ongoing process: at least once a year, banks have to verify the reliability of the results generated by the rating system on an ongoing, iterative basis and also its continued consistency with regulatory requirements, operational needs and changes in the reference market.

The rating system validation process is complementary to the developmental process (see Fig. 4.15).

Fig. 4.15 Rating system life-cycle

The initial validation, before a model’s implementation, aims to consolidate all new models; the ongoing validation ensures the reliability and robustness of the regulatory parameters over time.

It is possible to select the three most relevant areas for analysis:

  • validation of the rating model;

  • validation of the rating process; and

  • validation of the dedicated IT system.

This chapter selects and describes the main set of analyses and statistical tests to be performed in order to assess, for each relevant risk component (PD, LGD and EAD), the following aspects of a rating model:

  • the model design;

  • the estimation of the risk parameters; and

  • the model’s performance, together with the evaluation of the impact of company processes and of the judgmental revisions on the performance of the statistical components of the rating models.

4.6 Validation of the PD Model

As we can infer from Fig. 4.16 and Fig. 4.17, the validation of a PD model requires the use of both qualitative and quantitative analyses.

Fig. 4.16 Rating system validation: areas of analysis

Fig. 4.17 PD model validation: areas of assessment

The main relevant areas of a PD qualitative validation are:

  • the model’s design (model type, model architecture, default definition);

  • the rating process (attribution of the rating, IT requirements of the rating system); and

  • the use test (relevance of the rating information across the credit/reporting processes).

Conversely, a quantitative validation analysis focuses on:

  • the model’s discriminatory power; that is, the ability of the rating model to discriminate ex ante between defaulting and non-defaulting borrowers (rank ordering and separation tests);

  • the stability of the model and representativeness of the development samples over time; and

  • the model’s adequacy in associating a PD with each rating grade, which gives a quantitative assessment of the likelihood that graded obligors will default (concentration and calibration tests).

The following sections summarize the main analyses to be performed in the PD validation.

4.6.1 PD Model Design Validation

Model design validation is essentially about investigating the methodological approach selected to assess the credit risk profile of obligors assigned to the portfolio under consideration, the rationales supporting the choice, underlying architectural features and the definition of default addressed in the model.

Table 4.36 presents a possible checklist of analyses related to the area of model design validation, grouped by the three dimensions listed in Fig. 4.17: model type, model architecture and default definition.

4.6.2 PD Estimation Process Validation

Table 4.37 illustrates a list of analyses that should be executed during the estimation process validation.

Table 4.36 Model design validation analyses: PD parameter

For the dynamic properties of a rating system, refer to: Bangia et al. (2002), Lando and Skodeberg (2002), Bardos (2003) and Basel Committee on Banking Supervision (2005b). For the purposes of estimating risk parameters, banks may elect not to classify so-called “technical defaults” as defaulted – that is, positions that do not reflect a state of financial difficulty on the part of the obligor, such as to generate losses – so long as this is consistent with reference to the various risk parameters (see Bank of Italy 2006) (Table 4.37).

Table 4.37 Estimation process validation analyses: PD parameter

4.7 PD Performance Assessment and Backtesting

The performance assessment and backtesting consist of analyses such as those listed in Table 4.38.

Table 4.38 Performance assessment and backtesting: PD parameter

4.7.1 Process Impact on the PD Model’s Performance

Finally, regarding the process impact on the performance of the statistical model, Table 4.39 offers a possible analysis checklist. The quantitative validation analyses of PD estimation models aim to evaluate, on an ongoing basis:

Table 4.39 Process impact on the model’s performance: PD parameter

  • the ability of a model to discriminate the in bonis positions from the future defaults (ordering and separation tests);

  • its adequacy in representing the correct risk profile of the reference portfolio (calibration); and

  • the model’s stability and the development samples’ representativeness with respect to the current portfolio.

Next, we offer a brief description of the most common default probability validation tests for portfolio segments characterized by a sufficient number of defaults.

4.7.2 PD Discriminatory Power Tests

The accuracy ratio (AR) or Gini coefficient is the most common rank ordering power test: it measures the model’s ability to order a sample/population according to its level of risk.

The indicator assumes values between 0 and 1: the higher the AR, the greater the model’s discriminant power. A model that does not discriminate at all has a null AR, while the perfectly discriminating model is characterized by an AR (in absolute value) equal to 1. The Lorenz curve or cumulative accuracy profile (CAP) is the graphical analysis tool with which to evaluate the efficacy of a model’s ordering power.

The x-axis in Fig. 4.18 shows the counterparties subject to evaluation, ranked from most to least risky according to the model’s score; the y-axis identifies the cumulative percentage of the insolvencies.

From this, we can obtain the CAP curve corresponding to the analyzed model; this is compared graphically with the curve of the perfect model and of the random model. The curve of the perfect model is obtained by assuming a model capable of assigning the worst possible scores to future insolvents; the random model – represented by the diagonal – corresponds to a model with no discriminant ability that uniformly distributes both in bonis and defaulted customers.

A “real” model falls unavoidably between the two curves: the better its discriminant ability, the closer its CAP curve will be to that of the perfect model.
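As a minimal illustration (not the book’s own implementation), the following Python sketch builds the CAP curve and derives the accuracy ratio from it. It assumes a NumPy array `score` in which lower values denote higher risk (the same convention used for the cut-off later in this section) and a 0/1 array `default`; the function names are purely illustrative.

  import numpy as np

  def cap_curve(score, default):
      # Rank counterparts from most to least risky (lowest score first).
      order = np.argsort(score)
      d = np.asarray(default, dtype=float)[order]
      n = len(d)
      x = np.concatenate(([0.0], np.arange(1, n + 1) / n))  # cumulative share of obligors
      y = np.concatenate(([0.0], np.cumsum(d) / d.sum()))   # cumulative share of defaults
      return x, y

  def accuracy_ratio(score, default):
      x, y = cap_curve(score, default)
      dr = float(np.mean(default))                           # sample default rate
      area_cap = np.sum((y[1:] + y[:-1]) / 2.0 * np.diff(x)) # trapezoidal area under the CAP
      area_model = area_cap - 0.5                            # area between model CAP and diagonal
      area_perfect = (1.0 - dr) / 2.0                        # same area for the perfect model
      return area_model / area_perfect

The accuracy ratio is thus obtained as the ratio between the area enclosed by the model’s CAP curve and the diagonal, and the corresponding area for the perfect model.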

The receiver operating characteristic (ROC) curve is a graphical representation of the “false alarm rate” (FAR) and the “hit rate” (HR), obtained by letting the cut-off “C” that separates solvent from future insolvent customers vary from 0 to 1. The false alarm rate identifies the frequency of effectively solvent subjects that have been incorrectly classified as in default; the hit rate identifies the percentage of correct classifications of future insolvents (see Fig. 4.19).
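The hit rate and false alarm rate underlying the ROC curve can be sketched as follows; this is only an illustrative fragment, using the same conventions as above (lower score = riskier, 0/1 default flag) and a hypothetical function name.

  import numpy as np

  def roc_points(score, default, cutoffs):
      # One (FAR, HR) pair per cut-off: scores at or below the cut-off are flagged as future defaults.
      score = np.asarray(score, dtype=float)
      default = np.asarray(default, dtype=int)
      points = []
      for c in cutoffs:
          predicted_bad = score <= c
          hr = float(np.mean(predicted_bad[default == 1]))   # share of bads correctly flagged
          far = float(np.mean(predicted_bad[default == 0]))  # share of goods wrongly flagged
          points.append((far, hr))
      return points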

Fig. 4.18

Cumulative accuracy profile: an illustrative example

Fig. 4.19

Score distribution of good and bad positions of the sample

Fig. 4.20

The cumulative distribution of bads and goods per score decile: an illustrative example

Fig. 4.21

The Kolmogorov–Smirnov statistic per score decile: an illustrative example

Fig. 4.22

An illustrative example of the percentage distribution of bad and default rates per score decile: development versus validation sample

Fig. 4.23

An illustrative example of a comparison between default rate and PD per rating class

Fig. 4.24

An illustrative example of the percentage distribution of bads and goods per rating class: validation sample. Note: the binomial test usually includes in its workings the regulatory asset correlation with respect to different levels of confidence

The information contained in the ROC curve can be synthesized in the measure denoted as the area under the ROC curve (AUROC). The AUROC assumes a value of 0.5 for a random model with no discriminatory capabilities, and 1 in the event of a perfect model: the higher the value, the better the model.

The AUROC and the AR parameters are linked by the relation:

$$ AR=2\cdot AUROC-1 $$

The corrected Gini coefficient (Gini) is defined as:

$$ Gini=AR\cdot \left(1-DR\right) $$

where DR represents the sample default rate.
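A minimal sketch of these relations follows, assuming the same score convention as above. The AUROC is computed here through its probabilistic interpretation (share of bad/good pairs ranked correctly, ties counted one half), which is equivalent to the area under the ROC curve; the brute-force pairwise comparison is meant only for illustration on small samples.

  import numpy as np

  def auroc(score, default):
      score = np.asarray(score, dtype=float)
      default = np.asarray(default, dtype=int)
      bads = score[default == 1][:, None]
      goods = score[default == 0][None, :]
      correct = (bads < goods).mean()          # bad ranked riskier (lower score) than good
      ties = (bads == goods).mean()
      return float(correct + 0.5 * ties)

  def accuracy_ratio_from_auroc(a):
      return 2.0 * a - 1.0                     # AR = 2*AUROC - 1

  def corrected_gini(ar, dr):
      return ar * (1.0 - dr)                   # Gini = AR*(1 - DR), as defined above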

The contingency table in Table 4.40 synthesizes, within the four quadrants illustrated, the information relative to the:

  • percentage of counterparties correctly predicted as in bonis by the model (Specificity);

  • percentage of bad counterparties incorrectly predicted as in bonis (Type I error);

  • percentage of good counterparties incorrectly predicted as in default (Type II error or FAR); and

  • percentage of bad counterparties correctly classified (Sensitivity or HR).

Table 4.40 Contingency table: an illustrative example

As shown in Fig. 4.20, the number of errors of the first and second type depends strongly on the cut-off value (C), set to separate the future defaults (counterparties with a score value equal to or less than C) from the future in bonis positions (score value greater than the cut-off).

In general, an error of the first type generates a loss corresponding to the capital and the interest lost due to the insolvency of a counterparty having been incorrectly classified as “healthy” and, hence, approved.

An error of the second type, conversely, produces a more limited loss (at least, in the corporate segment), originating from lost earnings in terms of fees and interest margin due to the incorrect classification of the healthy customer as a future insolvent. Once the cut-off has been defined, the following indicators are determined:

  • the misclassification rate (MR) – the percentage of counterparties wrongly classified (goods as future defaults; bads as future solvents) over the total number of positions in the sample; and

  • the hit rate (HR) – the percentage of correct classifications of bads over the total of the defaulted positions.

Table 4.41 shows the two rates of correct (HR) and incorrect (MR) classification, coherent with the illustrative contingency table proposed in Table 4.40.

Table 4.41 Hit rate and misclassification rate: an illustrative example
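A minimal sketch of the contingency-table quantities for a given cut-off C, with the usual conventions (lower score = riskier, 0/1 default flag) and illustrative names:

  import numpy as np

  def contingency_rates(score, default, cutoff):
      score = np.asarray(score, dtype=float)
      default = np.asarray(default, dtype=int)
      predicted_bad = score <= cutoff                   # classified as future default
      n = len(score)
      tp = np.sum(predicted_bad & (default == 1))       # bads correctly classified
      fn = np.sum(~predicted_bad & (default == 1))      # Type I error: bads predicted in bonis
      fp = np.sum(predicted_bad & (default == 0))       # Type II error: goods predicted in default
      tn = np.sum(~predicted_bad & (default == 0))      # goods correctly classified
      return {
          "HR": tp / (tp + fn),                         # hit rate (sensitivity)
          "MR": (fn + fp) / n,                          # misclassification rate
          "Specificity": tn / (tn + fp),
          "FAR": fp / (tn + fp),
      }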

The Kolmogorov–Smirnov distance (KS) evaluates the degree of separation between the solvent and defaulted positions, measuring the maximum vertical distance (in absolute value) between the empirical cumulative distributions of goods and bads. Its values fall within the [0, 1] interval: the greater the index, the better the model’s separation ability.

Figure 4.20 illustrates the cumulative distributions of goods and bads in the same sample, on which the KS computation is based; Fig. 4.21 compares the trend of the KS statistic on two different samples: development and validation.
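A minimal sketch of the KS distance, computed as the maximum absolute gap between the two empirical cumulative distributions evaluated on the pooled score grid (scipy.stats.ks_2samp would return the same statistic):

  import numpy as np

  def ks_distance(score, default):
      score = np.asarray(score, dtype=float)
      default = np.asarray(default, dtype=int)
      grid = np.unique(score)
      bads = np.sort(score[default == 1])
      goods = np.sort(score[default == 0])
      cdf_bad = np.searchsorted(bads, grid, side="right") / len(bads)     # empirical CDF of bads
      cdf_good = np.searchsorted(goods, grid, side="right") / len(goods)  # empirical CDF of goods
      return float(np.max(np.abs(cdf_bad - cdf_good)))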

For further insights into discriminant power tests, see Brier (1950), Bamber (1975), Lee (1999), Engelmann et al. (2003), Sobehart and Keenan (2004) and Basel Committee (2005b).

4.7.3 PD Calibration Tests

The aim of calibration analysis is to evaluate the accuracy of the estimated (and calibrated) PDs with respect to the default rates effectively observed per rating class (see Fig. 4.23). Such analysis has particular importance: a rating system that underestimates the probability of insolvency of one or more credit portfolio segments requires careful monitoring (and, in some cases, a thorough revision), because the estimated capital requirements could be misaligned with the risks effectively assumed by the bank.

Before beginning the calibration test, a series of descriptive analyses (both graphical and tabular) must be conducted to represent and compare by quantiles and rating classes:

  • the distributions, joint and separate, of the bads and goods of the estimation and validation samples; and

  • the trend and the level of the observed default rate, with respect to the PD forecast by the model.

Tables 4.42 and 4.43, and Figs. 4.21, 4.23 and 4.24 give some examples.

Table 4.42 The Kolmogorov–Smirnov statistic per score decile: an illustrative example
Table 4.43 An illustrative example of risk and distribution per rating class: validation sample

Generally, three types of test are used to check the adequacy of the model in representing the correct risk profile of the reference portfolio:

  • binomial (with and without asset correlation);

  • Hosmer–Lemeshow χ² (chi-square); and

  • the traffic lights approach.

The binomial test is based on a comparison, for every rating class, of the observed default rate with the estimated PD. It is a “conservative”, unidirectional test applied to single classes and – in its original formulation – based on the assumption of default independence within the risk classes.

For a given level of confidence, the null hypothesis (H0) underlying the test is: “the PD estimated for the single rating class is correct”; the alternative hypothesis (H1) is: “the PD is underestimated”. As outlined in Basel Committee on Banking Supervision (2005b), the default independence hypothesis is not adequately confirmed by the empirical evidence. For this reason, the Hosmer–Lemeshow χ² (chi-square) test is used to overcome one of the binomial test’s limits: the verification of the model’s calibration only at the single-class level, without a synthetic indication of the calibration of the model as a whole. The Hosmer–Lemeshow test applied to the whole portfolio assumes default independence within and among the rating classes.

For a given level of confidence, the test verifies the alignment between the estimated PDs and the number of observed defaults in the classes: a rejection of the null hypothesis can therefore imply either an underestimation or an overestimation of the effective number of defaults. Finally, the traffic lights approach – applied to single rating classes – is a parametric test of a conservative type. For a given level of confidence, it is possible to identify two thresholds – lower (PDinf) and upper (PDsup) – for each rating class (i = 1, ..., 10).

If the default rate observed in class i (DRi) is lower than PDinf, the test outcome is “green for go” (overestimation of the effective insolvency rate); if it is higher than PDsup, the outcome is “red for stop” (underestimation) and a re-calibration action is needed; otherwise the outcome is “yellow” (coherent estimation).
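A minimal sketch of these per-class checks, under the simplifying assumption of default independence (i.e. without asset correlation); `pd_class` is the calibrated PD of a class, `n` the number of obligors, `k` the observed defaults, and the thresholds passed to the traffic lights helper are assumed to be derived separately at the chosen confidence level.

  import numpy as np
  from scipy.stats import binom

  def binomial_pvalue(pd_class, n, k):
      # H0: the class PD is correct; H1: it is underestimated (one-sided test).
      return float(1.0 - binom.cdf(k - 1, n, pd_class))   # P(defaults >= k | n, PD)

  def hosmer_lemeshow_stat(pd_classes, n_classes, k_classes):
      # Synthetic chi-square-type statistic over all rating classes; under H0 it is
      # compared with a chi-square distribution (degrees of freedom depend on the variant used).
      p, n, k = (np.asarray(a, dtype=float) for a in (pd_classes, n_classes, k_classes))
      return float(np.sum((k - n * p) ** 2 / (n * p * (1.0 - p))))

  def traffic_light(dr_observed, pd_inf, pd_sup):
      if dr_observed < pd_inf:
          return "green"    # effective insolvency rate overestimated
      if dr_observed > pd_sup:
          return "red"      # underestimated: re-calibration needed
      return "yellow"       # coherent estimation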

For further insights on calibration tests, see Blochwitz et al. (2003), Tasche et al. (2003) and Basel Committee on Banking Supervision (2005b).

4.7.3.1 PD Stability Tests

Stability analysis checks the alignment over time between the distributions of the development and validation samples, in order to identify possible differences that could give rise to future model instabilities.

Internal stability is evaluated by means of (i) the computation of the population stability index, and (ii) the transition matrix analysis.

The population stability index (PSI) is a synthetic indicator used to measure the representativeness of the estimation sample with respect to the current portfolio, and the stability of a single indicator or of the entire model, computed respectively over bands of assumed values or over rating classes.

Once the variable subject to examination (e.g. the rating class), its possible modalities (the 10 classes effectively evaluated) and the percentage distributions of the variable (with respect to the rating classes) in the estimation and validation samples have been identified, it is possible to define the PSI as follows:

$$ PSI=\sum_{i=1}^{k}\left({P}_i-{C}_i\right)\cdot \log \left(\frac{P_i}{C_i}\right) $$

where k is the number of modalities subject to analysis (in this example, the 10 evaluated classes), P_i (i = 1, …, k) denotes the percentage of the validation sample assigned to class i, and C_i (i = 1, …, k) the corresponding percentage of the estimation sample.

The indicator defined in this way assumes values between zero and +∞: small values of the PSI indicate a good level of stability/representativeness of the sample used for the model estimation; high values are a symptom of instability.
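A minimal sketch of the PSI computation, where `p_validation` and `c_estimation` are the two percentage distributions over the k modalities (each summing to one) and the natural logarithm is used:

  import numpy as np

  def population_stability_index(p_validation, c_estimation):
      p = np.asarray(p_validation, dtype=float)
      c = np.asarray(c_estimation, dtype=float)
      return float(np.sum((p - c) * np.log(p / c)))    # small values: stable/representative

Classes with a zero share in either sample would need to be grouped or smoothed before the computation, since the logarithm is undefined there.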

Transition matrices allow us to examine the evolution of the portfolio over time, highlighting possible variations in the positions of the different rating classes, both upgrading and downgrading.

The degree of population stability is evaluated through the calculation of the rate of permanence in the same class (persistence rate, or PR), the migration rates within one or two classes (migration rates M1C and M2C) with respect to the rating assigned initially, and the rating reversal analysis.
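As an illustrative sketch (one possible reading of the PR/M1C/M2C definitions, not necessarily the book’s exact one), these rates can be derived from a transition matrix T in which T[i, j] counts the obligors rated in class i at the beginning and in class j at the end of the observation period:

  import numpy as np

  def stability_rates(T):
      T = np.asarray(T, dtype=float)
      total = T.sum()
      idx = np.arange(T.shape[0])
      dist = np.abs(idx[:, None] - idx[None, :])   # distance in classes between initial and final rating
      pr = T[dist == 0].sum() / total              # persistence rate: same class
      m1c = T[dist == 1].sum() / total             # migrations of exactly one class
      m2c = T[dist == 2].sum() / total             # migrations of exactly two classes
      return pr, m1c, m2c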

Table 4.44 shows the number and percentage of class changes of opposite sign, inferred from the observation of the ratings assigned over three consecutive years, confirming the stability over time of the PD model adopted for illustrative purposes.

Table 4.44 An illustrative example of rating reversal analysis over three consecutive years