1 Introduction

The financial crisis that began in 2007 is considered the first real crisis of excess financial complexity. It illustrated the degree of inter-connectivity between banks and financial institutions and highlighted the contagion phenomenon that can arise in the interbank market. Since then, a burgeoning literature has developed on the quantification, prediction and control of systemic risk.

One of the methods proposed to prevent the contagion of bank failures is to assess the bank failure rate. This approach helps to establish an early warning model of bank distress. Interactions between solvency and refinancing risk can thus identify the banks that have the greatest difficulty refinancing themselves and would therefore be perceived as risky by other institutions. This flagging process would limit counterparty risk and warn the financial authorities of a pending liquidity risk in case these banks default.

The financial literature is rich in methods and models that aim to identify institutions whose financial situations appear alarming and call on supervisors to act. In this study, we use the random subspace method to compare the classification and prediction of both the Canonical Discriminant Analysis (CDA) and the Logistic Regression (LR) model, with and without misclassification costs, applied to a large panel of US banks over the period 2008–2013.

The main questions raised in our paper are how to aggregate classification results, what Early Warning System (EWS) should be proposed to regulators, and how the Area Under the Curve (AUC) and the H-measure (Hand 2009) perform in evaluating the classification problem.

The specificity of our study lies in the extensive list of financial ratios used (solvency ratios, asset quality, cash or liquidity, ...) and the depth of the sample (1224 US banks per year), including both large and small banks. The choice of the period is justified by the high number of bank failures that took place during it. We also show in this paper how the clustering quality is improved by using a more appropriate cut-off. By contrast, studies in the literature wrongly use the probability “0.5” as the cut-off value in the logit model and “0” in the CDA model. Moreover, the majority of papers discussing the same topics did not consider the cost of misclassifying an active or a defaulted bank. Our main contribution here is the use of the misclassification cost to enhance the conformity between the obtained results and reality.

The empirical literature distinguishes two methods: parametric and non-parametric validation. Beaver (1966) was the pioneer in using a statistical model for predicting bankruptcy. His approach was to select, from thirty financial ratios, those which are the most effective indicators of financial failure. The study concludes that the cash flow/total debt ratio is the best forecasting indicator.

Altman (1968) tested Multiple Discriminant Analysis (MDA) on 70 companies, first identifying the five most significant explanatory variables from a list of 22 ratios and then applying MDA to calculate an Altman Z-score for each company. This score was accurate in predicting bankruptcy one year ahead. The model was subsequently improved in Altman and Narayanan (1997), who proposed the ZETA model, which includes seven variables and correctly classified 96% of companies one year before bankruptcy and 70% five years before bankruptcy.

Since then, the use of discriminant analysis has grown through numerous published studies (Bilderbeek 1979; Ohlson 1980; Altman 1984; Zopounidis and Dimitras 1993, ...). The vast majority of studies conducted after 1980 used logit models to overcome the drawbacks of the DA method (Zavgren 1985; Lau 1987; Tennyson et al. 1990, ...). The logit analysis fits a linear logistic regression model by the method of maximum likelihood. The dependent variable (the probability of default) takes the value “1” for bankrupt banks and “0” for healthy banks.

Numerous comparative studies have been carried out (Keasey and Watson 1991; Dimitras et al. 1996; Altman and Narayanan 1997; Wong et al. 1997; Olmeda and Fernández 1997; Adya and Collopy 1998; O’leary 1998; Zhang et al. 1998; Vellido et al. 1999; Coakley and Brown 2000; Aziz and Dar 2004; Balcaen and Ooghe 2006; Balcaen et al. 2004; Kumar and Ravi 2007). However, the supremacy of one method over another remains controversial because of the heterogeneity of the data used in the validation (database, number of data points, sample selection, validation methods for forecasting, and the number and nature of the explanatory variables tested in the models: financial, qualitative, ...).

The aim of this paper is twofold: descriptive and predictive. “Descriptive” is to be understood as a detailed analysis of the models’ inputs from a financial and statistical point of view. Thus, we proceed by describing and analyzing the key financial ratios of the active and non-active banks for the entire period from 2008 to 2013.

We combine two parametric models (Canonical Discriminant Analysis and Logit) with the descriptive Principal Component Analysis (PCA) to construct an Early Warning System (EWS).

First, (PCA) reduces the dimension of the data and ensures an uncorrelated set of variables. Then, factor scores are estimated for each bank. These scores are used to estimate the (CDA) and Logit models.

One of the important results of this paper is the comparison of several methods for calculating the theoretical value of the probability of default that serves as a threshold to split the bank universe into two sets: failed and healthy.

The paper consists of five sections. After the introduction, which gives an overview of the existing literature on bank failure prediction, Sect. 2 describes the data used and the methodology. In Sect. 3 we implement Principal Component Analysis on our data. Section 4 provides the empirical results with and without misclassification costs. The final section contains concluding remarks.

2 Description of the Methodology and the Variables

This section focuses on the data gathered for the estimation of our models. We begin with the description of the collected data and the variable selection process. Next, we present the financial and economic ratios, and then provide some descriptive statistics and a correlation analysis.

We examine whether one can enhance bankruptcy prediction accuracy by a careful examination of the functional relationship between explanatory variables and the probability of bankruptcy.

2.1 Data Description and Methodology

We built our database of US banks mainly from two sources: “BankScope” and the FDIC. It covers the period 2008–2013. Statistics show that this period was marked by a wave of bank failures in the United States: more than 450 banks failed.

After data reprocessing, the sampled banks were split into two categories: active banks and non-active banks. Non-active banks are those declared bankrupt by the Federal Deposit Insurance Corporation (FDIC). The information on the identity and the balance sheet data of each bank is obtained from the FDIC website. Indeed, all US banks must report their financial statements in the Uniform Bank Performance Report. Some treatments have been applied to our sample to ensure homogeneity between banks. A bank declared bankrupt in the first quarter of year “N” is reclassified and considered bankrupt at the end of “N−1”.

Banks declared bankrupt by the FDIC after 01/04/N and for which there is no information for the current year are considered inactive for year “N”. Banks that go bankrupt later and for which data are available on 31/12/N are considered active for year “N”. The financial variables of active banks were retrieved from the “BankScope” database. We note that data were available for only 928 banks each year over the period 2008–2013.
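As an illustration, the following Python sketch encodes this reclassification rule; the function and argument names are hypothetical and the logic is only a minimal reading of the rules above.

```python
from datetime import date

def bank_status(failure_date, year, has_data_for_year=True):
    """Label a bank 'active' or 'non-active' for reporting year `year`.
    Minimal sketch of the reclassification rules described above;
    the names and arguments are hypothetical."""
    if failure_date is None:
        return "active"
    # A failure in Q1 of year N+1 is reclassified as a failure in late year N.
    if failure_date.year == year + 1 and failure_date.month <= 3:
        return "non-active"
    # A failure after 01/04/N with no financial statements for year N.
    if failure_date.year == year and failure_date >= date(year, 4, 1) \
            and not has_data_for_year:
        return "non-active"
    # A later failure, with data still available on 31/12/N.
    if failure_date.year > year:
        return "active"
    return "non-active"

# Example: a bank failing on 15/02/2011 is counted as non-active for 2010.
print(bank_status(date(2011, 2, 15), 2010))  # -> non-active
```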

On this basis, the number of failed banks was reduced to 410 over the entire period. Table 1 gives more details on the constituted database.

To investigate the robustness of the classification and prediction models, we split the data into a Testing Set (TES) and a Training Set (TRS). Indeed, it has been shown that classification tends to favor active banks (AB), which represent the majority class.

Thus, random sampling was used to avoid the selection bias due to the concentration of specific bank-year samples in the Training Set (TRS). The (TES) represents 20% of the data and was randomly selected, as sketched below.
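A minimal sketch of this 80/20 split, assuming a pandas DataFrame `banks` holding the ten ratios and a binary `failed` label (both names are hypothetical); stratifying on the label is one simple way to keep the rare failed banks represented in both samples.

```python
from sklearn.model_selection import train_test_split

X = banks.drop(columns="failed")   # the ten financial ratios
y = banks["failed"]                # 1 = non-active (DB), 0 = active (AB)

# 20% of the observations go to the testing set (TES), drawn at random.
X_trs, X_tes, y_trs, y_tes = train_test_split(
    X, y, test_size=0.20, random_state=0, stratify=y
)
```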

Furthermore, one of the main questions raised in this study is how to improve parametric model accuracy by using the best cut-off value to classify banks.

Table 1 Data analysis

Studies in the literature wrongly use the probability value P* = “0.5” as cut-off value in the LR and a score target D* = “0” in the CDA model. We show in this paper how the clustering quality is improved by using a more appropriate cut-off (a cut-off value that maximizes the overall correct classification rate obtained under the ROC curve).

Indeed, the asymmetrical composition of the samples used, due to the low number of failed banks relative to the total number of banks, requires the use of a cut-off based on the ROC curve, denoted C*.

We propose to compare the accuracy and sensitivity of classification under P* = 0.5 versus \(C^*_{Logit}\) for the Logistic Regression, and under D* computed according to formula (9) versus \(C^*_{CDA}\) for the CDA model.

After presenting the results of variable selection under the Principal Component Analysis (PCA), the procedure for each model is the same and is summarized as follows:

  1. Step 1 Score calculation.

  2. Step 2 Clustering for both the (TRS) and the (TES). We establish the confusion matrix based on:

     (a) D* vs \(C^*_{CDA}\)

     (b) P* = 0.5 vs \(C^*_{Logit}\)

  3. Step 3 Evaluating the performance of the models with and without cost [correct classification rate, sensitivity, specificity, AUC and H-measure (Hand 2009)]; a sketch of this evaluation step follows the list.
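The sketch below illustrates Step 3 for a vector of model scores and a candidate cut-off, assuming failed banks are coded 1; the function name is hypothetical and the H-measure is omitted here.

```python
from sklearn.metrics import confusion_matrix, roc_auc_score

def evaluate(scores, y_true, cutoff, higher_means_failed=True):
    """Confusion-matrix-based performance measures for one cut-off.
    Hypothetical helper; y_true uses 1 for failed (DB), 0 for active (AB)."""
    y_pred = (scores >= cutoff).astype(int) if higher_means_failed \
        else (scores < cutoff).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return {
        "correct_rate":  (tp + tn) / (tp + tn + fp + fn),
        "sensitivity":   tp / (tp + fn),   # correctly classified (DB)
        "specificity":   tn / (tn + fp),   # correctly classified (AB)
        "type_I_error":  fn / (tp + fn),   # (DB) classified as (AB)
        "type_II_error": fp / (tn + fp),   # (AB) classified as (DB)
        "auc": roc_auc_score(y_true,
                             scores if higher_means_failed else -scores),
    }
```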

2.2 Variables: Review of the Literature

Federal regulators developed the numerical CAMEL rating system in the early 1970s to help structure their examination process. This rating is based on capital adequacy, asset quality, management quality, earnings ability, and liquidity ratios. Capital adequacy evaluates the quality of a bank’s capital. Asset quality measures the level of risk of a bank’s assets; it reflects the quality and diversity of the credit risk and the ability of the bank to recover the loans it has issued. Management quality is a measure of the quality of a bank’s officers and the efficiency of its management structure. Earnings ability reflects the performance of the bank and the stability of its earnings stream. Liquidity measures the ability of the bank to meet unforeseen deposit outflows in the short term. In February 1997, a sixth component, sensitivity to market risk, was added to the CAMEL rating system.

An abundant literature has tried to identify the most significant variables of bank financial health. According to Sinkey (1975), the quality of bank assets is the most significant ratio. Asset composition, loan characteristics, capital adequacy, source and use of income, efficiency and profitability are also discriminant variables. Poor asset quality and low capital ratios were the two characteristics of banks most consistently associated with banking problems during the 1970s (Sinkey 1978). Avery and Hanweck (1984), Barth et al. (1985) and Benston (1985) conclude that the proxies of loan portfolio composition and quality, the capital ratio and the source of income are significant. Thomson (1991) demonstrates that the probability that a bank will fail is a function of variables related to its solvency, including capital adequacy, asset quality, management quality, and the relative liquidity of the portfolio. Martin (1977) found that the capital-to-assets ratio and the ratio of the loan portfolio’s composition to total assets have a high level of significance. Pantalone et al. (1987) proposed a model including most of the CAMEL proxies: profitability, management efficiency, leverage, diversification and economic environment. Their results confirm that the main cause of default was bad credit risk management. The model of Barr and Siems (1994) includes CAMEL proxies, efficiency scores as management quality proxies, and a proxy of the economic conditions. The six variables selected for their failure-prediction models are equity/total loans (C), non-performing loans/total assets (A), DEA efficiency score (M), net income/total assets (E) and large dollar deposits/total assets (L).

Our main objective in this study is to provide an accurate bank failure model based on significant fragility factors. In line with the literature, we retain the most commonly used financial ratios that can forecast potential failures (Beaver 1966; Altman 1968; Thomson 1991; Kolari et al. 1996; Jagtiani et al. 2003; Dabos and Sosa-Escudero 2004; Lanine and Vander Vennet 2006).

3 Principal Component Analysis

3.1 Statistic Description of Variables

We include in our analysis four categories of variables. (1) Two measures of capital adequacy, which assess the financial strength of a bank and determine its capacity to cover its liabilities and risks such as credit risk, market risk and operational risk. The most popular proxies for capturing capital adequacy in the previous literature are total equity divided by total assets and total equity divided by total loans. (2) Asset quality measures, which are considered through the data construction process. These variables play a crucial role in the assessment of the current condition and the future financial capacity of the bank. We employ four variables related to asset quality (NPLTA, NPLGL, LLRTA, and LLRGL). For the NPLTA and NPLGL variables, non-performing loans are proxied by loans not accruing plus loans over 90 days late (e.g., non-performing loans/total assets for NPLTA). (3) Bank profitability, which is assessed through two ratios: the first is net profit as a share of total assets, and the second is net profit as a share of total shareholders’ equity. Both measures are positively related to the financial performance of the bank and negatively related to later failure (Hassan Al-Tamimi and Charif 2011). (4) The liquidity level of the bank, which is assessed through liquidity ratios. The first one is total liquid assets to total assets, which indicates the ability of the bank to cover its liabilities. The second is total liquid assets as a share of total deposits, which depicts the capacity of the bank to cover unanticipated deposit withdrawals. The ratio of liquid assets to short-term liabilities is the last one used to determine liquidity. The explanatory variables are shown below:

Category (CAMEL)     Variable   Definition
Capital adequacy     EQTA       Total equity/total assets
                     EQTL       Total equity/total loans
Assets quality       NPLTA      Non-performing loans/total assets
                     NPLGL      Non-performing loans/gross loans
                     LLRTA      Loan loss reserves/total assets
                     LLRGL      Loan loss reserves/gross loans
Earnings ability     ROA        Net income/total assets
                     ROE        Net income/total equity
Liquidity            TLTD       Total loans/total customer deposits
                     TDTA       Total customer deposits/total assets
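For concreteness, the ten ratios can be computed from the raw balance-sheet items roughly as follows; the column names are hypothetical placeholders for the BankScope/FDIC fields actually used.

```python
import pandas as pd

def camel_ratios(df: pd.DataFrame) -> pd.DataFrame:
    """Build the ten explanatory ratios; column names are hypothetical."""
    r = pd.DataFrame(index=df.index)
    r["EQTA"]  = df["total_equity"]         / df["total_assets"]
    r["EQTL"]  = df["total_equity"]         / df["total_loans"]
    r["NPLTA"] = df["non_performing_loans"] / df["total_assets"]
    r["NPLGL"] = df["non_performing_loans"] / df["gross_loans"]
    r["LLRTA"] = df["loan_loss_reserves"]   / df["total_assets"]
    r["LLRGL"] = df["loan_loss_reserves"]   / df["gross_loans"]
    r["ROA"]   = df["net_income"]           / df["total_assets"]
    r["ROE"]   = df["net_income"]           / df["total_equity"]
    r["TLTD"]  = df["total_loans"]          / df["customer_deposits"]
    r["TDTA"]  = df["customer_deposits"]    / df["total_assets"]
    return r
```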

Table 2 presents the means of the ten financial ratios for the two groups [Active Bank (AB) and Default Bank (DB)], and significance tests for the equality of group means for each ratio.

First, consider the capital adequacy ratios, which measure how much capital is used to support the banks’ risk assets. The (EQTA) ratios of the (DB) are on average very low. A low ratio means significant leverage, which makes these banks less resistant to shocks. Thus, the higher the (EQTA) value, the lower the probability of default. As banks trend toward failure, their equity position is likely to decrease; thus a negative relationship is expected between this ratio and failure. The same conclusions emerge from the (EQTL) ratio analysis.

Regarding the asset quality ratios, we note that the (NPLTA) ratios of the (DB) are very high and disparate over the period 2008–2013. The immediate consequence of a large amount of non-performing loans (NPL) is bank failure. In fact, according to our data, the economic environment pushed up (NPL), and consequently the ratios (NPLTA), (NPLGL), (LLRTA) and (LLRGL) increased. Banks with a high (NPL) amount tend to carry out internal consolidation to improve asset quality rather than distributing credit, and are then obliged to raise provisions for loan losses. For example, the (NPLTA) ratio fell by 3.95% for the (DB) and by 1.16% for the (AB). We note that a low value of the loan portfolio signals the potential existence of an important vulnerability in the financial system (17.75% in 2008 and 17.83% in 2010). (LLRTA) and (LLRGL) provide a useful indication for analysts willing to evaluate the stability of a bank’s lending base. The higher the ratio, the poorer the quality of the loan portfolio (3.56% for the (LLRTA) ratio and 5.15% for the (LLRGL) ratio for the (DB) in 2011).

Finally, the (TLTD) and (TDTA) liquidity ratios are often used by policy makers to assess the lending practices of banks and to draw some statistics. If the ratio is too high, banks might not have enough liquidity to cover unforeseen fund requirements; if the ratio is too low, banks may not be earning as much as they could. Table 2 exhibits high values for the (DB) on average (for example, (TDTA) is 86.72 and 91.12% for 2008 and 2009). These high ratios mean that those banks are relying on borrowed funds.

Table 3 is used to analyze the correlation coefficients between the different explanatory variables and the dependent variable (probability of default). We note significance at the 1 and 5% levels for all the variables retained in our study. Moreover, most of the coefficients have the expected signs. For example, a negative correlation is confirmed for (EQTA) and (EQTL) for all years. Indeed, an increase in the value of these two ratios (a high level of equity) reduces the probability of default of the bank (a positive effect on the bank’s survival).

Table 2 Descriptive statistics of the ten variables
Table 3 Correlation coefficients between the explanatory variables and the dependent variable
Table 4 Correlation matrix
Table 5 Results of Bartlett’s test of sphericity and KMO
Table 6 Eigenvalues of the factors

Table 4 presents the correlation matrix of the ratios. It can be seen that most of the ratios are correlated with each other. Looking more closely at the correlation matrix, we can distinguish the following aspects: there is a strong correlation (over \(90\%\)) between the pairs of variables (NPLTA)/(NPLGL) and (LLRTA)/(LLRGL). This result holds over the whole period and indicates a strong link between them, so that one of the variables can be substituted for the other. Also, the asset quality component (AQ), which groups the (NPLTA), (NPLGL), (LLRTA) and (LLRGL) variables, is negatively correlated with the return on assets (ROA). The variables (EQTA) and (EQTL), which are proxies of capital adequacy, are negatively correlated with the proxies of asset quality. There is a negative correlation between asset quality and the profitability of the bank.

3.2 PCA’s Results

In this section we present the results of variables selection under the Principal Component Analysis (PCA). The aim is to extract the most important information from the data and to compress the data dimension by keeping only the most important ratios to explain the changes in financial conditions of banks.

Several tests are performed, as follows (Jolliffe 2002):

  1. Bartlett’s test, to validate the assumption of equality of variances. If the test statistic is larger than the critical value, we reject the null hypothesis at the 5% significance level (Table 5). Thus, the sample correlation matrix did not come from a population whose correlation matrix is an identity matrix.

  2. The Kaiser-Meyer-Olkin (KMO) measure, to test whether the variables have enough in common to warrant a factor analysis. We then retain only the components with eigenvalues greater than one (Kaiser criterion). Eigenvalues, also called characteristic roots, are presented in Table 6.

In addition to these tests, we perform the (PCA) by analyzing the factor loadings, which are the correlation coefficients between the financial variables and the factors. Finally, we determine the (PCA) scores.

Before getting to the description of the (PCA), we first analyze the correlation matrix. Then, after centering and standardizing each of the ten variables, we determine the optimal number of principal components.

The starting point is the correlation matrix. Table 4 presents the degree of dependence between the initial ten variables. It can easily be seen that the variables are correlated, which means that the information they convey has some degree of redundancy. To confirm this finding, we present in Table 5 Bartlett’s test of sphericity, which compares the correlation matrix with an identity (zero-correlation) matrix. A p value of zero is obtained over the whole period from 2008 to 2013. Thus, a factor analysis is valid, as illustrated below.
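The two preliminary tests can be reproduced along the following lines; this is a sketch assuming an (n banks × 10 ratios) array `X`, with Bartlett’s statistic computed from its standard chi-square approximation and the Kaiser criterion applied to the eigenvalues of the correlation matrix.

```python
import numpy as np
from scipy import stats

def bartlett_sphericity(X):
    """Bartlett's test of sphericity: H0 is that the correlation matrix
    is the identity (the ratios are uncorrelated)."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * np.log(np.linalg.det(R))
    dof = p * (p - 1) / 2.0
    return chi2, stats.chi2.sf(chi2, dof)          # statistic, p value

def retained_eigenvalues(X):
    """Eigenvalues of the correlation matrix, largest first;
    the Kaiser criterion keeps those greater than one."""
    eig = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]
    return eig, eig[eig > 1.0]
```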

Table 6 shows the estimated factors and their eigenvalues. In 2008, we retain the first three factors. These factors explain 72.04% of the total variation of the financial conditions of banks. The most important factor (F1) explains 41.70% of the total variance of the selected financial ratios. The next two, (F2) and (F3), respectively explain 16.13 and 14.21% of the total variance.

Under the same decision rule of the (KMO) measure and based on the eigenvalues of the factors for 2009, the first four factors account for 81.66% of the total variation of the financial conditions of banks, divided up as follows: 46.99% for (F1), 13.24% for (F2), 11.43% for (F3) and 10% for (F4).

In 2010 and 2011, the distribution across the three main factors is close to that of 2008. Finally, the years 2012 and 2013 display some similarities: the fourth factor must be included to account for more than 70% of the total variance.

We then consider and evaluate the factor loadings (see Table 7). The contribution ratios to the main components vary between 0 and 1 in absolute value. If a variable contributes more than 0.5 to a specific factor, it is retained.

For 2008, the variables that best explain the first factor (F1) are (NPLTA), (NPLGL), (LLRTA), (LLRGL) and (ROA). (F1) refers to both the asset quality and the return on assets components. The component loadings stress the importance of each variable for the component. The asset quality loading values are negative; thus, an increase in the value of these ratios results in a lower (F1) score, reflecting a decline in asset quality and, subsequently, an increase in the probability of default of the bank. The (ROA) ratio has a positive loading, which means that an increase in its value increases the (F1) score. We find the same results for 2009, 2010 and 2011 for the (F1) factor. For 2012 and 2013, (F1) groups only the asset quality variables. For 2013, all the ratios are positively related to factor (F1).

(F2) groups the capital adequacy ratios (EQTA) and (EQTL) only in 2008 and 2013. In 2008, the loadings of these two variables are negative: an increase in the value of these ratios reduces the score of factor (F2). For 2013, the capital adequacy loadings are positive: an increase in the value of these two ratios increases the score of the capital adequacy factor and reduces the probability of default.

In 2009 this factor is influenced by the ratios (TLTD) and (TDTA). The loadings are negative, and an increase in their values accentuates the probability of default.

In 2010, 2011, 2012 and 2013, we also retain the second factor (F2), which depends on the ratio (TLTD), the liquidity component, and the variable (EQTL), a proxy of capital adequacy. The (TLTD) ratio has a positive loading, which means that an increase in its value increases the score of the factor. The (TLTD) ratio is considered a good proxy of short-term viability, and a low value means that there is no optimal reallocation of resources.

For 2009, 2010 and 2011, the factor (F3) is composed of the capital adequacy variables and the (TDTA) variable.

In 2009, 2012 and 2013, a fourth factor (F4) is considered. It adds the ratio (ROE) to the list of contributing variables, alongside two asset quality ratios.

Finally, we determine the factor score coefficient matrix for each bank. Based on Table 8, which reports the factor score coefficients, we calculate the factor scores for each bank using the formula below:

$$\begin{aligned} F_{bi}=\sum _{j} u_{ij}z_{bj} \end{aligned}$$
(1)

where

  • F\(_{bi}\): the estimated factor i for bank b.

  • z\(_{bj}\): the standardized value of the jth ratios for a bank b.

  • u\(_{ij}\): the factor score coefficient for the ith factor and the jth ratios.

These scores (F\(_{bi}\)) are used as independent variables in estimating the discriminant and the Logit models.
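In matrix form, Eq. (1) amounts to standardizing the ratios and multiplying by the factor score coefficient matrix; a minimal sketch, where `X` is the (banks × 10) ratio matrix and `U` the (10 × k) coefficient matrix of Table 8 (both names hypothetical).

```python
import numpy as np

# Standardize each ratio (z_{bj}), then apply Eq. (1): F = Z U.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
F = Z @ U            # row b of F holds the factor scores F_{bi}

# These factor scores feed the CDA and Logit models estimated below.
```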

Table 7 Factors loading
Table 8 Factor scores coefficient matrix

4 Empirical Results

4.1 Score’s Results by Model

4.1.1 Canonical Discriminant Analysis

In this section, we apply Canonical Discriminant Analysis (CDA) to derive early warning indicators of failed banks. To this end, we describe the relationship between the two groups of banks (bankrupt or not) based on a set of discriminating variables.

The canonical discriminant function is expressed as follows:

$$\begin{aligned} D_{bi}=\beta _{0}+\beta _{1}F_{b1}+\beta _{2}F_{b2}+\cdots +\beta _{i}F_{bi} \end{aligned}$$
(2)

where

  • D\(_{bi}\): the value (score) on the canonical discriminant function for bank b.

  • F\(_{bi}\): the factors validated in the (PCA) section.

Each group of banks has a single composite canonical score, and the group centroids indicate the most typical location of a bank from a particular group. Discriminant analysis assumes the normality of the underlying structure of the data for each group. The proposed procedure is as follows:

  1. Estimate the D-score of each bank via Eq. (2).

  2. Calculate the cut-off score.

  3. Classify banks according to the optimal cut-off (see the sketch after this list).
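A sketch of Steps 1 and 2 using scikit-learn’s linear discriminant analysis is given below; `F_trs`, `F_tes` and `y_trs` are the hypothetical factor-score matrices and labels introduced earlier, and the sign of the canonical axis (whether the (AB) or the (DB) centroid is higher) depends on the fitted model.

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Step 1: one canonical discriminant function for the two groups (Eq. 2).
cda = LinearDiscriminantAnalysis()
cda.fit(F_trs, y_trs)                    # y = 1 for failed banks (DB)
d_trs = cda.transform(F_trs).ravel()     # D-scores, training set
d_tes = cda.transform(F_tes).ravel()     # D-scores, testing set

# Step 2: group centroids on the discriminant axis, used for the cut-off.
z_ab = d_trs[y_trs == 0].mean()
z_db = d_trs[y_trs == 1].mean()
```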

Table 9 reports statistics for the CDA model. The statistics show that the discriminant function does not allow easy identification of bank status in 2008 and 2013. Indeed, the eigenvalues for 2008 and 2013 are relatively low. Furthermore, the Wilks’ lambda statistics, which correspond to the proportion of the total variance in the discriminant scores not explained by differences among the groups, confirm this result. The Wilks’ lambda values for 2008 and 2013 are respectively 83.27 and 76.9%.

However, for 2012, a high eigenvalue (0.9115) shows the ability of the discriminant function to differentiate (AB) and (DB). The model explains only 47.68% (squared canonical correlation) of the variance between the two classes. The squared canonical correlation coefficient testifies to the weak association between the discriminant scores and the set of independent variables (around 34%).

The linear combination of the factor scores provides for each bank a D-score, as shown below:

$$\begin{aligned} D_{score2008}= & {} 0.956 F_{1}-0.337 F_{2}+0.207 F_{3} \nonumber \\ D_{score2009}= & {} 0.893 F_{1}-0.261 F_{2}-0.598 F_{3}-0.222 F_{4} \nonumber \\ D_{score2010}= & {} 0.930 F_{1}+0.607 F_{3} \nonumber \\ D_{score2011}= & {} 0.957 F_{1}+0.162 F_{2}+0.543 F_{3} \nonumber \\ D_{score2012}= & {} 0.896 F_{1}+0.555 F_{3}+0.585 F_{4} \nonumber \\ D_{score2013}= & {} -0.828 F_{1}+0.201 F_{2}+0.583 F_{3}-0.367 F_{4} \end{aligned}$$
(3)

Tables 10 and 11 display standardized canonical discriminant functions and correlations between predictor variables.

Overall, there are no surprises in the score factors. In 2008, for example, F1 and F3 were positively related to the bank’s D-score. Clearly, good asset quality improves profitability. In addition, a sufficient level of liquidity helps a bank improve its score and its ranking, promoting it to the (AB) group. In 2009, according to the structure of the correlation matrix (see Table 11), we retain all four factors. F2, F3 and F4 are negatively related to the score, which means that a rise in these factors reduces the bank’s score. Indeed, a low level of liquidity coupled with a low level of funds and low profitability reduces the bank’s score.

4.1.2 Logit Regression

In this section, we validate the logit model, which is one of the most commonly applied parametric failure prediction models both in the academic literature and in banking regulation and supervision. The logit model is a binomial regression that estimates the probability of failure P(Z), where Z\(_{i}\) is a linear combination of the covariates with regression coefficients \(\beta _{i}\):

$$\begin{aligned} Z_{i}=\beta _{0}+\beta _{1}X_{1}+\beta _{2}X_{2}+\cdots +\beta _{n}X_{n} \end{aligned}$$
(4)

In this study, the logistic regression model uses the dependent variable \(Y_{i}\), which takes the value 1 (Y = 1) when a failure occurred in a predefined period following the date at which the financial statement data are determined, and the value 0 (Y = 0) when no failure occurred.

The relationship between the dependent variable and the predictor variables is expressed as follows:

$$\begin{aligned} \left\{ \begin{array}{l} P(Y=1)= P(Z)= \frac{1}{1+e^{-Z}} \\ P(Y=0)= 1-P(Z)= \frac{1}{1+e^{Z}} \end{array} \right. \end{aligned}$$
(5)

In our analysis we consider the factors determined from the (PCA) as explanatory variables. After estimating the coefficients of the Logit model, we obtain the score of each bank \(Z_{a}\):

$$\begin{aligned} Z_{a}=\beta _{0}+\sum _{i=1}^{n}\beta _{i}F_{i} \end{aligned}$$
(6)
Table 9 Statistics of the estimated CDA model
Table 10 Canonical discriminant function
Table 11 Factor structure matrix—correlations
Table 12 Significance tests of factors
Table 13 Statistical tests of logit models
Table 14 Confusion matrix

Subsequently, we determine the probability of default of each bank.

The estimated probability of default allows the reallocation of each bank to a specific risk class. A threshold P* is then set to separate the banks and allocate each of them to one of two classes. If the estimated default probability is greater than P*, the bank is considered bankrupt; conversely, if it is lower than P*, the bank is considered active. Most previous work considers a bank as defaulted when its default probability is greater than or equal to 0.5.

$$\begin{aligned} Z_{score2008}= & {} -6.560-0.651 F_{1}+3.155 F_{2} \nonumber \\ Z_{score2009}= & {} -4.394-0.697 F_{1}+2.232 F_{3}+0.307 F_{4} \nonumber \\ Z_{score2010}= & {} -4.804-0.870 F_{1}+1.811 F_{2}-2.313 F_{3} \nonumber \\ Z_{score2011}= & {} -6.901-1.100 F_{1}+1.037 F_{2}-2.858 F_{3} \nonumber \\ Z_{score2012}= & {} -7.870-0.847 F_{1}-2.236 F_{3}-0.866 F_{4} \nonumber \\ Z_{score2013}= & {} -8.559+0.379 F_{1}-3.584 F_{2}-1.428 F_{3} \end{aligned}$$
(7)
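A minimal sketch of this estimation and classification step, using statsmodels on the factor scores (the names `F_trs`, `F_tes`, `y_trs` are the hypothetical objects from the previous sketches):

```python
import statsmodels.api as sm

# Fit the logit of Eq. (6) on the factor scores; y = 1 for failed banks.
res = sm.Logit(y_trs, sm.add_constant(F_trs)).fit(disp=0)

# Estimated probabilities of default for the testing set.
p_tes = res.predict(sm.add_constant(F_tes))

# Classification rule: flag a bank as (DB) when P(default) exceeds the cut-off.
p_star = 0.5                              # traditional threshold
pred_db = (p_tes >= p_star).astype(int)   # later replaced by C*_Logit
```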

From Tables 12 and 13, we can see that for 2008 the level of \(R^{2}\) is relatively low (48%) and only F1 and F2 are significant. For 2009, only F1, F3 and F4 are significant, with an \(R^{2}\) of 51.67%. For 2010 and 2011, all factors are significant. For 2012, the model is very satisfactory, with an \(R^{2}\) close to 79.83%. For the last year, the quality of the regression is good: F1, F2 and F3 are all significant at the 1% level.

For all years except 2013, the asset quality component (F1) is negatively related to the score of the bank: good asset quality (an increase in F1) reduces the probability of default of the bank. For 2013, the positive effect of F1 on the score means that a rise in this factor (bad asset quality) penalizes the bank with a high probability of default.

For 2012, all the significant factors are negatively related to the score of the bank. This means that an improvement in the asset quality, a better profitability, a high level of equity and a sufficient level of liquidity will reduce the probability of default.

For 2010 and 2011, the factor F3 was negatively related to the Z-score. In fact, a high level of equity decreases the probability of default of the bank. To sum up, all variables (ratios) have the expected signs.

4.2 The Predictive Performance of Models According to the Cut-Off Value

To evaluate the prediction performance of the CDA and LR models, we consider type I and type II error rates. These can be measured by the confusion matrix shown in Table 14, which summarizes the correct and incorrect classifications that the models produce on our dataset. The rows and columns of the confusion matrix correspond to the true and predicted classes respectively. The type I error is the error of not rejecting the null hypothesis when the alternative hypothesis is the true state of nature; here, it is the prediction error of a classifier that incorrectly classifies a bankrupt bank as non-bankrupt. The type II error is the rate of prediction errors of a classifier that incorrectly classifies a healthy bank as bankrupt. Consequently, a natural criterion for judging the performance of a classifier is the probability of making a misclassification error.

We consider an early warning model to be good when it delivers a low probability of committing a type I error, i.e., when it avoids classifying a failed bank into the group of healthy banks.

In this paper, we also assess the predictive ability of the classification methods via so-called Receiver Operating Characteristic (ROC) curves. The (ROC) curve shows the relation between the specificity and the sensitivity of the given test or detector for all allowable values of the threshold (cut-off).

We propose to improve the performance of the clustering by using both the area under the ROC curve (AUC) and the H-measure. We note that the H-measure (Hand 2009) is designed to avoid a drawback of the AUC. The choice of the optimal cut-off from the ROC curve is sketched below.
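A sketch of the cut-off selection, assuming `scores` increase with the risk of failure (probabilities of default for the Logit, or negated D-scores for the CDA) and `y` codes failed banks as 1; minimizing the sum of the type I and type II error rates is equivalent to maximizing Youden’s J = sensitivity + specificity − 1.

```python
import numpy as np
from sklearn.metrics import roc_curve

fpr, tpr, thresholds = roc_curve(y, scores)   # y = 1 for failed banks (DB)

# Type I error = 1 - tpr, type II error = fpr; their sum is minimized
# where tpr - fpr (Youden's J) is maximized.
c_star = thresholds[np.argmax(tpr - fpr)]
```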

4.2.1 Prediction Accuracy of the Traditional D* Versus the Minimization of Errors (\(C^*_{CDA}\)) Cut-Off

For the prediction accuracy of the (CDA) model, we follow two approaches to select the best cut-off score.

In the first one, we calculate D*.

In the available literature, and according to Canbas et al. (2005), the default cut-off value in two-class classifiers is approximately equal to zero and is computed by the equation below:

$$\begin{aligned} {\textit{Cut}}{-}{\textit{Off}}=\frac{(N_{1}D_{1}+N_{0}D_{0})}{(N_{1}+N_{0})} \end{aligned}$$
(8)

where

  • N\(_{1}\): number of bankrupted banks.

  • D\(_{1}\): average score of the bankrupted banks.

  • N\(_{0}\): number of non-bankrupted banks.

  • D\(_{0}\): average score of the non-bankrupted banks.

However, if the two classes are asymmetric and have different sizes, the optimal cutting score for a discriminant function is the weighted average of the group centroids (Hair et al. 2010). The formula for calculating the critical score between the two groups is:

$$\begin{aligned} {\textit{Cut}}-{\textit{Off}}=\frac{(N_{A}Z_{B}+N_{B}Z_{A})}{(N_{A}+N_{B})} \end{aligned}$$
(9)

where \(\hbox {Z}_{A}\) and \(\hbox {Z}_{B}\) are the centroids for group A and B and \(\hbox {N}_{A}\) and \(\hbox {N}_{B}\) are the number of banks in each group. This formula is adopted in our paper for the (CDA) analysis.
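A direct transcription of Eq. (9), assuming the D-scores and a 0/1 failure label are available as arrays (names hypothetical):

```python
import numpy as np

def weighted_cutoff(d_scores, y):
    """Eq. (9): weighted average of the group centroids, where group A is
    the active banks (y = 0) and group B the bankrupt banks (y = 1)."""
    d_scores, y = np.asarray(d_scores), np.asarray(y)
    z_a, z_b = d_scores[y == 0].mean(), d_scores[y == 1].mean()
    n_a, n_b = (y == 0).sum(), (y == 1).sum()
    return (n_a * z_b + n_b * z_a) / (n_a + n_b)
```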

Table 15 Classification results with P* = 0.5 and D*

The second methodology to select the optimal cut-off score is based on the Receiver operating characteristics (ROC curve) graphs.

Then, we classify the banks into the failed or healthy group by comparing the \(D_{score}\) with the cut-off score (D* or \(C^*_{CDA}\)):

  • If \(D_{score}>\) cut-off , the bank is classified in the (AB) group.

  • If \(D_{score}<\) cut-off , the bank is classified in the (DB) group.

Table 15 presents the classification results obtained by using the cut-off “D*”. In the (TRS) of 2008, we note that 21.43% of the (DB) and 7.04% of the (AB) banks are misclassified. We obtain the same result for the (TES) of 2008 with a type I error around 22% and a type II error about 7%.

In 2009, we obtain relatively low results in terms of correct classification rate [88.61% for the (TRS) and 85.12% for the (TES)]. Indeed, for the (TRS), the CDA model correctly classifies 76.29% of the (DB) and 89.99% of the (AB). However, for the (TES), the model classifies 44.12% of the (DB) in the group of (AB).

The results of 2010 are much better in terms of type I error: 13.68% of (AB) in (TRS) and only 3.7% of (DB) in (TES) are misclassified by the CDA model.

For (TRS) in 2011, 2012 and 2013, we observe an improvement of the correct classification rate (resp. 93.72, 98.44 and 98.24%) and a decrease of the type II error (resp. 5.37, 0.56 and 1.19%). However, the discriminant function delivers a high type I error that reaches 40% in 2013.

For the (TES) in 2012, 42.86% of the (DB) were classified as (AB) (type I error). By contrast, the CDA model in 2013 was able to classify all the banks correctly (sensitivity and specificity equal to 100%).

Table 16 displays the results according to the best cut-off obtained under the ROC curve. We recall that the optimal critical point corresponds to the value which minimizes both the type I error [(DB) classified as (AB)] and the type II error [(AB) classified as (DB)]. It is also the value which maximizes the sensitivity [correctly classified (DB)] and the specificity [(AB) correctly assigned to the group of (AB)].

The results show an improvement of the correct classification rate. For the (TES) in 2008, we observe that the discriminant function was able to classify 97.18% of the banks correctly into their appropriate groups (versus 92.34% by using D*). By adjusting the cut-off (from −1.84 to −3.35), 94.42% (against 89.77%) of the banks in the (TES) of 2010 were correctly classified.

We note that the CDA model achieved 100% correct classification in the (TES) of 2013 by using the optimal ROC cut-off.

When we use \(C^*_{CDA}\), we observe that the CDA model fails to classify the bankrupt banks correctly. We note for the (TES) a high level of type I error (66.67, 76.47, 29.63, 31.25 and 42.86% for 2008, 2009, 2010, 2011 and 2012 respectively). This result is expected: when we raise the cut-off value, we penalize the (DB) with a score lower than the optimal \(C^*_{CDA}\).

To sum up, with the canonical discriminant analysis performed with D* (Table 17), we obtain similar results on average over the period 2008–2013 for the (TRS) and the (TES). Indeed, the results show that the model correctly classifies 93.62% of banks in the (TRS) and 92.75% in the (TES). Moreover, 94.40% of the (AB) in the (TRS) and 93.93% of the (AB) in the (TES) were correctly classified [type II error: 5.6% for the (TRS) and 6.07% for the (TES)]. However, we note a high level of type I error, which means that the model failed to classify correctly 23.93% of the (DB) in the (TRS) and 21.94% of the (DB) in the (TES).

Table 16 Classification results with the ROC curve cut-offs \(C^*_{Logit}\) and \(C^*_{CDA}\)
Table 17 Results in average

The results of the CDA when using \(C^*_{CDA}\) show that the model obtains on average for the (TES) a correct classification rate of about 94.97%. Moreover, we note a low type II error [on average 1.67% in the (TES)]. By contrast, the CDA model failed to classify correctly, on average, 41.15% of the (DB) in the (TES) and 52.75% in the (TRS).

By analyzing the period (T1), which covers 2008, 2009 and 2010, and (T2), spanning 2011 to 2013, we obtain different results. Indeed, the first period is characterized by a relatively lower correct classification rate [90.44% in the (TRS) and 89.08% in the (TES) vs 96.80% in the (TRS) and 96.43% in the (TES)]. This result is explained by the fact that the high level of type I error was compounded by a relatively high type II error. In other words, the model of the first period misclassifies a large number of (AB) [8.81% in the (TRS) and 9.46% in the (TES) versus 2.38% in the (TRS) and 2.68% in the (TES)]. The same tendency is observed when analyzing the CDA model with a cut-off of \(C^*_{CDA}\).

4.2.2 Prediction Accuracy of the Traditional P* = 0.5 Versus the Minimization of Errors (\(C^*_{Logit}\) ) Cut-Off

For the prediction accuracy of the Logit model, we first compare the probability of default obtained from the scoring function with P* = 0.5.

Then, we use the best cut-off point, which minimizes the overall error (the sum of the type I and type II errors).

  • If the probability of default < P* or \(C^*_{Logit}\), the bank is classified in the (AB) group.

  • If the probability of default > P* or \(C^*_{Logit}\), the bank is classified in the (DB) group.

Table 15 displays the results based on a probability of default “P*” equal to 0.5. For 2008, we obtain a high correct classification rate, but we note a high level of type I error, which means that the logistic regression misclassifies 71.43% of the (DB) in the (TRS) and 33.33% of the (DB) in the (TES).

For the (TES) in 2009, the model reaches its lowest level, with a correct classification rate around 88.43%. From 2010 to 2013, we observe an improvement in the correct classification rate [from 93.49 to 99.41% in the (TES)] and a decrease of the type II error (3.72, 1.71, 1.18 and 0% respectively in 2010, 2011, 2012 and 2013). In contrast, the logit model was not able to classify the (DB) correctly: we note a high type I error in the (TES) of about 33.33, 70.59, 25.93, 18.75, 42.86 and 20% respectively in 2008, 2009, 2010, 2011, 2012 and 2013.

Table 16 summarizes the results with \(C^*_{Logit}\) in the (TRS) and the (TES). For all years, we note a slight improvement in the correct classification rate, together with a decrease in the type I error.

For the (TES) in 2009, results with a probability of default equal to 0.222, calculated from the ROC curve, allow the type I error to be reduced by 32.35 percentage points (38.24% against 70.59% with a probability of default of 0.5). We also obtain a higher type II error, with a rate around 6.73% (against 1.92%). In 2012, we note approximately the same downward trend in the type I error: a lower probability of default (0.2) allows the Logit model to classify correctly 85.71% of the (DB) (against 57.14% with P* = 0.5). The results of 2011 also show a decrease in the type I error (12.5% against 18.75%) and an improvement of the correct classification rate (97.38 versus 96.86% for P* = 0.5).

According to the results in Table 17, we observe that the Logistic Regression obtained with a probability of default equal to 0.5 delivers a satisfactory overall result in terms of correct classification rate [on average 96.42% for the (TRS) and 95.63% for the (TES)]. We also note that the model obtains a low type II error, with an average value of 1.31% in the (TRS) and 1.49% in the (TES). However, we obtain a higher type I error [44.58% for the (TRS) and 35.24% for the (TES)].

The results for the (TES) of the Logistic Regression using \(C^*_{Logit}\) show an improvement of the correct classification rate (on average 96.11% versus 95.63% for LR with P*) and a decrease of the type I error (on average 24.66 versus 35.24% for LR with P*), but a slight increase in the type II error (on average 2.04 versus 1.49% for LR with P*).

When we consider the two periods T1 (2008–2010) and T2 (2011–2013), the results for T2 are better in terms of correct classification regardless of the choice of cut-off and model. Also, the type I and type II errors remain the lowest in the (TES) in the second period.

4.2.3 Conclusion

To conclude, we can summarize the results as follows:

  1. The Logit model with P* outperforms the CDA model with D* in terms of correct classification rate. Indeed, on average in the (TRS) we report a correct classification rate of about 96.42% for the Logit versus 93.62% for the CDA. For the (TES), the Logistic model correctly classifies 95.63% of the banks, against only 92.75% for the CDA.

  2. In terms of the type II error, the Logit is more efficient than the CDA model, with an average type II error of about 1.31% for the (TRS) and 1.49% for the (TES) [against 5.60% for the (TRS) and 6.07% for the (TES)]. By contrast, the CDA model outperforms the Logit in terms of type I error.

  3. \(C^*_{ROC}\) improves the accuracy of classification for both the Logit and the CDA. For example, the average rates for the (TES) are equal to 96.11% for the LR and 94.97% for the CDA model (against 95.63 and 92.75%). This result confirms the supremacy of the Logistic Regression.

  4. The Logit model with \(C^*_{Logit}\) allows a reduction of the type I error (on average 24.66 versus 35.24% for the Logit with P*). However, with \(C^*_{CDA}\) the classifier failed to classify correctly 41.15% of the (DB) (against a type I error of 21.94% for the CDA with D*).

Table 18 exhibits the p values of Student’s t tests for the differences between the correct classification rates achieved with the optimal cut-off \(C^*_{ROC}\). The p values show that these differences are, on average, significant and that the Logit and the CDA models do not have statistically identical accuracy.

4.3 Performance of Predictability of Models with Cost

4.3.1 Cost of Misclassification

To assess the models’ performance, we introduce the ratio of misclassification costs. In the bankruptcy forecasting literature, the cost of a type I error is much greater than the cost of a type II error (Balcaen and Ooghe 2006): misclassifying a (DB) is more costly than misclassifying an (AB).

Table 18 Student’s t tests for differences between correct classification rates achieved with \(C^*_{ROC}\)
Table 19 Results of CDA with cost
Table 20 Results of logit with cost

We consider several cost scenarios. Let \(Cost_{II}\) be the numerator of the misclassification cost ratio, which corresponds to the cost of misclassifying an (AB) (i.e., \(C_{II}\), the cost of a type II error). This value is kept constant and equal to 1. We also define \(Cost_{I}\) as the cost of misclassifying a (DB) (\(C_{I}\), the cost of a type I error). We test several parameters: 2, 5, 10, 20, 30, 40 and 50 (Etheridge 2015; Gepp and Kumar 2015; Jardin 2015, 2016; Jardin and Séverin 2012; Frydman et al. 1985).
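One simple way to embed these cost parameters in the choice of the cut-off is to pick the threshold that minimizes the cost-weighted sum of the two error rates, as sketched below; this is only an illustration of the idea (class priors could also be folded into the weights), with hypothetical names.

```python
import numpy as np
from sklearn.metrics import roc_curve

def cost_minimizing_cutoff(y, scores, cost_I, cost_II=1.0):
    """Threshold minimizing cost_I * (type I error) + cost_II * (type II error).
    `scores` must increase with the risk of failure; y = 1 for failed banks."""
    fpr, tpr, thresholds = roc_curve(y, scores)
    expected_cost = cost_I * (1.0 - tpr) + cost_II * fpr
    return thresholds[np.argmin(expected_cost)]

# Example: penalize a missed failure 20 times more than a false alarm.
# c_20 = cost_minimizing_cutoff(y_trs, p_trs, cost_I=20)
```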

Tables 19 and 20 summarize the percentages of misclassified banks (type I error, type II error and total error) for each model with different penalty parameters. The main results for the (TES) are as follows:

  1. The total error rate increases when the misclassification cost of the bankrupt banks increases. Indeed, for a low parameter of \(Cost_{I}=2\), the Logit model achieves an error rate varying between 0.59 and 11.16%, and the CDA an error rate ranging from 0 to 14.88%. For a parameter equal to 50, the Logit yields an error rate varying between 1.18 and 35.12% and the CDA an error rate in the [0, 22.73%] interval.

  2. For \(Cost_{I}=10*Cost_{II}\), the CDA was able to classify all the bankrupt banks correctly (type I error equal to zero). We note that in 2008, when \(Cost_{I}=5*Cost_{II}\) with a cut-off value of 0.21, the Logit model was not able to classify correctly 11.11% of the (DB). For all the other years, we achieve a type I error equal to zero with the Logit model.

  3. When the parameter is low, all type II errors are also low. For example, with the Logit and \(Cost_{I}=2*Cost_{II}\), we note an average type II error of about 2.93 versus 5.11% for the CDA. In contrast, for a high \(Cost_{I}\), we note an increase in the type II error.

  4. When we increase \(Cost_{I}\), we note an improvement in the results. For a cost parameter equal to 20, the Logit model correctly classifies all the bankrupt banks for each year (except 2008). With a parameter equal to 10, the CDA model correctly classifies all the bankrupt banks (except in 2009). We can conclude that raising the parameter reduces the type I error.

4.3.2 H-Measure

We use the area under the receiver operating characteristic curve (AUC) to estimate the performance of the models.

Hand (2009) proposed the H-measure as a coherent alternative to the AUC. It quantifies the relative severity of one type of error over the other (Jardin 2016; Fitzpatrick and Mues 2016). As with the AUC, a higher value of the H-measure is associated with better performance.

Table 21 presents the results of the AUC (ROC curve) and the H-measure proposed by Hand (2009). We note that the results achieved by the H-measure are consistent with those obtained with the AUC. In terms of supremacy between the CDA and the Logit, both performance measures converge to the same conclusion: the Logit is more efficient than the CDA.

4.4 Bankruptcy Prediction as a Classification Problem

In this paper we have shown that the choice of the cut-off is crucial. If it is low, the type I error decreases and, as a result, crises are detected more accurately. However, the number of false alarms (i.e., the type II error) increases, which nonetheless provides good control in terms of economic policy.

In this section we analyze the impact of letting \(Cost_{I}=20*Cost_{II}\) on type II error in order to validate the accuracy of models with misclassification cost (Tables 22, 23).

Table 21 Performance of models: AUC and H-measure
Table 22 Analysis of type II error achieved with the logistic regression \(C_{I}=20\)
Table 23 Analysis of type II error achieved with the canonical discriminant analysis with \(C_{I}\)=20

In the testing set, the Logit regression failed to classify correctly 7, 85, 19, 8, 7 and 2 active banks for the years 2008, 2009, 2010, 2011, 2012 and 2013 respectively. However, year-by-year out-of-sample monitoring shows that 100, 54.12, 89.47, 87.5 and 85.71% of the misclassified active banks go bankrupt in the following years (respectively for 2008, 2009, 2010, 2011 and 2012).

Thus, for all banks that the model identified as (DB), we check over the following years whether or not they fail. For example, in 2009, among the 46 (AB) classified as (DB), 21, 15, 8 and 2 banks go bankrupt in 2010, 2011, 2012 and 2013 respectively. Therefore, the logit model was able to detect the failure of banks one, two, three and four years before the bankruptcy occurred.

The logit model could predict the failure, on average, of 68.99, 32.16, 12.52 and 4.35% of these banks respectively one year, two years, three years and four years before the bankruptcy.

For the CDA model, the results show that among the 29, 54, 27, 21 and 7 misclassified (AB) for 2008, 2009, 2010, 2011 and 2012 respectively, 29, 48, 21, 10 and 5 actually go bankrupt. For example, in 2008, among the 29 misclassified (AB), 16, 7, 3, 2 and 1 banks fail in 2009, 2010, 2011, 2012 and 2014 respectively. The CDA model was thus able to predict the failure of banks up to 6 years before the bankruptcy.

The results of 2009 show that the CDA model detects the failure of 43.75, 29.17, 18.75, 6.25 and 2.08% of the misclassified active banks that actually go bankrupt, respectively one year, two years, three years, four years and six years before the failure. In 2012, all the misclassified active banks (5 banks) go bankrupt the following year (2013).

5 Conclusion

In this paper, we show that Logistic Regression (LR) and Canonical Discriminant Analysis (CDA) can predict bank failures with accuracy. The main model inputs are CAMEL variables of a large panel of US banks over the period from 2008 to 2013.

First, Principal Component Analysis (PCA) was performed to compress the data dimension by keeping only the most important ratio combinations. The results show the importance of Asset Quality, Capital Adequacy and Liquidity as indicators of the financial condition of a bank.

We use the random subspace method to compare the classification and prediction accuracy of the CDA and LR models, with and without misclassification costs. We compared different cut-off formulas to build and evaluate the classifications accordingly.

Our results confirm, first, that the more appropriate the critical probability of default value, the better the sensitivity of the model. In this sense, comparative results over the entire period prove that the correct classification rate is improved with \(C^*_{ROC}\) for both the Logit and the CDA model.

The first finding proves that the sensitivity of the classification is improved and that, on average, the Logit model outperforms the CDA [in the (TES), 75.34 versus 58.85%].

The second finding concerns the supremacy of the ROC curve validation with regard to the quality of the Logit model, by minimizing the misclassification error of the (DB) [in the (TES), the type I error is on average 24.66 versus 35.24% with a probability of default P* = 0.5].

Third, in terms of correct classification, we show that the Logistic Regression performs better when using \(C^*_{Logit}\) [in the (TES), 96.11% against 95.63% with P* = 0.5].

Then, we evaluate the Logit and the CDA model by introducing a misclassification cost (the cost of a type I error being higher than the cost of a type II error). The results show that a high misclassification cost for the type I error yields a high overall error rate. Indeed, the classifiers misclassify a large number of (AB), meaning that we obtain a high type II error. By contrast, the models are able to classify all the (DB) correctly (type I error equal to zero).

Finally, the models were used to pick up early warning signals. The combination of the two models provides better information about the future prospects of banks. Indeed, the ROC curve validation emphasizes better prediction of bank failure because it delivers, on average, a slightly higher type II error of 2.04%; this means that the model classifies some solvent banks in the bankrupt group. Consequently, we can conclude that the Logit was able to predict the failure of banks: it gives a good signal on banks that fail one or two years later.

Overall, the study also reveals that the ten financial ratios representing Capital adequacy, Asset quality, Earnings ability and Liquidity are clear determinants for predicting bankruptcy.

Our results can be used for several purposes. For instance, regulators and banks can predict problems in order to avoid financial distress which can lead to bankruptcy. Our methodological framework helps to construct an Early Warning System that can be used by supervisory authorities to detect banks close to failure state.

Finally, further extensions could take into account non-parametric methods (Trait Recognition Models, intelligence techniques such as the induction of classification trees, and Neural Network methods).