1 Introduction

The financial crisis that started in 2007 dramatically affected the banking sector throughout the world. The epicenter of the shock wave was in the US, and it took regulators a long time to stop the default chain and save the big banks. The prevention of systemic risk - the failure of the banking system - has therefore become an unavoidable concern, and the need for new forecasting tools is of major importance not only to regulators but also to academics.

In this sense, federal banking supervisors [the Federal Reserve, the Federal Deposit Insurance Corporation (FDIC), and the Office of the Comptroller of the Currency (OCC)] and other financial supervisory agencies provide a supervisory rating, a convenient summary of a bank's condition at the time of an exam. This helps investors detect banks with a high probability of default. A key outcome of such an on-site exam is a CAMELS rating. The acronym “CAMEL” refers to the five components of a bank’s condition that are evaluated: Capital adequacy, Asset quality, Management, Earnings, and Liquidity. A sixth component, a bank’s Sensitivity to market risk, was added in 1997; hence the acronym was changed to CAMELS.

The FDIC also developed a Statistical CAMELS Off-site Rating system (SCOR) to evaluate banks' stability. Collier et al. (2003) examine the performance of this model over the period 1986–2002 and, despite the usefulness of SCOR, point out the limitations of a model based only on financial ratios. Cole and Gunther (1995) reach similar conclusions and report that CAMELS ratings decay rapidly.

Predicting bank bankruptcy has attracted particular interest in the financial literature, and numerous models have been developed since the early 1970s. All proposed models are based on classification methods in a multidimensional space defined by a set of specific variables.

The literature is rich in both non-parametric and parametric models. With regard to the latter, Beaver (1966) was one of the first researchers to use univariate analysis to study bankruptcy prediction. He tested the power of financial ratios to classify and predict bankrupt firms; cash flow and debt ratios appeared to be the most important predictors of bankruptcy. Altman (1968) used Multivariate Discriminant Analysis (MDA) to develop a five-factor model that computes the well-known “Z-score” and predicts the bankruptcy of manufacturing firms.

Well-known statistical methods used to predict failure include logistic regression (Logit) (Ohlson 1980; Demirgüç-Kunt and Detragiache 1997; Affes and Hentati-Kaffel 2017), Probit analysis (Zmijewski 1984; Hanweck et al. 1977) and factor analysis. West et al. (1985) demonstrated that the combination of factor analysis and Logit estimation was promising in evaluating banks’ condition. The factors identified by the Logit model as important descriptive variables for the banks’ operations are similar to those used for CAMELS ratings.

Martin (1977) used both Logit and MDA to predict bank failures and found that the two models produce similar classifications of defaulted and healthy banks. In the same vein, Jones and Hensher (2004) presented a mixed Logit model for firm distress prediction and compared it with Multinomial Logit models (MNL), concluding that the mixed Logit achieves substantially better predictive accuracy.

Affes and Hentati-Kaffel (2017) identified US banks heading toward bankruptcy over the period 2008–2013 by conducting a comparative analysis based on both Canonical Discriminant Analysis and Logit models; they also highlighted that model suitability improves when different optimal cut-off scores are compared.

A second stream of literature tested non-statistical methods for default prediction. For example, Kolari et al. (2002) use a Trait Recognition Model (TRA), a kind of image recognition algorithm, and their empirical results show that it performs better than the statistical approach. Lanine and Vander Vennet (2006) showed that the TRA approach outperforms Logit in predicting failures among Russian commercial banks.

Other non-statistical methods include intelligence techniques such as the induction of classification trees and Neural Network methods (NM). NM procedures follow a process similar to the human brain, containing mathematical and algorithmic elements that mimic the biological neural networks of the human nervous system (see Odom and Sharda 1990; Lenard et al. 1995; Zhang et al. 1999; McKee and Greenstein 2000; Anandarajan et al. 2001). Boyacioglu et al. (2009) applied various neural network techniques and multivariate statistical methods to the problem of predicting bank failures in Turkey and compared the classification performances of the techniques tested, using financial ratios similar to those behind CAMELS ratings.

More recently, Decision Tree (DT) and Classification And Regression Tree (CART) methods have been used for financial distress prediction. Chen (2011) empirically compared DT with Logit for Taiwanese firms and found that DT achieves higher accuracy than Logit in the short run (less than 1 year), while Logit performs better in the long run (above one and a half years).

Iturriaga and Sanz (2015) developed a hybrid neural network model to study the bankruptcy of U.S. banks by combining a Multilayer Perceptron (MLP) network and Self-Organizing Maps (SOM). They found that the MLP-SOM detects 96.15% of the failures in the period between May 2012 and December 2013 and outperforms traditional bankruptcy forecasting models.

De Andrés et al. (2011) and Sánchez-Lasheras et al. (2012) combined the Multivariate Adaptive Regression Splines (MARS) model with fuzzy clustering and Self-Organizing Maps (SOM), respectively. They found that these hybrid models outperform single classifiers - (i) LDA, feed-forward neural networks and single MARS, and (ii) back-propagation neural networks and MARS - both in terms of correct classification and in identifying the companies that go bankrupt.

In this paper, we model the relationship between ten financial variables and the default probability of US banks using so-called non-parametric, or flexible, models. The main purpose is to test the accuracy of non-parametric classification methods, to increase their predictive ability and to reduce misclassification. We suggest, for the first time in the bank failure literature, the use of MARS as well as a hybrid model that combines K-means clustering and MARS.

The method consists in validating, in addition to Classification And Regression Trees (CART), the Multivariate Adaptive Regression Splines (MARS) model, which has gained increasing interest in the financial literature. MARS was first proposed by Friedman (1991). Its main advantage is the capacity to explore complex nonlinear relationships between a response variable and various predictor variables.

Beyond the methodology itself, the empirical part contributes to the existing literature by applying all these models to a large universe of US banks over a period spanning 6 years, from 2008 to 2013, under K-fold cross validation. Given the size of our data set, we apply a 10-fold cross validation to separate it into training and testing sets. To evaluate model fit, we consider the confusion matrix for both the training and testing samples. We also use Receiver Operating Characteristic (ROC) curves to evaluate classification success and, finally, assess the performance of each model according to the Area Under the ROC Curve (AUC).

Our main findings highlight the promising functionality of the MARS model and suggest that:

  1. (i) In both the training and testing samples, MARS provides better correct classification than CART on average (97.88–97.58% versus 95.02–93.4%).

  2. (ii) The hybrid approach enhances classification accuracy for the training sample.

  3. (iii) In terms of the misclassification rate of bankrupt banks, MARS underperformed, especially in 2008, 2009 and 2010.

  4. (iv) According to the AUC of the Receiver Operating Characteristic curve, the hybrid model dominates.

  5. (v) Finally, CART provides a better interpretation of the splitting values at the root node variables. These splitting values can be considered early warning signals. In this sense, the CART method makes it possible to derive target values that regulators should take into account in order to identify, upstream, banks in difficulty.

The paper is organized as follows. Section 2 presents the methodology and data used. Section 3 describes the results of the CART and MARS models. In Sect. 4, we analyze the models' predictive ability. Section 5 concludes. Model outputs are reported in the Appendix.

2 The model

For empirical validation, we consider a large panel of US banks. We collect data for active banks (AB) from BankScope and non-active (NAB) ones from FDIC, over the period from 2008 to 2013.

We extract all ratios to build 10 financial variables, detailed as follows:

CAMEL category      Variable   Definition
Capital adequacy    EQTA       Total equity/total assets
                    EQTL       Total equity/total loans
Asset quality       NPLTA      Non-performing loans/total assets
                    NPLGL      Non-performing loans/gross loans
                    LLRTA      Loan loss reserves/total assets
                    LLRGL      Loan loss reserves/gross loans
Earnings ability    ROA        Net income/total assets
                    ROE        Net income/total equity
Liquidity           TLTD       Total loans/total customer deposits
                    TDTA       Total customer deposits/total assets

The choice of these ten ratios is justified by an abundant literature (Sinkey 1979; Martin 1977; Thomson 1991; Barr and Siems 1994; Pantalone et al. 1987; Godlewski et al. 2003).

We adopt the same rule of bank status selection as in Affes and Hentati-Kaffel (2017). The number of non-active banks (NAB) was 410 over the entire period 2008–2013. The total number of active banks is 835 in 2013, 850 in 2012, 883 in 2011, 955 in 2010, 1077 in 2009 and 1205 in 2008.

However, classification is known to favor the dominant category, here the active banks (AB): the original database has a highly skewed class distribution. To create homogeneous groups, we apply K-fold cross validation.

We apply a 10-fold cross validation to separate our data set into training and testing sets; 10 is the most widely used number of folds in cross validation.

The procedure for each model is the same and is summarized as follows:

  1. For each of the 10 experiments, we use 9 folds for training and the remaining one for testing;

  2. we repeat the whole 10-fold cross-validation experiment 10 times;

  3. we select the parameters of the best model, i.e. those that minimize the cross-validation error rate.
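The repeated splitting procedure above can be sketched as follows (a minimal pure-Python illustration; the function names and seeding scheme are our own assumptions, not the paper's code):

```python
import random

def kfold_indices(n_obs, k=10, seed=0):
    """Shuffle observation indices and split them into k disjoint folds."""
    rng = random.Random(seed)
    idx = list(range(n_obs))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def repeated_kfold(n_obs, k=10, repeats=10):
    """Yield (train, test) index pairs: k folds per repeat, `repeats` repeats."""
    for r in range(repeats):
        folds = kfold_indices(n_obs, k, seed=r)
        for i in range(k):
            test = folds[i]
            train = [j for f in folds[:i] + folds[i + 1:] for j in f]
            yield train, test
```

Each of the 100 resulting (train, test) pairs uses 9 folds for fitting and the held-out fold for evaluation, matching steps 1–2 above.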

In order to evaluate the classification suitability of the models, we establish the confusion matrix for the training and testing samples. Moreover, we use Receiver Operating Characteristic (ROC) curves to evaluate classification performance.

The ROC curve presents the possible distributions of scores for the banks. We determine the optimal cut-off value that maximizes the correct classification rate (default and healthy banks correctly classified) and then classify a bank as a potential default when its score is higher than the cut-off, or as healthy when its score is lower.
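The cut-off selection just described can be illustrated by a simple scan over candidate thresholds, keeping the one with the highest correct classification rate (the function name and tie-breaking rule are assumptions for illustration):

```python
def best_cutoff(scores, labels):
    """Pick the threshold maximizing the overall correct-classification rate,
    scoring a bank as default (label 1) when its score exceeds the cutoff."""
    best_c, best_acc = None, -1.0
    for c in sorted(set(scores)):
        correct = sum((s > c) == bool(y) for s, y in zip(scores, labels))
        acc = correct / len(labels)
        if acc > best_acc:            # keep the first threshold reaching the max
            best_c, best_acc = c, acc
    return best_c, best_acc
```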

3 Running classifications methods

3.1 Multivariate adaptive regression splines (MARS) implementation

Introduced by Jerome Friedman in 1991, Multivariate Adaptive Regression Splines is a form of stepwise linear regression which can model non-linearity between variables.

MARS is based on three parameters:

  1. the maximum number of basis functions (terms);

  2. the smoothing parameter (also called the penalty parameter), which is recommended to be equal to 3;

  3. the maximum number of interactions between variables (equal to 2) (see Andalib and Atry 2009).

In MARS, basis functions (terms) are fitted to segregate the intervals of the independent variables by using recursive splits. All possible splitting points are evaluated with a linear spline (also called a piecewise polynomial). The best splitting point (knot position) is the one for which the model extension minimizes a squared error criterion; a knot is the point at which two polynomial pieces connect. MARS uses two-sided truncated power functions as spline basis functions, described in Eqs. (1) and (2)

$$\begin{aligned} h^{+}\left( \varkappa ;t\right)= & {} \left[ +\left( \varkappa -t\right) \right] _{+} \end{aligned}$$
(1)
$$\begin{aligned} h^{-}\left( \varkappa ;t\right)= & {} \left[ -\left( \varkappa -t\right) \right] _{+} \end{aligned}$$
(2)

where \(\left[ q\right] _{+}=\max \left\{ 0,q\right\} \) and t is a univariate knot. A MARS model is represented as a combination of piecewise linear, or hinge, functions. The latter have a knot (hinge) at t: they are zero on one side of the knot and linear on the other.
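Equations (1) and (2) translate directly into code; here is a minimal rendering of the two truncated power (hinge) functions:

```python
def hinge_pos(x, t):
    """h+(x; t) = [+(x - t)]+ : zero to the left of the knot t, linear to the right."""
    return max(0.0, x - t)

def hinge_neg(x, t):
    """h-(x; t) = [-(x - t)]+ : linear to the left of the knot t, zero to the right."""
    return max(0.0, -(x - t))
```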

MARS fits a linear model in the basis functions \(\left\{ h_{m}\left( \mathbf {\varkappa }\right) _{m=1}^{M}\right\} \):

$$\begin{aligned} \widehat{f}_{M}\left( \mathbf {\varkappa };\beta \right) =\beta _{0}+\sum _{m=1}^{M}\beta _{m}h_{m}\left( \mathbf {\varkappa }\right) , \end{aligned}$$
(3)

where \(h_{m}\left( \mathbf {\varkappa }\right) \) is a basis function of the form described in Eqs. (1) and (2), M is the number of linearly independent basis functions, and \(\beta _{m}\) is the unknown coefficient for the mth basis function.

As mentioned above, a knot is a point in a range at which the slope of the curve changes. Both the number of knots and their placement are unknown at the beginning of the process. A stepwise procedure is used to find the best points at which to place the spline knots; in its most general form, each value of each independent variable is tested as a possible knot location. The model initially developed is overfit (forward phase). A statistical criterion, generalized cross-validation (GCV), is then used to remove knots: only those that have a significant impact on the goodness of fit of the regression are retained (backward phase).
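The pruning criterion of the backward phase can be sketched as follows. The effective-parameter count C(M) = M + penalty·(M − 1)/2 is Friedman's commonly quoted suggestion and is an assumption about the exact form used here, with the penalty of 3 matching the smoothing parameter quoted above:

```python
def gcv(residuals, n_terms, n_obs, penalty=3.0):
    """Generalized cross-validation score used when pruning basis functions.
    Smaller is better; the denominator penalizes model complexity."""
    rss = sum(r * r for r in residuals)               # residual sum of squares
    c_m = n_terms + penalty * (n_terms - 1) / 2.0     # effective parameters (assumed form)
    return (rss / n_obs) / (1.0 - c_m / n_obs) ** 2
```

In the backward pass, the basis function whose removal yields the lowest GCV is dropped at each step.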

MARS can perform regressions on binary variables. In binary mode, the dependent variable is coded 0 (non-failed bank) or 1 (failed bank), and an Ordinary Least Squares (OLS) regression is then performed.

We conduct our methodology in three steps:

  1. We apply a 10-fold cross validation to divide our data set into training and testing sets, and we repeat this process 10 times to obtain different partitions of the database.

  2. We choose the number of basis functions according to the model that minimizes the misclassification rate. MARS does not display relationships in terms of the original 10 financial ratios, but recasts the target-predictor dependencies into a set of basis functions (BFs) that represent the fitted splines. To find the optimal number of BFs and target values, MARS employs a forward/backward stepwise approach to determine the knot points in the data set. At the beginning, the model is fitted with more basis functions than needed (100 BFs). Afterward, this model is shrunk to an optimal one: basis functions are removed one by one from the over-fit model based on a residual-sum-of-squares criterion. The best model has a GCV \(\hbox {R}^{2}\) score converging to 1 (see Table 1).

  3. We conduct an analysis of variance (ANOVA) decomposition to assess the relative importance of the parameters based on the contributions of the input variables and the BFs (see Tables 2 and 3). Interpretation of the MARS model is indeed greatly facilitated by its ANOVA decomposition.

For the sake of simplicity, only the 2013 basis functions and corresponding equations are detailed in this section. Results for the other years are given in the Appendix.

We start our analysis by detecting interaction between variables.

Table 1 reports the best MARS model selected, based on the 10-times-repeated K-fold cross validation, by minimizing the misclassification rate and maximizing the ROC. The optimal model has the lowest value of GCV, an index measuring the generalized mean square error; we use a backward method that minimizes the GCV.

Table 1 MARS models: N basis functions

Tables 2 and 3 display the ANOVA decomposition of the fitted MARS models and exhibit the importance of each ratio in the model. Generalized Cross Validation (GCV) gives the amount of degradation in the model when a ratio is deleted; the model with the minimum GCV, i.e. the one whose error is most reduced, should be chosen.

Table 2 MARS versus CART: variables importance
Table 3 MARS: ANOVA decomposition

For 2013, the liquidity ratio (TDTA) is the most important variable. The two capital adequacy variables and the asset quality proxy (NPLGL) have importance levels of 51.22, 30.61 and 40.53%, whereas the importance of the remaining variables is null.

According to the ANOVA decomposition outputs, we note that function 1 has the greatest effect on the model, with a GCV score of 0.01452, which means that the most important variable (TDTA) significantly impacts the dependent variable.

Basis functions (BF) in 2013 are as follows:

$$\begin{aligned} \begin{array}{l} BF1 = max( 0, TDTA - 0.9180);\\ BF6 = max( 0, 0.0132 - EQTA) * BF1;\\ BF28 = max( 0, TDTA - 0.9310);\\ BF32 = max( 0, NPLGL - 0.1040);\\ BF37 = max( 0, 0.0618 - EQTA) * BF32;\\ BF48 = max( 0, EQTL - 0.0297) * BF28;\\ \end{array} \end{aligned}$$

These basis functions reflect the non-linear relationships between the dependent and independent variables.

The final model is expressed as follows:

$$\begin{aligned} Y = 0.0055 - 462.114 * BF6 + 19.3076 * BF28 + 59.9474 * BF37 - 357.153 * BF48; \end{aligned}$$
(4)

For example, in BF1 the data are split on the variable TDTA (liquidity proxy) into two sets: the first is assigned 0 for all TDTA values below the threshold 0.9180, and the second contains the values exceeding 0.9180. BF1 does not appear in the final model but contributes to the construction of BF6. Indeed, BF6 combines the two variables EQTA and TDTA: it has a negative effect on the target variable only when EQTA is less than 0.0132 and TDTA is greater than 0.9180.

In the final model, MARS produces a single regression equation that retains only BF6, BF28, BF37 and BF48 (see Table 4). BF1 and BF32 are removed from the regression because they have only an indirect effect: they serve to build the BFs retained in the model.

The viability of the bank depends positively on the (TDTA) variable through BF28, with a positive beta coefficient of 19.3076: the greater the liquidity provided by total customer deposits, the better the financial health of the bank. The MARS model also gives a negative coefficient to BF6: if a bank has TDTA > 0.9180 and a level of (Total equity/Total assets) < 0.0132, this negatively influences its score.
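The 2013 basis functions and Eq. (4) can be evaluated directly; the sketch below copies the coefficients from the text (the function name and the sample inputs are illustrative assumptions):

```python
def mars_score_2013(TDTA, EQTA, EQTL, NPLGL):
    """Evaluate the 2013 MARS score of Eq. (4) from the basis functions above."""
    bf1  = max(0.0, TDTA - 0.9180)
    bf6  = max(0.0, 0.0132 - EQTA) * bf1
    bf28 = max(0.0, TDTA - 0.9310)
    bf32 = max(0.0, NPLGL - 0.1040)
    bf37 = max(0.0, 0.0618 - EQTA) * bf32
    bf48 = max(0.0, EQTL - 0.0297) * bf28
    # BF1 and BF32 only feed into BF6 and BF37; they carry no direct coefficient.
    return (0.0055 - 462.114 * bf6 + 19.3076 * bf28
            + 59.9474 * bf37 - 357.153 * bf48)
```

For a bank with TDTA below 0.9180 and NPLGL below 0.1040, every basis function is zero and the score reduces to the intercept 0.0055.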

Table 4 MARS: final model (after backward stepwise elimination)

3.2 CART algorithm to build tree classifiers

The CART (Classification and Regression Trees) algorithm proposed by Breiman et al. (1984) is a widely used statistical procedure that produces classification and regression models with a tree-based structure. It relies on a hierarchy of univariate binary decisions and operates by selecting, at the root node, the best variable for splitting the data into two groups. CART is a form of binary recursive partitioning in which partitions can themselves be split into sub-partitions.

The classifier assigns a predicted class membership based on a measurement vector (\(x_{1}, x_{2}, \ldots , x_{k}\)). Let X be the measurement space of all possible values of x. Tree classifiers are constructed by making repeated splits of X and of the subsequently obtained subsets of X, so that a hierarchical structure is formed.

In the finance literature, Frydman et al. (1985) were the first to employ decision trees to forecast default. Since then, much research has highlighted the accuracy of this method in predicting bankruptcy (Carter and Catlett 1987; Gepp et al. 2010).

To build a tree with CART, the procedure must specify a number of parameters:

  1. the splitter, which sends an observation to the left branch whenever the splitter's variable < split value;

  2. the competitor identifier variable.

In our bank failure problem, the dependent variable is either bankrupt or non-bankrupt, so a classification tree is suitable for our case. Under these assumptions and rules, we implement CART on our data.

The CART methodology consists of three steps: (i) construction of the maximum tree, (ii) choice of the right tree size, and (iii) classification of new data using the constructed tree.

Table 5 CART: root node competitors in order of improvement

Table 5 identifies the node competitors in order of improvement: the more significant variables appear at the upper levels of the tree, the less significant ones at the bottom. Splitting in regression trees follows a squared-residual minimization algorithm, which implies that the expected sum of variances of the two resulting nodes is minimized.
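The split search can be illustrated on a single variable. For our binary bank-status problem the impurity measure is the Gini index (the regression analogue simply replaces Gini with within-node variance); function names are illustrative:

```python
def gini(labels):
    """Gini impurity of a set of 0/1 class labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(values, labels):
    """Exhaustively search one variable for the cut value minimizing the
    weighted impurity of the two child nodes; returns (cut, impurity)."""
    n = len(values)
    best = (None, float("inf"))
    for cut in sorted(set(values)):
        left  = [y for x, y in zip(values, labels) if x <= cut]
        right = [y for x, y in zip(values, labels) if x > cut]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (cut, score)
    return best
```

Running this search over every candidate variable and keeping the best cut per node reproduces, in miniature, the root-node competitor ranking of Table 5.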

In fact, the process of finding the smallest tree that fits the data can reduce the number of important variables. The tree retained is the one that yields the lowest cross-validated error.

Table 2 exhibits the importance of each ratio in the building of the CART tree. From 2009 to 2013, the capital adequacy variables are the most important ones, whereas in 2008 NPLGL is the most important variable.

Figures 1, 2, 3, 4, 5 and 6 illustrate the final classification Tree.

In 2008, as shown in Fig. 1, the first node splits on the NPLGL ratio at a particular location (0.08): NPLGL produces the greatest “separation”.

The tree is split at each branch by a decision boundary [left is Yes (class “0”), right is No (class “1”)]. Thus, if a bank has a ratio \( NPLGL\le 0.08\), it is considered healthy (AB), and non-active (NAB) otherwise.

Classification should normally converge to the following clusters: among the 1242 banks analyzed in 2008, 97% are in the (AB) group (1205 banks) and only 37 banks (3%) are in the (NAB) group. According to the right branch of the tree (\(\hbox {NPLGL} > 0.08\)), 160 of the 1242 banks are classified as (NAB) (class 1); however, only 33 of them were actually NAB, yielding 127 misclassified (AB). Looking at the left branch (\( NPLGL\le 0.08\)), 1082 banks are classified in the (AB) group; among them, 1078 are actually active banks, meaning that 4 (NAB) are misclassified.

In the left branch of the second level, a second split is based on the target value \(-\,0.05\) of the ratio ROA. Among the 1082 banks, the 25 with ROA below \(-\,0.05\) are classified as (NAB) and the 1057 with ROA above \(-\,0.05\) are classified as (AB). Of the 25 banks classified as (NAB), only 2 are actually (NAB), meaning that 23 (AB) are misclassified; of the 1057 banks classified as (AB), 1055 are actually (AB), meaning that 2 (NAB) are misclassified.

In the right branch of the second level, a second split is based on the target value 0.07 of the ratio EQTA. Among the 160 banks, the 82 with EQTA below 0.07 are classified as (NAB) and the 78 with EQTA above 0.07 are classified as (AB). Of the 82 banks classified as (NAB), 32 are actually (NAB), meaning that 50 (AB) are misclassified; of the 78 banks classified as (AB), 77 are actually (AB), meaning that only 1 (NAB) is misclassified.

Beyond this hierarchical clustering, the CART classification provides the regulator with a great deal of information by pointing to banks that show signs of financial fragility. Indeed, we have checked that the banks misclassified by CART actually defaulted in the following years (from 2008 to 2013).

Fig. 1 Classification tree diagram 2008

Fig. 2 Classification tree diagram 2009

Fig. 3 Classification tree diagram 2010

Fig. 4 Classification tree diagram 2011

Fig. 5 Classification tree diagram 2012

Fig. 6 Classification tree diagram 2013

A second interpretation relates to the early warning system: the values of the splitters allow the regulator to set target values that can be used to detect suspect banks.

For 2008, banks with simultaneously \(NPLGL \le 0.08 \) and \(ROA \le -0.05\), or with \(\hbox {NPLGL} > 0.08\) and \(EQTA \le 0.07\), should be classified in the group of fragile banks (bordering on the default zone) and must be monitored.
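The 2008 early-warning rule read off the tree's split values amounts to a one-line predicate (the function name is an illustrative assumption):

```python
def fragile_2008(NPLGL, ROA, EQTA):
    """Flag a bank as fragile when it follows either of the two
    misclassification-prone paths of the 2008 tree."""
    return (NPLGL <= 0.08 and ROA <= -0.05) or (NPLGL > 0.08 and EQTA <= 0.07)
```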

4 Models accuracy and prediction results

In this section, the models' accuracy analysis relies on the confusion matrix. A type I error occurs when the model incorrectly predicts that a (NAB) will survive, whereas a type II error occurs when the model predicts that an (AB) will go bankrupt. The predictive capacity of a model's classification is thus based on sensitivity and specificity rates: the accuracy rate is the proportion of correct predictions among all predictions, and specificity is the number of correct (NAB) predictions divided by the total number of (NAB); the best specificity is 1, the worst is 0. We note that the choice of the cut-off is crucial. Indeed, in a crisis period the type I error decreases but, at the same time, the number of (AB) classified as (NAB) increases, bringing an undesirable cost in terms of economic policy.

Additionally, the quality of the models is measured using Receiver Operating Characteristic (ROC) curves and the Area Under the ROC Curve. The ROC curve shows the relation between the specificity and sensitivity of a given test or detector for all allowable values of the threshold (cut-off): the true positive rate (sensitivity) is plotted as a function of the false positive rate (1 − specificity) for different cut-off points, and each point on the ROC curve represents a (sensitivity, specificity) pair corresponding to a particular decision threshold.
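The rates used throughout this section follow directly from the confusion matrix; a minimal sketch, coding failed banks (NAB) as the positive class 1 (function and key names are our own):

```python
def confusion_metrics(actual, predicted):
    """Accuracy, sensitivity and specificity from 0/1 labels,
    with failed banks (NAB) coded as the positive class 1."""
    tp = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
    tn = sum(a == 0 and p == 0 for a, p in zip(actual, predicted))
    fn = sum(a == 1 and p == 0 for a, p in zip(actual, predicted))  # type I error
    fp = sum(a == 0 and p == 1 for a, p in zip(actual, predicted))  # type II error
    return {
        "accuracy":    (tp + tn) / len(actual),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }
```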

In this section, we test the performance of the MARS, CART and K-MARS models by comparing their classifications and predictions with the actual bankruptcies between 2008 and 2016.

4.1 MARS versus CART

Tables 6, 12 and 13 display the sensitivity, accuracy rates and misclassification rates (type I and II errors) in both the testing and training samples.

For 2013, in the testing sample (TS), the MARS model correctly classified 99.29% of banks. Only three (NAB) were misclassified (type I error: 20%) and three (AB) were considered by the model to be (NAB) (type II error: 0.36%). The same results were observed in the training sample (TRS). The results for 2012 and 2011 were much better in terms of type I error, in both the testing and training samples.

For 2008, 2009 and 2010, MARS underperformed CART on the sensitivity scale: the type I error rate was relatively high in 2008, for example [59.46% in the (TS)]. However, the percentage of (AB) correctly predicted was 100, 98.05 and 98.32% for 2008, 2009 and 2010 respectively in the (TS). In terms of the correct classification rate, the MARS model nonetheless performed better for 2008 (98.23%) than for 2009 (94.7%) and 2010 (96.94%).

To summarize, the MARS model has good predictive performance, measured by its ability to reduce the type I error and to generate the best signal for monitoring fragile banks among the (AB). In this sense, MARS is a powerful tool for identifying the most salient patterns for forecasting banking distress.

Tables 7, 12 and 13 also show that CART produces a high level of correct classification for 2013.

We observe that 98.59% of the banks in the (TRS) and 97.53% of the banks in the (TS) are classified in their proper groups. In the (TRS), CART correctly classifies all the failed banks (sensitivity: 100%), and in the testing sample only one (NAB) was predicted as (AB) (type I error: 6.67%). We also notice that, in the (TRS), the type I error is null for 2012.

However, the results obtained for 2008, 2009 and 2010 show that CART did not achieve a high correct classification rate: for 2009 the rate is about 89% in both the (TRS) and (TS). Moreover, over this period the CART model produced high misclassification rates, with an average of 8.99% for the type II error and 8.59% for the type I error in the (TS).

Finally, to validate the failure-prediction efficiency of the two non-parametric models, we analyze the type II error as follows: from 2008 to 2013, we retrieve the banks classified as (NAB) by the MARS and CART models and identify those that are misclassified, i.e. (AB) that the model considered as failing.

We then check the survival of each such bank over the following years (N + 5). The results are summarized in Tables 8 and 9.

For example, for the CART model, 95 banks were incorrectly classified in 2009: 69.30% defaulted in 2010, 21.05% in 2011, 8.77% in 2012 and 0.88% in 2013. For MARS, the number of misclassified banks is smaller: among the 19 misclassified banks that failed within the next 5 years, 94.74% defaulted in 2010 and the rest in 2012.

We can thus draw the following conclusion: the studied models provide a powerful Early Warning System (EWS). On average, CART puts a red flag on 66.11% of unstable banks in year \(\hbox {N} + 1\); the MARS model is even more efficient, signalling in year \(\hbox {N} + 1\) 80.81% of the analyzed cases.

Table 6 MARS accuracy and prediction results
Table 7 CART accuracy and prediction results
Table 8 Analysis of type II error achieved with the CART and MARS
Table 9 Analysis of predictability power of CART and MARS
Table 10 K-means accuracy and prediction results
Table 11 Hybrid model accuracy and prediction results

The CART model is more penalizing in terms of classification, which explains its greater number of misclassified (AB).

4.2 Hybrid model accuracy

To improve on the results of the CART and MARS models, we build a hybrid model that combines the K-means classification method and MARS.

Clustering is a method of grouping a set of objects according to predefined similarity criteria (Anderberg 2014; Hartigan 1975; Jain and Dubes 1988). Most clustering methods are based on a distance measure between two objects; technically, clustering can be regarded as a minimization problem.

Let X be the \(N \times n\) data matrix:

$$\begin{aligned} X= \begin{pmatrix} x_{11} &{} \ldots &{} x_{1n} \\ \vdots &{} &{} \vdots \\ x_{N1} &{} \ldots &{} x_{Nn} \end{pmatrix}, \end{aligned}$$

where N is the number of banks, n the number of years, and \(x_{ij}\) the ratio variables.

From the \(\hbox {N} \times \hbox {n}\) dimensional data set, the K-means algorithm allocates each data point to one of c clusters so as to minimize the within-cluster sum of squares:

$$\begin{aligned} \sum _{i=1}^{c}\sum _{k\in A_{i}}\left\| x_{k}-v_{i}\right\| ^{2} \end{aligned}$$
(5)

where \(A_{i}\) is the set of banks in cluster i and \(v_{i}\) is the mean of this group of banks over cluster i; the norm in Eq. (5) is a distance measure. In K-means clustering the \(v_{i}\) are called the cluster prototypes, i.e. the cluster centers:

$$\begin{aligned} v_{i}=\frac{\sum _{x_{k}\in A_{i}}x_{k}}{N_{i}} \end{aligned}$$
(6)

where \(N_{i}\) is the number of banks in \(A_{i}\).

In our paper, Z-score standardization is applied before clustering, and the number of clusters is set to two.
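As an illustration, the clustering step defined by Eqs. (5) and (6) can be sketched in a few lines of NumPy. The Z-score standardization and the choice \(c = 2\) follow the setup above; the random initialization scheme and the toy dimensions are our own assumptions, not the paper's implementation:

```python
import numpy as np

def zscore(X):
    # Z-score standardization: each ratio column to zero mean, unit variance
    return (X - X.mean(axis=0)) / X.std(axis=0)

def kmeans(X, c=2, n_iter=100, seed=0):
    """Lloyd's algorithm: minimize the within-cluster sum of squares (Eq. 5),
    with prototypes v_i taken as the cluster means (Eq. 6)."""
    rng = np.random.default_rng(seed)
    V = X[rng.choice(len(X), size=c, replace=False)]  # initial prototypes
    for _ in range(n_iter):
        # assign each bank to its nearest prototype
        dists = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # recompute each prototype as the mean of its cluster (Eq. 6)
        V_new = np.array([X[labels == i].mean(axis=0) for i in range(c)])
        if np.allclose(V_new, V):
            break
        V = V_new
    return labels, V
```

With two well-separated groups of banks, the two recovered clusters coincide with the groups regardless of which points seed the prototypes.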

To implement the hybrid model, we proceed as follows:

  1. Step 1

    We apply the K-means clustering method to all banks, with the number of clusters set to two (AB and NAB). Table 10 reports the classification results based on K-means alone.

  2. Step 2

    We then run the MARS model not on the data with their actual labels but on the classification generated by K-means. The hybrid model's results are summarized in Table 11.
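The two steps above can be sketched as a pipeline. MARS is not part of scikit-learn, so a logistic regression serves here as a runnable stand-in for the MARS classifier (in practice a MARS implementation such as the `Earth` estimator from the third-party `py-earth` package would take its place); the function names are our own:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression  # stand-in for MARS
from sklearn.preprocessing import StandardScaler

def fit_hybrid(X_ratios, random_state=0):
    """Step 1: z-score the ratios and split the banks into two K-means
    clusters (AB / NAB). Step 2: train the classifier on the K-means
    labels instead of the banks' actual statuses."""
    scaler = StandardScaler().fit(X_ratios)
    Z = scaler.transform(X_ratios)
    km = KMeans(n_clusters=2, n_init=10, random_state=random_state).fit(Z)
    clf = LogisticRegression().fit(Z, km.labels_)  # MARS would go here
    return scaler, clf

def predict_hybrid(model, X_new):
    scaler, clf = model
    return clf.predict(scaler.transform(X_new))
```

The design point is that the second-stage model learns the cluster structure found by K-means, which is then evaluated against the true bank statuses.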

For 2013, the hybrid model yields a satisfactory correct classification rate, although we notice a slight gap between the training and testing samples (98% against 96%). The model correctly classifies all the bankrupt banks in both the training sample (TNS) and the testing sample (TS) (sensitivity: 100%). The misclassification rate of (AB) in (TS) is higher than that in (TNS) (4.20% against 2.10%).

For the other years, we observe broadly the same correct classification rates. Moreover, the hybrid model provides a lower Type I error. However, the misclassification of (AB) (Type II error) is larger in the testing sample than in the learning sample.

Finally, according to the area under the ROC curve (AUC) (see Table 12), we conclude that on average the MARS model is more accurate than the CART model in the (TS) (94.84% against 93.3%). MARS outperforms CART over the whole period, except in 2013, where the AUC of CART is slightly greater than that of MARS in the testing sample (95.89% against 95.10%).

According to the AUC, the hybrid K-MARS model outperforms both CART and MARS in the training and testing samples.
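An AUC of this kind is computed from any model's scores; a minimal sketch with scikit-learn's `roc_auc_score` (the labels and scores below are purely illustrative, not taken from the paper's data):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# illustrative data: 1 = failed bank, scores = a model's failure probability
y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

auc = roc_auc_score(y_true, scores)
print(auc)  # 0.75: the model ranks a randomly chosen failed bank above a
            # randomly chosen surviving one 75% of the time
```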

To sum up, in terms of average rates over the entire period, the MARS model provides better results than CART in both correct classification rate and AUC for the testing sample. However, it fails to classify the bankrupt banks correctly.

The combined K-means and MARS model is the most accurate in both the training and testing samples in terms of average sensitivity (see Table 13; Figs. 7, 8, 9). The training-sample results show the supremacy of the K-MARS model: it correctly classifies 98.91% of banks and provides a low Type I error, meaning that only 0.19% of the bankrupt banks are misclassified. In (TS), the results show the supremacy of MARS in terms of correct classification rate (97.58% against 93.4% for CART and 97.04% for K-MARS) and Type II error (1.15% against 6.7% for CART and 3.2% for K-MARS).

In (TS), the lowest Type I error of K-MARS (1.16%) highlights the model's ability to better classify (NAB). We also note that the CART model delivers the highest Type II error, reflecting its tendency to flag bank failures in advance (the predictive power of the model). In (TS), we observe the same trend during the periods P1 (2008–2010) and P2 (2011–2013): the MARS model outperforms the other models in terms of specificity but performs less well in terms of Type I error. K-MARS classifies banks more accurately during P1 than P2. In fact, in P2, we observe an improvement of the correct classification rate only for MARS and CART and a decrease of the Type I error for all the models.
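For reference, the sensitivity, specificity and Type I/II error rates quoted in this section all follow from the confusion matrix. A small helper, using the convention implied by the text (Type I error: a bankrupt bank classified as active; Type II error: an active bank classified as bankrupt); the label encoding is our own assumption:

```python
import numpy as np

def error_rates(y_true, y_pred):
    """Rates from the confusion matrix (1 = bankrupt, 0 = active bank).
    Type I error: bankrupt banks classified as active (missed failures).
    Type II error: active banks classified as bankrupt (false alarms)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    sensitivity = tp / (tp + fn)   # share of bankrupt banks caught
    specificity = tn / (tn + fp)   # share of active banks cleared
    return sensitivity, specificity, 1 - sensitivity, 1 - specificity
```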

Table 12 Area under curve (AUC) results
Table 13 Comparative results in average
Fig. 7 ROC curve: MARS

Fig. 8 ROC curve: CART

Fig. 9 ROC curve: K-means MARS

5 Conclusion

In this paper, we developed a hybrid model based on two non-parametric classification methods to study the bankruptcy of US banks, and we provide a comparison of CART, MARS and K-means-MARS. Our main objective is to predict bank defaults some time before bankruptcy occurs and to build an early warning system based on CAMELS ratios.

We based our empirical validation on a large panel of US banks gathered from both Bankscope and the Federal Deposit Insurance Corporation.

The main contributions of our paper with regard to the existing literature are twofold:

  • Methodological and conceptual: we propose, for the first time, a hybrid model that combines the K-means and MARS models. We provide a comparative framework covering not only non-parametric models but also the parametric models Logit and CDA (Affes and Hentati-Kaffel 2017).

  • Empirical validation:

    1. (i) Our study focuses on a large sample of US banks of different sizes. The paper analyses the behavior of banks over a 6-year period rich in events (it encompasses two sub-periods: a stress period, 2008–2009, and a recovery one, 2010–2013).

    2. (ii) The comparative approach highlights the supremacy of the proposed hybrid model in terms of classification accuracy for both the training and validation samples.

    3. (iii) The model enhanced the classification accuracy by 1% for the training sample.

    4. (iv) We established that MARS underperforms in terms of the misclassification rate of bankrupt banks, notably for 2008 and 2009. Also, according to the area under the ROC curve (AUC), the MARS model shows better accuracy than the CART model.

    5. (v) The results differ from one year to another, but a general behavior for all distressed banks can be identified. The CART classification shows that, among the 10 tested ratios, the most important predictors are the capital adequacy variables. We also note that the asset quality ratios (NPLTA) and (NPLGL) are much more important than the other two components (LLRTA) and (LLRGL). According to MARS, the most important variables were also the components of capital adequacy. The liquidity variables (TLTD and TDTA) are important for detecting bank failure only in 2010 and 2013. We note that, with respect to parametric models (see Affes and Hentati-Kaffel 2017), asset quality was also an important component in explaining the financial condition of banks (except for 2009 and 2010).

Finally, as mentioned in the introduction, the ultimate goal of this paper is to provide regulators and investors with an early warning model. The study we carried out meets this objective in two ways:

First, our detailed study shows how, with a CART model, it is possible to identify target variables that enable banks in fragile financial situations to be detected in advance. For example, in 2008, banks in fragile and alarming situations are those that present one of the following combinations of characteristics: (i) \(NPLGL \le 0.08\) and \(ROA \le -0.05\), or (ii) \(NPLGL > 0.08\) and \(EQTA \le 0.07\).
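The 2008 rule set reads directly as a flagging function; a sketch with the thresholds taken from the text (the two branches are mutually exclusive, so a bank satisfies one or the other; the function and argument names are our own):

```python
def fragile_2008(nplgl, roa, eqta):
    """Flag a bank as fragile under the 2008 CART rules quoted above:
    (i)  NPLGL <= 0.08 and ROA <= -0.05, or
    (ii) NPLGL > 0.08 and EQTA <= 0.07."""
    return (nplgl <= 0.08 and roa <= -0.05) or (nplgl > 0.08 and eqta <= 0.07)
```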

Second, the MARS and CART models are useful tools to identify in advance financial institutions under stress, which therefore deserve special attention from supervisors. On average, CART raises a red flag on 66.11% of unstable banks in year N + 1. The MARS model is considerably more effective, anticipating the failure, in year N + 1, of 80.81% of the analyzed cases.

We believe that further extensions could be developed by including additional financial variables as well as macroeconomic variables.