
1 Introduction

The global economic developments of recent decades have put corporate failures and their consequences for economic well-being under the spotlight, to the extent that predicting bankruptcy or business failure has become a crucial task in finance. This, in turn, has emphasized that financial institutions need effective prediction mechanisms in order to make appropriate lending decisions.

In general, the objective of corporate failure prediction is to forecast whether a firm will survive or fail with the minimum possible classification error. That is why corporate failure research is framed as binary classification (Séverin & Veganzones, 2021; Ouenniche & Tone, 2017). From the binary classification point of view, the model’s output is a dichotomous variable that takes the value of 1 when the firm enters a bankruptcy procedure and 0 when the firm survives. The explanatory variables used to design corporate failure prediction models are often financial ratios, which measure the relationship between two items on the financial statements.

Since the pioneering studies of Beaver (1966) and Altman (1968), who documented the predictive power of ratio analysis, many techniques have been employed to develop corporate failure prediction models, including statistical and artificial intelligence methods (Veganzones & Severin, 2020; Kumar & Ravi, 2007; Moula et al., 2017). On the one hand, researchers still employ well-known statistical methods, notably linear discriminant analysis and logistic regression, due to their simplicity and interpretability, even though they are clearly outperformed by machine learning techniques. On the other hand, artificial intelligence techniques (e.g., support vector machines, decision trees, neural networks, fuzzy set theory, self-organizing maps) have become indispensable tools in the field of corporate failure prediction, especially in this era of advanced informatics and computing technology (Abedin et al., 2021). Their superiority lies in the fact that they learn directly from the data, which makes it possible to model complex data using nonlinear approaches, and therefore their predictions are more reliable. Nonetheless, these methods are not free of drawbacks: low learning rates, slow computation, convergence to local minima, etc. (Yu et al., 2014; Abedin et al., 2018), which can make corporate failure prediction time consuming and arduous.

To overcome these drawbacks, we consider a novel prediction method, the Extreme Learning Machine (ELM) (Huang et al., 2006a), to predict corporate failure. There are several reasons for choosing ELM as the classifier for the prediction of corporate failures. Firstly, despite the many existing methodologies for predicting corporate failure, researchers and practitioners should continually explore new methods. Secondly, the main concept behind ELM is the random initialization of a Single-Layer Feed-forward Neural Network (SLFN), which replaces the computationally costly procedure of training the hidden layer performed by other artificial intelligence techniques. Unlike those techniques, it does not need to calibrate parameters such as the learning rate. For this reason, ELM combines good performance with an extremely fast learning speed (Akusok et al., 2015), and it is proven to be a universal approximator given enough hidden neurons (Huang et al., 2006b).

However, like other techniques, ELM has a main drawback: the random initialization that makes ELM extremely fast also makes it a highly unstable classifier. Even when ELM is trained on the same training sample several times, it performs differently due to the random initialization of the biases and weights between the input and hidden nodes. Although reliance on a single ELM may be misguided, an ensemble of predictions might improve the generalization performance of the ELM. Indeed, ensemble methods are commonly used to improve the accuracy of a learning algorithm by constructing and combining a set of weak classifiers (Kim & Kang, 2010; Abedin et al., 2022). This rationale motivates our study of the performance of ensemble extreme learning machines for predicting corporate failure.

Consequently, the aim of the current work is to examine which ensemble procedure best improves the performance of ELM for corporate failure prediction. This question matters because the diversity generation method is key in the process of creating an ensemble of classifiers. According to Rokach (2010), diversity can be created in several ways: by manipulating the training sample, by manipulating the inducer, by varying the representation of the target attribute, and by changing the search space. Of all possible ensemble techniques, we selected four based on their popularity in the literature (Verikas et al., 2010): multiple classifiers, Bagging, Boosting, and Random Subspace. The fact that the chosen techniques rely on different ensemble procedures might provide further insight into which general characteristics of ensemble techniques are influenced by the base classifier. In turn, a rigorous study of such methods would assist in designing a corporate failure model based on ensemble ELM. Furthermore, the best-performing ensemble ELM model can serve as a baseline prediction model for future research.

The rest of the paper is organized as follows. Section 2 presents the research methodology. Sections 3 and 4 describe the experimental design and results, respectively. Finally, in Sect. 5, the conclusions are summarized.

2 Research Methodology

In this section, we present the method employed in this study. In particular, we describe the extreme learning machine classifier as well as the ensemble modeling techniques.

2.1 Extreme Learning Machine

The Extreme Learning Machine (ELM) classifier was proposed by Huang et al. (2006a). ELM is a fast way of creating a Single-Layer Feed-forward Neural Network (SLFN) by randomly initializing the internal biases and weights. The hidden layer does not need to be iteratively tuned, which bypasses the time-consuming calibration performed by other artificial intelligence algorithms. As a result, ELM achieves an extremely fast learning speed while remaining a simple method. The ELM algorithm can be described as follows:

Consider a set of N observations with features \( {\mathbf{x}}_i\in {\mathbb{R}}^d \) and the corresponding output labels \( \boldsymbol{Y}\in {\left\{-1,1\right\}}^{N\times c} \). An SLFN with m neurons in the hidden layer is written as the following sum:

$$ \sum_{j=1}^{m}{\beta}_{jk}\,\phi \left({\mathbf{w}}_j\cdot {\mathbf{x}}_i+{b}_j\right)={Y}_{ik},\quad i=1,\dots, N,\ k=1,\dots, c, $$
(1)

where βj are the output weights, ϕ is the activation function, wj are the input weights, and bj are the biases. Equation (1) can be expressed in matrix form as Hβ = Y, where

$$ \boldsymbol{H}=\begin{pmatrix}\phi \left({\mathbf{w}}_1\cdot {\mathbf{x}}_1+{b}_1\right)& \cdots & \phi \left({\mathbf{w}}_m\cdot {\mathbf{x}}_1+{b}_m\right)\\ \vdots & \ddots & \vdots \\ \phi \left({\mathbf{w}}_1\cdot {\mathbf{x}}_N+{b}_1\right)& \cdots & \phi \left({\mathbf{w}}_m\cdot {\mathbf{x}}_N+{b}_m\right)\end{pmatrix}. $$
(2)
$$ \boldsymbol{\beta} ={\left({\beta}_1\ \cdots\ {\beta}_m\right)}^{T},\qquad \boldsymbol{Y}={\left({Y}_1\ \cdots\ {Y}_N\right)}^{T}. $$

Then, the output weights β can be calculated by the ordinary least squares method using the Moore–Penrose pseudoinverse of H (Rao & Mitra, 1971):

$$ \boldsymbol{\beta} ={\mathbf{H}}^{\dagger}\mathbf{Y}. $$
(3)
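To make the procedure concrete, the following minimal numpy sketch implements Eqs. (1)–(3) for the single-output case, assuming a tanh activation; the helper names elm_train and elm_predict are ours, not from Huang et al. (2006a).

import numpy as np

def elm_train(X, y, m, rng):
    """Fit an ELM with m hidden neurons; returns the random layer and beta."""
    d = X.shape[1]
    W = rng.normal(size=(d, m))      # random input weights, never iteratively tuned
    b = rng.normal(size=m)           # random biases
    H = np.tanh(X @ W + b)           # hidden-layer output matrix H (Eq. 2)
    beta = np.linalg.pinv(H) @ y     # Moore-Penrose least-squares solution (Eq. 3)
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Raw network output; np.sign() of it gives the class in {-1, +1}."""
    return np.tanh(X @ W + b) @ beta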

2.2 Ensemble Techniques

2.2.1 Multiple Classifiers Technique

The multiple classifier technique relies on the simple idea that combining multiple classifiers leads to higher classification accuracy and efficiency than a single classifier. This approach is equivalent to the wisdom of crowds: the combined opinion of diverse and independent experts usually outperforms the opinion of a single individual. According to Kittler et al. (1998), the multiple classifier technique achieves higher efficiency when the learners generalize in different ways, i.e., when diversity is generated within the ensemble. As ELM is based on the random initialization of internal biases and weights, each learner will be different, so there is diversity in the ensemble. Therefore, the forecasts of several ELMs are combined using majority voting to produce the final decision rule. Figure 1 shows the general architecture of the multiple classifier.

The classifiers C1(X), …, CM(X) are built from the data set {(x1, y1), (x2, y2), …, (xn, yn)}. Each classifier provides an output \( {\hat{y}}_m \) that is combined with the others into the final output \( \hat{y} \).

Fig. 1 Architecture of the multiple classifier. The input X is fed to M parallel learners ELM1, …, ELMM; their outputs ŷ1, …, ŷM are combined into the final output ŷ
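As an illustration, a multiple-ELM ensemble can be sketched as follows, reusing the elm_train and elm_predict helpers from Sect. 2.1; the ensemble size M and the number of hidden neurons m are illustrative choices, not values prescribed by the method.

import numpy as np

def multiple_elm_predict(X_train, y_train, X_test, M=50, m=100, seed=0):
    """Majority vote over M independently initialized ELMs (Fig. 1)."""
    rng = np.random.default_rng(seed)
    votes = np.zeros(len(X_test))
    for _ in range(M):
        # Diversity comes only from the random initialization inside elm_train.
        W, b, beta = elm_train(X_train, y_train, m, rng)
        votes += np.sign(elm_predict(X_test, W, b, beta))
    return np.sign(votes)  # labels in {-1, +1}; a 0 means an exact tie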

2.2.2 Bagging

Bagging (short for bootstrap aggregating) is one of the earliest ensemble techniques (Breiman, 1996). Its popularity lies in the fact that it is intuitive and simple to implement, with notably good performance. Bagging generates the diversity essential to the ensemble process by manipulating the training set. In this regard, the training set is randomly resampled in order to generate several different bags of samples, each bag representing a set of training samples. Finally, the base classifier is applied to each bag, and the output classification is made by a majority vote over all the base classifier results.

The bagging technique improves generalization performance due to the reduction in variance while keeping the bias steady or only slightly increasing it, in particular when it is applied to weak classifiers (Grandvalet, 2004). The bagging algorithm can be expressed as follows:

Given a data set {(x1, y1), (x2, y2), …, (xn, yn)}:

1. Repeat for i = 1, 2, …, I:

   (a) Build a bootstrap sample \( \left\{\left({\boldsymbol{x}}_1^{\ast },{y}_1^{\ast}\right),\left({\boldsymbol{x}}_2^{\ast },{y}_2^{\ast}\right),\dots, \left({\boldsymbol{x}}_n^{\ast },{y}_n^{\ast}\right)\right\} \) by randomly selecting n times with replacement from the data {(x1, y1), (x2, y2), …, (xn, yn)}.

   (b) Fit the classifier Ci on the corresponding bootstrap sample.

2. Calculate the output of the final classifier:

$$ \boldsymbol{C}\left(\boldsymbol{x}\right)={I}^{-1}\sum_{i=1}^{I}{C}_i(x). $$
(4)
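A sketch of this procedure with ELM base learners, again reusing the helpers from Sect. 2.1 (the parameters I and m are illustrative assumptions):

import numpy as np

def bagging_elm_predict(X_train, y_train, X_test, I=50, m=100, seed=0):
    """Bagging: train I ELMs on bootstrap resamples and combine them (Eq. 4)."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    scores = np.zeros(len(X_test))
    for _ in range(I):
        idx = rng.integers(0, n, size=n)  # draw n rows with replacement (one "bag")
        W, b, beta = elm_train(X_train[idx], y_train[idx], m, rng)
        scores += np.sign(elm_predict(X_test, W, b, beta))
    return np.sign(scores / I)  # sign of the average vote, i.e., majority voting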

2.2.3 Boosting

Unlike the bagging technique, the boosting technique combines inaccurate and relatively weak rules to produce highly accurate predictions. That is, it progressively gives more weight to observations that have been misclassified by previously generated classifiers in order to generate new classifiers, and then combines the classifiers of the different iterations with weighted voting to make the final predictions. Since numerous boosting algorithms have been proposed, we use AdaBoost (Freund & Schapire, 1996), one of the most popular boosting techniques applied to pattern recognition (Verikas et al., 2010). The AdaBoost algorithm can be described as follows:

Given a data set {(x1, y1), (x2, y2), …, (xn, yn)}:

1. Initialize the weight vector of the training set:

$$ {W}_1(i)=\frac{1}{N}\quad \mathrm{for}\ i=1,\dots, N. $$
(5)

2. For t = 1, …, T:

   (a) Train the weak classifier Ct on the weighted training samples.

   (b) Calculate the sum of the weights of the observations misclassified by Ct:

$$ {\varepsilon}_t=\sum_{i=1}^{N}{W}_i^t\,\mathbb{1}\left[{Y}_i\ne {C}_t\left({X}_i\right)\right]. $$
(6)

   (c) Choose

$$ {\alpha}_t=\frac{1}{2}\ln \left(\frac{1-{\varepsilon}_t}{\varepsilon_t}\right). $$
(7)

   (d) Update the weights:

$$ {W}_i^{t+1}=\frac{W_i^t\exp \left(-{\alpha}_t{Y}_i{C}_t\left({X}_i\right)\right)}{Z_t}, $$
(8)

where Zt is a normalization factor.

3. Output:

$$ f(x)=\operatorname{sign}\left(\sum_{t=1}^{T}{\alpha}_t{C}_t(x)\right). $$
(9)
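A sketch of AdaBoost with ELM base learners follows. Since the basic ELM solves an unweighted least-squares problem, we approximate training on weighted samples by weighted resampling; this is an implementation assumption, not part of the original algorithm.

import numpy as np

def adaboost_elm_predict(X_train, y_train, X_test, T=50, m=100, seed=0):
    """AdaBoost (Eqs. 5-9) with ELM base learners; labels y in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    w = np.full(n, 1.0 / n)                        # Eq. (5)
    f_test = np.zeros(len(X_test))
    for _ in range(T):
        idx = rng.choice(n, size=n, p=w)           # weighted resampling stands in
        W, b, beta = elm_train(X_train[idx], y_train[idx], m, rng)  # for weighted training
        pred = np.sign(elm_predict(X_train, W, b, beta))
        eps = w[pred != y_train].sum()             # Eq. (6)
        if eps == 0 or eps >= 0.5:                 # stop on a perfect or useless learner
            break
        alpha = 0.5 * np.log((1 - eps) / eps)      # Eq. (7)
        w *= np.exp(-alpha * y_train * pred)       # Eq. (8), numerator
        w /= w.sum()                               # normalization factor Z_t
        f_test += alpha * np.sign(elm_predict(X_test, W, b, beta))
    return np.sign(f_test)                         # Eq. (9)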

2.2.4 Random Subspace

The random subspace method (Ho, 1998) bases its ensemble process on the modification of the feature space. That is, it creates different bags of training samples by randomly selecting features drawn from the initial feature set that characterizes each sample. Each training sample Xi (i = 1, …, n) in the training set X = (X1, X2, …, Xn) is a p-dimensional vector Xi = (xi1, xi2, …, xip), where p is the number of features. In the random subspace method, a k-dimensional subspace is randomly selected from the original p-dimensional feature space, k < p. The new learning samples \( {\boldsymbol{X}}^b=\left({\boldsymbol{X}}_1^{b},{\boldsymbol{X}}_2^{b},\dots, {\boldsymbol{X}}_n^{b}\right) \) in the k-dimensional subspace, with \( {\boldsymbol{X}}_i^{b}=\left({x}_{i1}^{b},{x}_{i2}^{b},\dots, {x}_{ik}^{b}\right) \) and \( {x}_{ij}^{b}\ \left(j=1,\dots, k\right) \), are built, and then the classifiers trained on the random subspaces Xb are combined using majority voting to create the final decision rule. Thus, the random subspace method can be organized as follows:

1. Repeat for b = 1, 2, …, B:

   (a) Randomly select a k-dimensional subspace Xb from the initial p-dimensional feature space X.

   (b) Design a classifier Cb(x) using the sample Xb.

2. Combine the forecasts of the Cb(x) classifiers using majority voting into a final decision rule:

$$ \mathrm{Prev}(x)=\underset{y\in \left\{-1,1\right\}}{\operatorname{argmax}}\sum_{b=1}^{B}{\delta}_{\operatorname{sgn}\left({C}^{b}\left(x\right)\right),\,y}. $$
(10)
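The following sketch illustrates the procedure with ELM base learners; the subspace size k and ensemble size B are illustrative assumptions.

import numpy as np

def random_subspace_elm_predict(X_train, y_train, X_test, B=50, k=10, m=100, seed=0):
    """Random subspace: each ELM sees only k of the p features (Eq. 10)."""
    rng = np.random.default_rng(seed)
    p = X_train.shape[1]
    votes = np.zeros(len(X_test))
    for _ in range(B):
        feats = rng.choice(p, size=k, replace=False)  # a random k-dimensional subspace
        W, b, beta = elm_train(X_train[:, feats], y_train, m, rng)
        votes += np.sign(elm_predict(X_test[:, feats], W, b, beta))
    return np.sign(votes)  # majority vote over the B subspace classifiers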

3 Experimental Design

3.1 Data

Our empirical study uses non-listed French firms taken from the Diane database created by Bureau Van Dijk. Under French law, French companies must submit annual reports to the French Commercial Court, and these accounting and income statements are collected by Bureau Van Dijk. We drew firms from all sectors of activity (excluding financial companies) for the years 2016–2018, allowing us to examine the models’ capacity to create good prediction rules in a real-world scenario.

The Diane database indicates whether firms have failed or remain healthy; in the case of failure, it also provides the date. A firm is considered failed if it entered liquidation or reorganization proceedings, and non-failed firms are those that continued their activity for at least a year after the period studied. We decided to be conservative in the selection of non-failed firms in order to avoid including apparently healthy companies that may suddenly fail and to ensure a reliable sample of surviving firms. Moreover, firms that presented missing values in their financial statements, as well as outliers, were excluded to ensure the stability of the prediction models. Consequently, the collected dataset is composed of 3000 failed and 3000 non-failed firms.Footnote 1

To minimize the bias and sample variability that might influence the prediction performance of the models, we carried out tenfold cross-validation, in which the dataset is split into ten distinct training and test sets in order to train and evaluate the models. This procedure was repeated ten times to ensure the reliability of our results. Therefore, the final prediction performance is calculated as the average of 100 testing results.
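In Python, this evaluation loop could be sketched as follows, assuming the sample is held in arrays X and y with labels in {-1, +1} and reusing the ELM helpers from Sect. 2.1; the stratification and the random seeds are our assumptions, not details reported above.

import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold

cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=42)
accuracies = []
for train_idx, test_idx in cv.split(X, y):
    rng = np.random.default_rng(0)
    W, b, beta = elm_train(X[train_idx], y[train_idx], m=100, rng=rng)
    pred = np.sign(elm_predict(X[test_idx], W, b, beta))
    accuracies.append(np.mean(pred == y[test_idx]))
print(np.mean(accuracies))  # average of the 100 testing results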

3.2 Variables

Financial dimensions characterize the main explanatory factors of corporate failure. Therefore, the balance sheets and income statements of the collected firms were used to calculate 30 financial ratios to use as explanatory variables. This representation layer is important because it ensures that the variables we use actually represent all aspects of the phenomenon.

The initial set of financial ratios that we compute includes at least four indicators for each of six categories: liquidity, solvency, profitability, financial structure, turnover, and activity. These variables are presented in Table 1.

Table 1 Initial set of variables

However, using all the financial ratios may result in a very high-dimensional feature space, which may reduce the models’ predictive capability. Therefore, a variable selection process was performed in order to choose a subset of the most relevant financial ratios. Following the study by Kainulainen et al. (2011), a forward variable selection process was performed to retain the information necessary for prediction.
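A generic greedy forward selection loop can be sketched as below; evaluate is a user-supplied scoring function (e.g., cross-validated ELM accuracy), and the stopping rule is an illustrative assumption rather than the exact procedure of Kainulainen et al. (2011).

import numpy as np

def forward_selection(X, y, evaluate, max_vars=10):
    """Greedily add the financial ratio that most improves the score."""
    selected, remaining = [], list(range(X.shape[1]))
    best_score = -np.inf
    while remaining and len(selected) < max_vars:
        score, j = max((evaluate(X[:, selected + [j]], y), j) for j in remaining)
        if score <= best_score:        # stop when no remaining ratio helps
            break
        best_score = score
        selected.append(j)
        remaining.remove(j)
    return selected                    # column indices of the retained ratios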

3.3 Evaluation Metrics

The evaluation criteria of our experiments are adopted from standard measures established in the field of prediction (Shahriare et al., 2021). These measures include average accuracy, type-I error, and type-II error. The formulas of these measures, provided below, can be explained with respect to the confusion matrix shown in Table 2.

$$ \mathrm{Accuracy}=\frac{\mathrm{TP}+\mathrm{TN}}{\mathrm{TP}+\mathrm{FP}+\mathrm{FN}+\mathrm{TN}}, $$
(11)
$$ \mathrm{Type}-\mathrm{I}\ \mathrm{error}=\frac{\mathrm{FN}}{\mathrm{TP}+\mathrm{FN}}, $$
(12)
$$ \mathrm{Type}-\mathrm{II}\ \mathrm{error}=\frac{\mathrm{FP}}{\mathrm{TN}+\mathrm{FP}}. $$
(13)
Table 2 Confusion matrix for the prediction of corporate failure

In addition to these evaluation metrics, we also use the area under the receiver operating characteristic curve (AUC) to estimate model performance. The ROC curve is a graphical plot that represents model performance as the cutoff value changes: the false positive rate is plotted on the x-axis and the true positive rate on the y-axis. AUC has become a widely used evaluation metric in corporate failure prediction because it is insensitive to the matrix of misclassification costsFootnote 2 when assessing the discrimination ability of a model. In summary, two classifiers can easily be compared according to differences in their ROC curves: a classifier’s curve should get as close to the top left corner as possible, where its AUC will be close to 1.
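For concreteness, the four measures can be computed from predicted labels and continuous scores as sketched below, treating failed firms (coded +1) as the positive class; the helper name is ours.

import numpy as np
from sklearn.metrics import roc_auc_score

def evaluation_metrics(y_true, y_pred, y_score):
    """Accuracy, type-I/type-II errors (Eqs. 11-13) and AUC; failed = +1."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == -1) & (y_pred == -1))
    fp = np.sum((y_true == -1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == -1))
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    type_i = fn / (tp + fn)    # failed firm classified as healthy
    type_ii = fp / (tn + fp)   # healthy firm classified as failed
    return accuracy, type_i, type_ii, roc_auc_score(y_true, y_score)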

With the data set mentioned above, a cross-validation loop (tenfold cross-validation repeated ten times) was performed to estimate the average evaluation measures. To compare classifier performance, Demšar (2006) recommends the Wilcoxon signed-ranks non-parametric test because it only assumes limited commensurability and can be applied to prediction accuracy, misclassification errors, or any other evaluation metric. It is expressed as follows:

Let R+ be the sum of ranks for the comparisons in which the second classifier outperforms the first one and R− the sum of ranks for the opposite, with the ranks of di = 0 split evenly among the two sums:

$$ {R}^{+}=\sum \limits_{d_i>0}\operatorname{rank}\left({d}_i\right)+\frac{1}{2}\sum \limits_{d_i=0}\operatorname{rank}\left({d}_i\right), $$
(14)
$$ {R}^{-}=\sum \limits_{d_i<0}\operatorname{rank}\left({d}_i\right)+\frac{1}{2}\sum \limits_{d_i=0}\operatorname{rank}\left({d}_i\right). $$
(15)

Let T be the smaller of the two sums, T = min(R+, R−); then the normal approximation can be used, and the following statistic gives a z-statistic with a corresponding p-value:

$$ z=\frac{T-\frac{n\left(n+1\right)}{4}}{\sqrt{\frac{n\left(n+1\right)\left(2n+1\right)}{24}}}. $$
(16)
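In practice, this test is available in SciPy; note that its default zero_method='wilcox' discards zero differences instead of splitting their ranks as in Eqs. (14) and (15), a minor deviation from the formulation above.

from scipy.stats import wilcoxon

# acc_a, acc_b: paired per-fold accuracies (100 values each) of two classifiers
stat, p_value = wilcoxon(acc_a, acc_b)  # two-sided test on the paired differences
print(f"T = {stat}, p = {p_value:.4f}")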

However, García and Herrera (2008) caution that conducting several repeated pairwise comparison tests between algorithms may lead to a loss of control over the family-wise error rate.

4 Results

The experimental analysis is designed to compare the prediction ability of different ensemble methods based on the extreme learning machine classifier. Table 3 reports the evaluation metrics used to assess the performance of the methods. This table is complemented by Table 4, which indicates whether the differences between the methods are statistically significant.Footnote 3

We first analyze the overall performance of the methods. Boosting ELM and Bagging ELM achieve the best mean accuracy values, 82.2% and 82.6%, respectively, while Random Subspace ELM attains a mean accuracy of 81.7% and Multiple ELM one of 81.4%. All ensemble methods are more accurate than the single ELM (mean accuracy of 80.4%). This confirms that ensemble ELM methods produce greater predictive power than a single ELM classifier. The fact that the Bagging and Boosting ensembles lead to the best reduction in the generalization error is not entirely surprising, as their robustness to overfitting is well documented (Xiao et al., 2013; González et al., 2020). In contrast, varying the parameters of the classifiers, as the Multiple ensemble and Random Subspace do, can generate greater diversity (Bi, 2012). Nonetheless, the information captured by this varying diversity does not provide consistent enough guidance for the ensemble classifier to generalize well. On the whole, the key to Boosting and Bagging is that they build a set of diverse classifiers while benefiting from the balance between diversity and accuracy, which is an important determinant of the performance of ensemble classifiers.

Secondly, we find no uniform improvement across the ensemble methods. If the misclassification errors are analyzed, Boosting ELM and Bagging ELM, here as well, lead to lower misclassification errors for failed firms, 18.8% and 18.2%, respectively, significant at the 1% level in comparison with the single ELM. In contrast, we do not observe any significant differences in the misclassification error for non-failed firms across the ensemble methods; the mean type-II error ranges from 16.5% with Bagging ELM and Random Subspace ELM to 18.8% with Boosting ELM.

Finally, the Bagging and Boosting ELM-based methods lead to higher AUC values than the other ensemble methods, which is in line with the previous results. In particular, Bagging ELM seems to be the optimal ensemble method for corporate failure prediction, as its results are significantly better than those achieved with the other ensemble methods, except with respect to Boosting ELM.

Table 3 Performance of different ELM-based ensemble methods

In sum, the better overall prediction of the Bagging and Boosting methods over the other ensemble methods, as previously observed, is due to their capacity to better identify failed firms. The superiority of Bagging ELM is based on the creation of a unique training set for each ensemble member, because the perturbation generated in the learning set causes a significant change in the constructed predictor. As the model’s predictions are order-correct for most of the replicated observations, the bagging-based ELM can become a nearly optimal predictor, in particular for failed firms. Furthermore, one of the major reasons why boosted ELM better identifies failed firms may be that each newly generated classifier gives more relevance to misclassified observations, mostly failed firms. That is, the weight of instances that have been misclassified by the previously generated classifiers increases, and the set of classifiers grows progressively more diverse. This trend explains why this method provides higher accuracy for the minority class without jeopardizing the accuracy of the majority class.

Table 4 Significance levels of a test of differences by method and evaluation metric

4.1 Further Validation

In order to further evaluate the effectiveness of the ensemble extreme learning machine for the corporate failure prediction task, a new data set was collected. In general, there is no universally accepted definition of corporate failure; bankruptcy, the most severe form of failure, is commonly used. The popularity of bankruptcy as the definition of failure is based on two points: on the one hand, it provides an objective criterion to distinguish failed from non-failed firms, and, on the other hand, the moment of failure can be dated to when a firm files for bankruptcy. Therefore, the notion of bankruptcy offers a discrimination criterion for obtaining a well-defined dichotomy, or at least a representation of corporate failure, that can be applied methodologically. Nonetheless, numerous studies (Sun et al., 2014; Brédart et al., 2021) consider that corporate failure begins when a firm experiences financial distress, that is, when a firm encounters financial difficulties or struggles to fulfill its obligations. Accordingly, we collected a data set using financial distress as the definition of corporate failure. We adopt the criterion provided by Balcaen et al. (2011), who define a financially distressed firm as one with negative recurring profit after taxes over two consecutive years. Consequently, the collected dataset is composed of 2500 failed and 2500 non-failed firms.Footnote 4

The results presented in Tables 5 and 6 are consistent with the previous ones. Boosting ELM and Bagging ELM achieve the highest accuracy values, in particular due to their effectiveness in reducing the type-I error in comparison with the single ELM.Footnote 5 Moreover, it is important to mention that the prediction performance of the methods on this data set is inferior to that on the previous one. Thus, it is more arduous to differentiate failed firms from healthy ones in the initial stages of failure, when firms are just experiencing financial distress. The literature documents that some firms show resilience for a long time, even though their financial situation resembles that of a bankrupt firm (Iftikhar et al., 2021). In contrast, firms that seem completely sound may suddenly fail. Therefore, the inability to know whether the echoes of financial distress will result in corporate failure makes it difficult to capture the distinguishing factors that might reinforce model accuracy. That is why the performance of the models is lower when corporate failure is represented as financial distress than when it is defined as bankruptcy.

Table 5 Performance of different prediction methods

5 Conclusion

In this study, we evaluate several ensemble methods applied to corporate failure prediction in order to improve the classification performance of ELM. An ensemble strategy that combines the predictions of individual models performs better than relying on the prediction capacity of a single model. Our results, based on two real financial datasets, confirm that Extreme Learning Machine-based ensembles are more accurate and robust than the “individual best” ELM model. In particular, the ensemble methods used in this study increase, on average, the classification accuracy of the single ELM by 1.6 and 2.1 percentage points for the bankruptcy data and the financial distress data, respectively. An increase in prediction performance of this magnitude may seem modest, but financial institutions and banks can save a huge amount of their limited financial resources with a decision technology that increases prediction power by 2 percentage points.

Table 6 Significance levels of a test of differences by method and evaluation metric

As Bagging ELM and Boosting ELM give similar results – there is some evidence that the bagging strategy is more effective for the prediction of corporate failure using ELM – it is arduous to recommend one method as optimal. However, we notice that both methods, which operate by taking a base learner and invoking it multiple times on different training sets, are the most effective ensemble ELM prediction methods. We also notice that bagged ELM is more computationally efficient, as it requires 40–50 ensemble members, while 60–70 members are necessary for the boosting ensemble.