Abstract
Forecasting the creditworthiness of customers in new and existing loan contracts is central to lenders’ activity. Credit scoring uses analytical methods to transform historical loan application and loan performance data into credit scores that signal creditworthiness, inform and determine credit decisions, set credit limits and loan rates, and assist in fraud detection, delinquency intervention, and loss mitigation. The standard approach to credit scoring takes a “winner-take-all” perspective: for each dataset, a single statistical learning or machine learning classifier believed to be the “best” is selected from a set of candidate approaches using some method or criterion, often neglecting model uncertainty. This paper empirically compares the predictive accuracy of single classifiers with that of the stacking generalization approach in credit risk modelling, using real-world peer-to-peer lending data. The findings show that stacking ensembles consistently outperform most traditional individual credit scoring models in predicting the default probability. Moreover, the findings show that adopting a feature selection process and hyperparameter tuning improves the performance of both the individual credit risk models and the super-learner scoring algorithm, yielding models that are simpler, more comprehensive, and have lower classification error rates. Improving credit scoring models to better identify loan delinquency can substantially contribute to reducing loan impairments and losses, leading to an improvement in the financial performance of credit institutions.
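To make the idea of stacking generalization concrete, the following is a minimal sketch and not the authors' implementation. It assumes synthetic loan data, two illustrative base learners (a logistic regression and a k-nearest-neighbours classifier), and a logistic regression meta-learner trained on out-of-fold base predictions. All function names, feature names, and parameter values are hypothetical; the feature selection and hyperparameter tuning steps mentioned in the abstract are omitted.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_loans(n, seed=0):
    """Synthetic loans (hypothetical): features are (debt_to_income, credit_score) in [0, 1]."""
    rng = random.Random(seed)
    loans = []
    for _ in range(n):
        dti, score = rng.random(), rng.random()
        p_default = sigmoid(4.0 * dti - 4.0 * score)
        loans.append(((dti, score), 1 if rng.random() < p_default else 0))
    return loans


def fit_logistic(X, y, epochs=200, lr=0.5):
    """Logistic regression by batch gradient descent; returns a function x -> P(default)."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return lambda x: sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def fit_knn(X, y, k=15):
    """k-nearest neighbours; returns a function x -> share of defaults among the k nearest loans."""
    def predict(x):
        by_distance = sorted(
            zip(X, y), key=lambda row: sum((a - c) ** 2 for a, c in zip(row[0], x))
        )
        nearest = by_distance[:k]
        return sum(label for _, label in nearest) / len(nearest)
    return predict


def fit_stacking(X, y, base_fitters, n_folds=5):
    """Level 0: out-of-fold base predictions. Level 1: logistic meta-learner on those predictions."""
    n = len(X)
    meta_X = [[0.0] * len(base_fitters) for _ in range(n)]
    for fold in range(n_folds):
        held_out = [i for i in range(n) if i % n_folds == fold]
        kept = [i for i in range(n) if i % n_folds != fold]
        X_train, y_train = [X[i] for i in kept], [y[i] for i in kept]
        for j, fit in enumerate(base_fitters):
            model = fit(X_train, y_train)
            for i in held_out:
                # Each loan is scored by a base model that never saw it during training.
                meta_X[i][j] = model(X[i])
    meta = fit_logistic(meta_X, y)
    # Refit the base learners on all data for use at prediction time.
    bases = [fit(X, y) for fit in base_fitters]
    return lambda x: meta([base(x) for base in bases])


if __name__ == "__main__":
    loans = make_loans(300, seed=1)
    X = [features for features, _ in loans]
    y = [label for _, label in loans]
    stacked = fit_stacking(X, y, [fit_logistic, fit_knn])
    print("P(default), high DTI / low score:", round(stacked((0.9, 0.1)), 3))
    print("P(default), low DTI / high score:", round(stacked((0.1, 0.9)), 3))
```

Training the meta-learner on out-of-fold predictions, rather than on in-sample predictions, keeps it from rewarding base learners that merely memorise the training loans. In this sketch the meta-learner's weights express how much each base model contributes to the final score, in contrast to selecting a single "best" classifier.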
Acknowledgements
This work has been supported by Fundação para a Ciência e a Tecnologia, grants UIDB/04152/2020 - Centro de Investigação em Gestão de Informação (MagIC) and UIDB/00315/2020 (BRU-ISCTE-IUL).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Raimundo, B., Bravo, J.M. (2024). Credit Risk Scoring: A Stacking Generalization Approach. In: Rocha, A., Adeli, H., Dzemyda, G., Moreira, F., Colla, V. (eds) Information Systems and Technologies. WorldCIST 2023. Lecture Notes in Networks and Systems, vol 799. Springer, Cham. https://doi.org/10.1007/978-3-031-45642-8_38
DOI: https://doi.org/10.1007/978-3-031-45642-8_38
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-45641-1
Online ISBN: 978-3-031-45642-8
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)