Abstract
The combination of class imbalance and class overlap has become a major issue in supervised learning. Most classification algorithms assume roughly equal class cardinalities while optimising the cost function; this assumption does not hold for imbalanced datasets and leads to suboptimal classifiers. Various remedies have therefore been proposed, including undersampling, oversampling, cost-sensitive learning, and ensemble-based methods. However, undersampling suffers from information loss, oversampling incurs increased runtime and potential overfitting, and cost-sensitive methods depend on cost-assignment schemes that are often inadequately defined. In this paper, we propose a novel boosting-based method called Locality Informed Under-Boosting (LIUBoost). Like Random Undersampling with Boosting (RUSBoost), LIUBoost undersamples the majority class to balance the data in every boosting iteration; in addition, it incorporates a cost term for every instance, based on that instance's hardness, into the weight-update formula, mitigating the information loss introduced by undersampling. LIUBoost has been evaluated extensively on 18 imbalanced datasets, and the results indicate significant improvement over the existing best-performing method, RUSBoost.
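The abstract's recipe (per-round random undersampling, as in RUSBoost, plus a locality-based cost folded into the boosting weight update) can be sketched as follows. This is an illustrative sketch, not the authors' exact algorithm: the choice of measuring "hardness" as the fraction of opposite-class points among an instance's k nearest neighbours, and of applying the cost only to misclassified instances, are our assumptions for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import NearestNeighbors

def liuboost_sketch(X, y, T=10, k=5, rng=None):
    """Boosting with per-round random undersampling and a locality-based
    cost term in the weight update. Labels must be in {-1, +1}."""
    rng = np.random.default_rng(rng)
    n = len(y)
    w = np.full(n, 1.0 / n)

    # Hypothetical "hardness" cost: fraction of opposite-class points
    # among each instance's k nearest neighbours (computed once, up front).
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)                      # idx[:, 0] is the point itself
    hardness = np.array([(y[idx[i, 1:]] != y[i]).mean() for i in range(n)])
    cost = 1.0 + hardness                          # harder instances keep more weight

    minority = (y == 1) if (y == 1).sum() < (y == -1).sum() else (y == -1)
    min_idx, maj_idx = np.flatnonzero(minority), np.flatnonzero(~minority)

    models, alphas = [], []
    for _ in range(T):
        # Random undersampling: keep all minority points, sample an
        # equal number of majority points for this round only.
        keep = rng.choice(maj_idx, size=len(min_idx), replace=False)
        sel = np.concatenate([min_idx, keep])

        stump = DecisionTreeClassifier(max_depth=1).fit(
            X[sel], y[sel], sample_weight=w[sel])
        pred = stump.predict(X)

        err = np.clip(np.sum(w[pred != y]), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)

        # AdaBoost-style update; the cost term amplifies the weight
        # gain of misclassified hard (borderline/overlapping) instances.
        w *= np.exp(-alpha * y * pred * np.where(pred == y, 1.0, cost))
        w /= w.sum()
        models.append(stump)
        alphas.append(alpha)

    def predict(Xq):
        score = sum(a * m.predict(Xq) for a, m in zip(alphas, models))
        return np.sign(score)
    return predict
```

Because the full training set keeps its weight vector across rounds, instances discarded by one round's undersampling still influence later rounds, which is how the cost term counteracts undersampling's information loss.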
References
Afza, A.A., Farid, D.M., Rahman, C.M.: A hybrid classifier using boosting, clustering, and naïve bayesian classifier. World Comput. Sci. Inf. Technol. J. (WCSIT) 1, 105–109 (2011). ISSN: 2221-0741
Alcalá-Fdez, J., Fernández, A., Luengo, J., Derrac, J., García, S., Sánchez, L., Herrera, F.: Keel data-mining software tool: data set repository, integration of algorithms and experimental analysis framework. J. Mult. Valued Logic Soft Comput. 17 (2011)
Batista, G.E., Prati, R.C., Monard, M.C.: A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD Explor. Newsl. 6(1), 20–29 (2004)
Błaszczyński, J., Stefanowski, J., Idkowiak, Ł.: Extending bagging for imbalanced data. In: Proceedings of the 8th International Conference on Computer Recognition Systems CORES 2013, pp. 269–278. Springer (2013)
Borowska, K., Stepaniuk, J.: Rough sets in imbalanced data problem: improving re-sampling process. In: IFIP International Conference on Computer Information Systems and Industrial Management, pp. 459–469. Springer (2017)
Breiman, L.: Bagging predictors. Mach. Learn. 24(2), 123–140 (1996)
Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
Bunkhumpornpat, C., Sinapiromsaran, K., Lursinsap, C.: Safe-level-smote: safe-level-synthetic minority over-sampling technique for handling the class imbalanced problem. In: Advances in Knowledge Discovery and Data Mining, pp. 475–482 (2009)
Chawla, N.V., Lazarevic, A., Hall, L.O., Bowyer, K.W.: Smoteboost: improving prediction of the minority class in boosting. In: European Conference on Principles of Data Mining and Knowledge Discovery, pp. 107–119. Springer (2003)
Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)
Fan, W., Stolfo, S.J., Zhang, J., Chan, P.K.: Adacost: misclassification cost-sensitive boosting. In: Proceedings of ICML, pp. 97–105 (1999)
Farid, D., Nguyen, H.H., Darmont, J., Harbi, N., Rahman, M.Z.: Scaling up detection rates and reducing false positives in intrusion detection using NBtree. In: International Conference on Data Mining and Knowledge Engineering (ICDMKE 2010), pp. 186–190 (2010)
Farid, D.M., Al-Mamun, M.A., Manderick, B., Nowe, A.: An adaptive rule-based classifier for mining big biological data. Expert Syst. Appl. 64, 305–316 (2016)
Farid, D.M., Nowé, A., Manderick, B.: A new data balancing method for classifying multi-class imbalanced genomic data. In: Proceedings of 5th Belgian-Dutch Conference on Machine Learning (Benelearn), pp. 1–2 (2016)
Farid, D.M., Zhang, L., Hossain, A., Rahman, C.M., Strachan, R., Sexton, G., Dahal, K.: An adaptive ensemble classifier for mining concept drifting data streams. Expert Syst. Appl. 40(15), 5895–5906 (2013)
Farid, D.M., Zhang, L., Rahman, C.M., Hossain, M., Strachan, R.: Hybrid decision tree and naïve bayes classifiers for multi-class classification tasks. Expert Syst. Appl. 41(4), 1937–1946 (2014)
Galar, M., Fernandez, A., Barrenechea, E., Bustince, H., Herrera, F.: A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches. IEEE Trans. Syst. Man Cybern. Part C (Applications and Reviews) 42(4), 463–484 (2012)
García, V., Sánchez, J., Mollineda, R.: An empirical study of the behavior of classifiers on imbalanced and overlapped data sets. In: Proceedings of Progress in Pattern Recognition, Image Analysis and Applications, pp. 397–406 (2007)
Han, H., Wang, W.Y., Mao, B.H.: Borderline-smote: a new over-sampling method in imbalanced data sets learning. In: Advances in Intelligent Computing, pp. 878–887 (2005)
He, H., Bai, Y., Garcia, E.A., Li, S.: ADASYN: adaptive synthetic sampling approach for imbalanced learning. In: IEEE World Congress on Computational Intelligence. IEEE International Joint Conference on Neural Networks, 2008, IJCNN 2008, pp. 1322–1328. IEEE (2008)
Hopfield, J.J.: Artificial neural networks. IEEE Circ. Devices Mag. 4(5), 3–10 (1988)
Karakoulas, G.I., Shawe-Taylor, J.: Optimizing classifiers for imbalanced training sets. In: Proceedings of Advances in Neural Information Processing Systems, pp. 253–259 (1999)
Khoshgoftaar, T.M., Van Hulse, J., Napolitano, A.: Comparing boosting and bagging techniques with noisy and imbalanced data. IEEE Trans. Syst. Man Cybern. Part A Syst. Hum. 41(3), 552–568 (2011)
Liu, Y.H., Chen, Y.T.: Total margin based adaptive fuzzy support vector machines for multiview face recognition. In: 2005 IEEE International Conference on Systems, Man and Cybernetics, vol. 2, pp. 1704–1711. IEEE (2005)
Mani, I., Zhang, I.: KNN approach to unbalanced data distributions: a case study involving information extraction. In: Proceedings of Workshop on Learning from Imbalanced Datasets, vol. 126 (2003)
Napierala, K., Stefanowski, J.: Types of minority class examples and their influence on learning classifiers from imbalanced data. J. Intell. Inf. Syst. 46(3), 563–597 (2016)
Prati, R.C., Batista, G., Monard, M.C., et al.: Class imbalances versus class overlapping: an analysis of a learning system behavior. In: Proceedings of MICAI, vol. 4, pp. 312–321. Springer (2004)
Seiffert, C., Khoshgoftaar, T.M., Van Hulse, J., Napolitano, A.: Rusboost: a hybrid approach to alleviating class imbalance. IEEE Trans. Syst. Man Cybern. Part A Syst. Hum. 40(1), 185–197 (2010)
Sun, Y., Kamel, M.S., Wong, A.K., Wang, Y.: Cost-sensitive boosting for classification of imbalanced data. Pattern Recogn. 40(12), 3358–3378 (2007)
Sun, Z., Song, Q., Zhu, X., Sun, H., Xu, B., Zhou, Y.: A novel ensemble method for classifying imbalanced data. Pattern Recogn. 48(5), 1623–1637 (2015)
Tomašev, N., Mladenić, D.: Class imbalance and the curse of minority hubs. Knowl. Based Syst. 53, 157–172 (2013)
Wilcoxon, F.: Individual comparisons by ranking methods. Biom. Bull. 1(6), 80–83 (1945)
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
Cite this paper
Ahmed, S., Rayhan, F., Mahbub, A., Rafsan Jani, M., Shatabda, S., Farid, D.M. (2019). LIUBoost: Locality Informed Under-Boosting for Imbalanced Data Classification. In: Abraham, A., Dutta, P., Mandal, J., Bhattacharya, A., Dutta, S. (eds) Emerging Technologies in Data Mining and Information Security. Advances in Intelligent Systems and Computing, vol 813. Springer, Singapore. https://doi.org/10.1007/978-981-13-1498-8_12
Print ISBN: 978-981-13-1497-1
Online ISBN: 978-981-13-1498-8