Abstract
Software Defect Prediction (SDP) is a key task in the testing phase of the Software Development Life Cycle (SDLC). It identifies the modules that are most susceptible to defects and therefore require more intensive testing, so that flaws are found early and the extra cost of software development is reduced. Much research has addressed Cross-Project Defect Prediction (CPDP), which seeks to predict defects in a target application that has little or no historical defect data by building a generalized model from other projects. This work focuses on defect prediction with heterogeneous metric sets, in which the source and target applications share no common metrics. The paper also discusses the Class Imbalance Problem (CIP), which arises when a dataset contains disproportionate numbers of favorable and unfavorable cases; a classifier trained on an imbalanced dataset produces biased outcomes. We use the Adaptive Boosting (AdaBoost) method to handle CIP in Heterogeneous Cross-Project Defect Prediction (HCPDP), and the experimental findings show significant improvements once CIP has been handled.
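To make the role of boosting under class imbalance concrete, the following is a minimal sketch of standard discrete AdaBoost with one-feature decision stumps. It is not the authors' implementation and does not cover the heterogeneous metric matching step. The data are a made-up toy set (one hypothetical metric, 2 defective modules out of 10), and all function names are illustrative assumptions.

```python
import math

def stump_predict(stump, row):
    """Predict +1 (defective) when polarity * (x_j - threshold) > 0, else -1."""
    j, thr, polarity = stump
    return 1 if polarity * (row[j] - thr) > 0 else -1

def train_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best, best_err = None, float("inf")
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                stump = (j, thr, polarity)
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(stump, row) != yi)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err

def adaboost_fit(X, y, n_rounds):
    """Discrete AdaBoost; labels must be +1 / -1."""
    n = len(X)
    w = [1.0 / n] * n                      # start with uniform sample weights
    ensemble = []
    for _ in range(n_rounds):
        stump, err = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)   # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Misclassified samples gain weight, correctly classified ones lose it.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def adaboost_predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score > 0 else -1

# Hypothetical toy data: one metric per module, only modules 4 and 5 defective.
X = [[v] for v in range(1, 11)]
y = [1 if v in (4, 5) else -1 for v in range(1, 11)]

# A single weak learner is biased toward the majority (non-defective) class.
single = adaboost_fit(X, y, n_rounds=1)
single_preds = [adaboost_predict(single, row) for row in X]

# Three boosting rounds re-weight the missed minority samples.
model = adaboost_fit(X, y, n_rounds=3)
preds = [adaboost_predict(model, row) for row in X]
print(single_preds)
print(preds)
```

On this toy set the single stump labels every module as non-defective, whereas three boosting rounds recover both defective modules, because each round raises the weight of the samples the previous stump missed. Real defect datasets are noisier, so results of this kind should not be expected in general.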
© 2021 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Vashisht, R., Rizvi, S.A.M. (2021). Handling Class Imbalance Problem in Heterogeneous Cross-Project Defect Prediction. In: Gupta, D., Khanna, A., Bhattacharyya, S., Hassanien, A.E., Anand, S., Jaiswal, A. (eds) International Conference on Innovative Computing and Communications. Advances in Intelligent Systems and Computing, vol 1165. Springer, Singapore. https://doi.org/10.1007/978-981-15-5113-0_7
DOI: https://doi.org/10.1007/978-981-15-5113-0_7
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-5112-3
Online ISBN: 978-981-15-5113-0
eBook Packages: Intelligent Technologies and Robotics (R0)