Handling Class Imbalance Problem in Heterogeneous Cross-Project Defect Prediction

  • Conference paper
International Conference on Innovative Computing and Communications

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1165)

Abstract

Software Defect Prediction (SDP) is a key task in the testing phase of the Software Development Life Cycle (SDLC). It identifies the modules that are most susceptible to defects and therefore require the most thorough testing, so that flaws are found early and the extra cost of software development is cut down. Much research has been performed on Cross-Project Defect Prediction (CPDP), which seeks to predict defects in a target application that has little or no historical defect data and to construct an efficient generalized model for forecasting defects in a software project. The proposed work focuses on defect prediction with heterogeneous metric sets, that is, the source and target applications share no common metrics. This paper also discusses the Class Imbalance Problem (CIP), which arises when a dataset contains disproportionate numbers of positive and negative instances. A classifier trained on an imbalanced dataset produces biased results. We used the Adaptive Boosting (AdaBoost) method to handle CIP in Heterogeneous Cross-Project Defect Prediction (HCPDP), and the experimental findings show significant improvements once CIP is handled.
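
The effect described in the abstract can be illustrated with a minimal sketch of AdaBoost over decision stumps. This is not the authors' implementation: the toy one-metric dataset, the function names (`fit_stump`, `adaboost_fit`, `adaboost_predict`, `recall`), and the number of rounds are all assumptions made for illustration, and the heterogeneous metric-matching step is not covered. It only shows how reweighting misclassified samples can pull a classifier away from always predicting the majority class.

```python
import math

def stump_predict(x, feature, threshold, polarity):
    """One-feature threshold rule: returns +1 or -1 for a single sample."""
    return polarity if x[feature] > threshold else -polarity

def fit_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            for p in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(xi, f, t, p) != yi)
                if best is None or err < best[0]:
                    best = (err, f, t, p)
    return best

def adaboost_fit(X, y, rounds=3):
    """Discrete AdaBoost with labels in {-1, +1}."""
    n = len(X)
    w = [1.0 / n] * n                      # start with uniform sample weights
    model = []
    for _ in range(rounds):
        err, f, t, p = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)   # avoid log(0) and division by 0
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, f, t, p))
        # Misclassified samples gain weight, correctly classified ones lose it.
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, f, t, p))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def adaboost_predict(model, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * stump_predict(x, f, t, p) for a, f, t, p in model)
    return 1 if score > 0 else -1

def recall(model, X, y):
    """Fraction of positive (defective, +1) samples predicted as positive."""
    positives = [x for x, label in zip(X, y) if label == 1]
    hits = sum(1 for x in positives if adaboost_predict(model, x) == 1)
    return hits / len(positives)

# Hypothetical toy data: one metric, 8 non-defective (-1) and 2 defective (+1) modules.
X = [[float(v)] for v in range(10)]
y = [1 if v in (4, 5) else -1 for v in range(10)]

baseline = adaboost_fit(X, y, rounds=1)   # a single stump
boosted = adaboost_fit(X, y, rounds=3)    # three boosted stumps
print("recall, 1 round :", recall(baseline, X, y))
print("recall, 3 rounds:", recall(boosted, X, y))
```

On this toy data the single stump labels every module as non-defective, which gives 80% accuracy but finds none of the defective modules. After the minority samples are upweighted, three rounds are enough to recover both of them. Real defect datasets are noisier, so results there depend on the data and on the base learner chosen.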

Author information

Corresponding author

Correspondence to Rohit Vashisht.

Copyright information

© 2021 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Vashisht, R., Rizvi, S.A.M. (2021). Handling Class Imbalance Problem in Heterogeneous Cross-Project Defect Prediction. In: Gupta, D., Khanna, A., Bhattacharyya, S., Hassanien, A.E., Anand, S., Jaiswal, A. (eds) International Conference on Innovative Computing and Communications. Advances in Intelligent Systems and Computing, vol 1165. Springer, Singapore. https://doi.org/10.1007/978-981-15-5113-0_7
