Abstract
Cross-project software defect prediction addresses the problem of insufficient training data in traditional defect prediction and makes it possible to apply models learned from multiple source projects to a target project. At the same time, two new problems emerge: (1) too many irrelevant and redundant features in the training data reduce training efficiency and lower the prediction accuracy of the model; (2) the distribution of metric values varies greatly from project to project because of the development environment and other factors, which lowers prediction accuracy when a model is applied across projects. In this paper, we propose a software defect prediction method with metric compensation based on feature selection and transfer learning. The method uses Pearson feature selection to address data redundancy, and a metric-compensation-based transfer learning technique to address the large differences in data distribution between the source project and the target project. The experimental results show that the model constructed with this method achieves better results on the area under the receiver operating characteristic curve (AUC) and the F1-measure.
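The two ingredients named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify the selection rule or the compensation formula. The function names (`pearson`, `select_features`, `compensate`), the thresholds `min_relevance` and `max_redundancy`, and the toy data are all hypothetical. The compensation step assumes one simple form of metric compensation, rescaling each target metric so that its mean matches the mean of the same metric in the source project.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (sx * sy)


def select_features(columns, labels, min_relevance=0.1, max_redundancy=0.9):
    """Keep features correlated with the label but not with each other.

    columns: dict mapping a metric name to its list of values.
    Both thresholds are illustrative assumptions, not values from the paper.
    """
    relevance = {name: abs(pearson(vals, labels)) for name, vals in columns.items()}
    ranked = sorted(columns, key=lambda name: relevance[name], reverse=True)
    kept = []
    for name in ranked:
        if relevance[name] < min_relevance:
            continue  # irrelevant: almost no linear relation to the label
        if all(abs(pearson(columns[name], columns[k])) <= max_redundancy for k in kept):
            kept.append(name)  # not redundant with anything already kept
    return kept


def compensate(target_values, source_values):
    """Rescale a target metric so its mean matches the source metric's mean.

    Assumed formula: v' = v * mean(source) / mean(target).
    """
    t_mean = sum(target_values) / len(target_values)
    s_mean = sum(source_values) / len(source_values)
    if t_mean == 0:
        return list(target_values)  # cannot rescale; leave unchanged
    return [v * s_mean / t_mean for v in target_values]


# Toy source project: "loc_copy" duplicates "loc", "noise" is unrelated to the label.
source = {
    "loc": [10, 20, 30, 40],
    "loc_copy": [11, 21, 31, 41],
    "noise": [5, 1, 5, 1],
}
labels = [0, 0, 1, 1]

selected = select_features(source, labels)
compensated = compensate([100, 200, 300], [10, 20, 30])
```

In this toy data, `noise` is dropped as irrelevant and `loc_copy` as redundant with `loc`. The compensated target values have the same mean as the source metric, so a classifier trained on the source project would see target data on a comparable scale.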
Author information
Contributions
Jinfu CHEN and Saihua CAI designed the research. Xiaoli WANG, Saihua CAI, and Jiaping XU processed the data. Jinfu CHEN, Xiaoli WANG, and Saihua CAI drafted the paper. Xiaoli WANG, Jiaping XU, Jingyi CHEN, and Haibo CHEN finished the experiments. Jingyi CHEN and Haibo CHEN helped organize the paper. Jinfu CHEN, Xiaoli WANG, and Saihua CAI revised and finalized the paper.
Additional information
Compliance with ethics guidelines
Jinfu CHEN, Xiaoli WANG, Saihua CAI, Jiaping XU, Jingyi CHEN, and Haibo CHEN declare that they have no conflict of interest.
Project supported by the National Natural Science Foundation of China (Nos. 62172194 and U1836116), the National Key R&D Program of China (No. 2020YFB1005500), the Leading-edge Technology Program of Jiangsu Provincial Natural Science Foundation, China (No. BK20202001), the China Postdoctoral Science Foundation (No. 2021M691310), the Postdoctoral Science Foundation of Jiangsu Province, China (No. 2021K636C), and the Future Network Scientific Research Fund Project, China (No. FNSRFP-2021-YB-50).
About this article
Cite this article
Chen, J., Wang, X., Cai, S. et al. A software defect prediction method with metric compensation based on feature selection and transfer learning. Front Inform Technol Electron Eng 23, 715–731 (2022). https://doi.org/10.1631/FITEE.2100468
DOI: https://doi.org/10.1631/FITEE.2100468