Abstract
Localising bugs in source code and predicting which parts of the code are likely to cause defects both play an important role in maintaining software quality. Both approaches rely on information retrieval techniques and on machine learning or deep learning methods. A prerequisite for either approach is a consistent bug dataset of sufficient size. This paper presents an overview of publicly available bug datasets and analyses their specific application areas. It also suggests possible directions for future research in this field.
Notes
- 1. Weka ARFF, https://www.cs.waikato.ac.nz/ml/weka/arff.html.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Holek, T., Bures, M., Cerny, T. (2024). Review of Open Software Bug Datasets. In: Rocha, A., Adeli, H., Dzemyda, G., Moreira, F., Colla, V. (eds) Information Systems and Technologies. WorldCIST 2023. Lecture Notes in Networks and Systems, vol 801. Springer, Cham. https://doi.org/10.1007/978-3-031-45648-0_1
DOI: https://doi.org/10.1007/978-3-031-45648-0_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-45647-3
Online ISBN: 978-3-031-45648-0
eBook Packages: Intelligent Technologies and Robotics (R0)