Abstract
In this paper, we present a new approach to localizing a bug within the software source file hierarchy. The approach trains a Support Vector Machine (SVM) classifier on the log files of the revision control system and on the bug reports in the open bug repository of an open source project. Machine learning features are formed from the textual information in the summary and description of each bug reported to the repository. The class labels are the revision paths of fixed issues, as recorded in the log file of the revision control system. Given an unseen bug instance, the trained classifier predicts which part of the software source file hierarchy (revision path) is most likely to be related to the issue. Experiments on more than 2000 bug reports for the ‘UI’ component of the Eclipse JDT project, covering the period from the project's initiation until November 24, 2009 (about 8 years), show weighted precision and recall values of about 98% on average.
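The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the bug reports and revision paths below are invented for illustration, the features are plain bag-of-words term counts, and the classifier is a simplified one-vs-rest linear SVM (hinge loss, L2 penalty, no bias term) fitted by subgradient descent. The helper names `featurize`, `train_linear_svm`, and `predict` are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def featurize(summary, description):
    """Bag-of-words term counts over summary and description together."""
    return Counter(tokenize(summary + " " + description))


def train_linear_svm(samples, epochs=50, lr=0.1, lam=0.01):
    """Fit one linear hinge-loss classifier per revision path (one-vs-rest).

    Simplified on purpose: no bias term, fixed learning rate, fixed
    sample order. `samples` is a list of (features, revision_path) pairs.
    """
    labels = sorted({label for _, label in samples})
    weights = {label: {} for label in labels}
    for _ in range(epochs):
        for features, true_label in samples:
            for label in labels:
                w = weights[label]
                target = 1.0 if label == true_label else -1.0
                score = sum(w.get(t, 0.0) * c for t, c in features.items())
                for t in w:  # L2 shrinkage of the existing weights
                    w[t] *= 1.0 - lr * lam
                if target * score < 1.0:  # margin violated: hinge subgradient step
                    for t, c in features.items():
                        w[t] = w.get(t, 0.0) + lr * target * c
    return weights


def predict(weights, summary, description):
    """Return the revision path whose classifier scores the report highest."""
    features = featurize(summary, description)

    def score(label):
        w = weights[label]
        return sum(w.get(t, 0.0) * c for t, c in features.items())

    return max(weights, key=score)


# Hypothetical fixed bugs: (summary, description, revision path of the fix).
reports = [
    ("Quick fix proposal throws exception",
     "Invoking quick fix on an unresolved type fails", "ui/text/correction"),
    ("Quick fix missing for unused import",
     "No proposal is offered to remove the import", "ui/text/correction"),
    ("Rename refactoring dialog layout broken",
     "Buttons overlap in the rename dialog", "ui/refactoring"),
    ("Refactoring preview shows wrong diff",
     "The preview dialog lists unchanged files", "ui/refactoring"),
]
samples = [(featurize(s, d), path) for s, d, path in reports]
weights = train_linear_svm(samples)

# Predict the revision path for an unseen (also hypothetical) bug report.
print(predict(weights, "Quick fix proposal fails", "exception on unresolved import"))
```

In practice an established SVM implementation and a term-weighting scheme would replace the hand-written trainer and raw counts; the sketch only shows how report text maps to features and how revision paths serve as class labels.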
Keywords
- Support Vector Machine
- Open Source Project
- Open Source Software Project
- Version Control System
- Concurrent Version System
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Copyright information
© 2010 IFIP
About this paper
Cite this paper
Moin, A.H., Khansari, M. (2010). Bug Localization Using Revision Log Analysis and Open Bug Repository Text Categorization. In: Ågerfalk, P., Boldyreff, C., González-Barahona, J.M., Madey, G.R., Noll, J. (eds) Open Source Software: New Horizons. OSS 2010. IFIP Advances in Information and Communication Technology, vol 319. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13244-5_15
DOI: https://doi.org/10.1007/978-3-642-13244-5_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13243-8
Online ISBN: 978-3-642-13244-5
eBook Packages: Computer Science, Computer Science (R0)