Abstract
Bug reports are widely used to support software maintenance tasks. Since bug reports are contributed by people, the authorship characteristics of contributors may heavily affect how well these tasks are resolved. Poorly written bug reports may delay developers in fixing bugs. However, no in-depth investigation has been conducted into these authorship characteristics. In this study, we first leverage byte-level N-grams to model authorship characteristics and employ Normalized Simplified Profile Intersection (NSPI) to measure the similarity between them. Then, we investigate a series of properties of contributors' authorship characteristics, including their evolution over time and their variation across distinct products in open source projects. Moreover, we show how to leverage authorship characteristics to facilitate a well-known software maintenance task, namely Bug Report Summarization (BRS). Experiments on open source projects validate that incorporating authorship characteristics can effectively improve a state-of-the-art BRS method. Our findings suggest that contributors should retain stable authorship characteristics and that authorship characteristics can assist in resolving software tasks.
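The modeling step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `byte_ngram_profile` and `nspi`, the default N-gram length, and the profile size are hypothetical, and the normalization used here (dividing the intersection size by the size of the smaller profile) is an assumption, since the abstract does not give the NSPI formula.

```python
from collections import Counter
from typing import FrozenSet


def byte_ngram_profile(text: str, n: int = 3, profile_size: int = 500) -> FrozenSet[bytes]:
    """Build a profile: the `profile_size` most frequent byte-level n-grams.

    The values of `n` and `profile_size` are illustrative assumptions.
    """
    data = text.encode("utf-8")
    # Slide a window of n bytes over the encoded text and count each n-gram.
    counts = Counter(data[i:i + n] for i in range(len(data) - n + 1))
    return frozenset(gram for gram, _ in counts.most_common(profile_size))


def nspi(profile_a: FrozenSet[bytes], profile_b: FrozenSet[bytes]) -> float:
    """Similarity in [0, 1] between two profiles.

    Assumed normalization: shared n-grams divided by the smaller profile size.
    """
    if not profile_a or not profile_b:
        return 0.0
    return len(profile_a & profile_b) / min(len(profile_a), len(profile_b))


# Hypothetical bug report texts, for illustration only.
report_1 = byte_ngram_profile("Steps to reproduce: open the dialog and click OK.")
report_2 = byte_ngram_profile("Steps to reproduce: open the editor and press Save.")
similarity = nspi(report_1, report_2)
```

Under this assumed normalization, a profile compared with itself scores 1, and two profiles with no N-gram in common score 0.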
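Highlights
This paper uses byte-level N-grams to model the writing styles of contributors in bug repositories and introduces NSPI to measure the similarity between two writing styles. It studies several properties of contributors' writing styles, including how they change over time and how they vary across products. It then uses contributors' writing styles to help solve a typical software maintenance task, namely bug report summarization. The experimental data of this paper have been made publicly available. Experimental results show that using developers' writing styles can effectively improve bug report summarization.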
Cite this article
Jiang, H., Zhang, J., Ma, H. et al. Mining authorship characteristics in bug repositories. Sci. China Inf. Sci. 60, 012107 (2017). https://doi.org/10.1007/s11432-014-0372-y
DOI: https://doi.org/10.1007/s11432-014-0372-y