Abstract
We propose a multi-document summarization strategy based on Basic Element (BE) vector clustering. Before clustering, each sentence is represented by a BE vector rather than a word or term vector. A BE is a head-modifier-relation triple that represents sentence content, and it is a more precise semantic unit than a word. The BE vectors are clustered with the k-means method, and a novel cluster analysis method automatically detects the number of clusters, K. Experimental results indicate that the proposed strategy outperforms the traditional summarization strategy based on word-vector clustering: on DUC04 task-2, its summaries achieve a ROUGE-1 score of 0.37291, compared with 0.36936 for the traditional strategy.
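The idea of clustering sentences by their BE vectors can be illustrated with a minimal sketch. It is not the authors' implementation. Several things are assumptions made for illustration: the triples are hand-written rather than produced by a parser, the relation labels (`nn`, `subj`) are hypothetical, cosine similarity is used as the closeness measure, the seeds are picked at evenly spaced positions, and K is fixed by hand because the abstract does not describe the automatic detection method.

```python
import math
from collections import Counter

# Hypothetical, hand-written BEs: each sentence is a list of
# (head, modifier, relation) triples. A real system would extract
# these with a parser; the relation labels here are illustrative only.
sentences = [
    [("bomb", "car", "nn"), ("exploded", "bomb", "subj")],
    [("bomb", "car", "nn"), ("killed", "bomb", "subj")],
    [("talks", "peace", "nn"), ("resumed", "talks", "subj")],
    [("talks", "peace", "nn"), ("failed", "talks", "subj")],
]

def be_vectors(sents):
    """Map each sentence to a count vector over the distinct BEs."""
    vocab = sorted({be for sent in sents for be in sent})
    index = {be: i for i, be in enumerate(vocab)}
    vectors = []
    for sent in sents:
        vec = [0.0] * len(vocab)
        for be, count in Counter(sent).items():
            vec[index[be]] = float(count)
        vectors.append(vec)
    return vectors

def cosine(a, b):
    """Cosine similarity (an assumed measure, not taken from the paper)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def kmeans(vectors, k, iterations=20):
    """Plain k-means with a fixed K and deterministic, evenly spaced seeds."""
    n = len(vectors)
    centroids = [list(vectors[i * n // k]) for i in range(k)]
    labels = [-1] * n
    for _ in range(iterations):
        # Assign each sentence to its most similar centroid.
        new_labels = [
            max(range(k), key=lambda j: cosine(vec, centroids[j]))
            for vec in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Recompute each centroid as the mean of its members.
        for j in range(k):
            members = [vectors[i] for i in range(n) if labels[i] == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

vectors = be_vectors(sentences)
labels = kmeans(vectors, k=2)
print(labels)
```

In this toy input the two sentences that share the BE `("bomb", "car", "nn")` fall into one cluster and the two that share `("talks", "peace", "nn")` fall into the other. A word-vector version would use the same clustering code with single words as the vector dimensions.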
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Liu, D., He, Y., Ji, D., Yang, H. (2006). Multi-document Summarization Based on BE-Vector Clustering. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2006. Lecture Notes in Computer Science, vol 3878. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11671299_49
DOI: https://doi.org/10.1007/11671299_49
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32205-4
Online ISBN: 978-3-540-32206-1
eBook Packages: Computer Science (R0)