Abstract
In this paper we address the automatic summarization task. Recent research on extractive summary generation relies on heuristics, but few works indicate how to select the relevant features. We present a summarization procedure based on trainable Machine Learning algorithms, which employs a set of features extracted directly from the original text. These features are of two kinds: statistical, based on the frequency of some elements in the text; and linguistic, extracted from a simplified argumentative structure of the text. We also present computational results obtained by applying our summarizer to some well-known text databases, and we compare these results with those of some baseline summarization procedures.
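To make the idea of a trainable extractive summarizer concrete, the following is a minimal sketch, not the authors' implementation. It assumes three illustrative binary features per sentence (presence of a frequent term, leading position, minimum length) and a small naive Bayes classifier; the abstract names neither these features nor this classifier, and the linguistic features derived from argumentative structure are not modelled here. All function and class names are hypothetical.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    return re.findall(r"[a-z]+", sentence.lower())


def sentence_features(text, top_k=5, min_len=5):
    """Return (sentence, binary feature tuple) pairs for one document."""
    sentences = split_sentences(text)
    # Statistical information: term frequencies over the whole document.
    doc_tf = Counter(w for s in sentences for w in tokenize(s) if len(w) > 3)
    keywords = {w for w, _ in doc_tf.most_common(top_k)}
    rows = []
    for i, s in enumerate(sentences):
        words = tokenize(s)
        features = (
            int(any(w in keywords for w in words)),  # contains a frequent term
            int(i == 0),                             # first sentence of the text
            int(len(words) >= min_len),              # not too short
        )
        rows.append((s, features))
    return rows


class BernoulliNB:
    """Tiny naive Bayes over binary features, with Laplace smoothing."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        n_feat = len(X[0])
        self.prior, self.cond = {}, {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.prior[c] = len(rows) / len(X)
            self.cond[c] = [
                (sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                for j in range(n_feat)
            ]
        return self

    def log_score(self, x, c):
        score = math.log(self.prior[c])
        for j, v in enumerate(x):
            p = self.cond[c][j]
            score += math.log(p if v else 1 - p)
        return score

    def predict(self, x):
        return max(self.classes, key=lambda c: self.log_score(x, c))


def summarize(model, text, n=1):
    """Keep the n sentences most likely to be summary-worthy (label 1)."""
    rows = sentence_features(text)
    ranked = sorted(
        range(len(rows)),
        key=lambda i: model.log_score(rows[i][1], 1) - model.log_score(rows[i][1], 0),
        reverse=True,
    )
    return [rows[i][0] for i in sorted(ranked[:n])]  # restore document order
```

In such a setup the model would be trained on feature vectors of sentences labelled 1 (in a reference summary) or 0 (not in it), and `summarize` would then rank the sentences of an unseen document by the log-odds of belonging to the summary class. Both labels must occur in the training data for the ranking to work.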
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Neto, J.L., Freitas, A.A., Kaestner, C.A.A. (2002). Automatic Text Summarization Using a Machine Learning Approach. In: Bittencourt, G., Ramalho, G.L. (eds) Advances in Artificial Intelligence. SBIA 2002. Lecture Notes in Computer Science, vol 2507. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36127-8_20
DOI: https://doi.org/10.1007/3-540-36127-8_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00124-9
Online ISBN: 978-3-540-36127-5
eBook Packages: Springer Book Archive